Abstract

The Gaussian graphical model is routinely employed to model the joint distribution of multiple random variables. The graph it induces is useful not only for describing the relationships among these variables but also for improving the precision of statistical estimation. In high-dimensional data analysis, despite an abundant literature on estimating this graph structure, tests for the adequacy of its specification at a global level are severely underdeveloped. To make progress, this paper proposes novel goodness-of-fit tests that are computationally easy and theoretically tractable. The first contribution of this paper is a new direct plug-in test statistic. We show that its asymptotic null distribution is a Gumbel distribution with a location parameter that depends on the underlying true graph structure. The direct test, however, has no power against hypothesized structures that contain the true graph but are not equal to it. Our second contribution is a novel consistency-empowered test statistic that gains power by, interestingly, amplifying the noise incurred in estimation. The improved test is shown to be universally consistent against all fixed alternatives. Extensive simulations illustrate that the proposed test procedures have the correct size under the null and are powerful under alternatives. As an application, we apply the tests to a COVID-19 data set, demonstrating that our test can serve as a valuable tool for choosing a graph structure that improves estimation efficiency.
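The abstract does not give either statistic explicitly. The Python sketch below is only a rough illustration of the general idea of a max-type check over the precision-matrix entries that a hypothesized graph forces to zero, calibrated by a Gumbel-type limit. It uses a textbook low-dimensional construction: the inverse sample covariance, Fisher z-transformed partial correlations, and the requirement n > p + 1. It is not the authors' high-dimensional plug-in statistic. The function name `max_type_graph_test`, its arguments, and the simulated chain graph are all hypothetical. The calibration relies on a standard extreme-value fact: for the maximum M of q independent (or weakly dependent) chi-square(1) statistics, M - 2 log q + log log q has the limiting cdf exp(-π^{-1/2} e^{-x/2}). Here q is the number of non-edges in the hypothesized graph, which loosely mirrors the graph-dependent location parameter described above.

```python
import numpy as np


def max_type_graph_test(X, edges, alpha=0.05):
    """Hypothetical max-type check of a hypothesized Gaussian graph.

    X     : (n, p) array of i.i.d. Gaussian rows; this sketch needs n > p + 1.
    edges : set of (i, j) pairs, i < j, present in the hypothesized graph.
    H0    : every precision-matrix entry outside `edges` equals zero.
    Returns (max statistic, approximate p-value, reject at level alpha).
    """
    n, p = X.shape
    if n <= p + 1:
        raise ValueError("this low-dimensional sketch needs n > p + 1")
    omega = np.linalg.inv(np.cov(X, rowvar=False))  # inverse sample covariance
    d = np.sqrt(np.diag(omega))
    partial_corr = -omega / np.outer(d, d)           # partial correlations

    # Pairs whose precision entry must be zero under H0 (the non-edges).
    off_graph = [(i, j) for i in range(p) for j in range(i + 1, p)
                 if (i, j) not in edges]
    q = len(off_graph)
    if q < 2:
        raise ValueError("need at least two non-edges to test")

    # Fisher z: under H0, (n - p - 1) * z**2 is approximately chi-square(1).
    z = np.array([np.arctanh(partial_corr[i, j]) for i, j in off_graph])
    M = np.max((n - p - 1) * z ** 2)

    # Centre by 2 log q - log log q; limit cdf F(x) = exp(-exp(-x/2) / sqrt(pi)).
    x = M - 2 * np.log(q) + np.log(np.log(q))
    p_value = -np.expm1(-np.exp(-x / 2) / np.sqrt(np.pi))  # 1 - F(x)
    crit = -np.log(np.pi) - 2 * np.log(-np.log1p(-alpha))  # F(crit) = 1 - alpha
    return float(M), float(p_value), bool(x > crit)


# Usage: the true graph is a chain on p = 10 nodes (simulated, illustrative).
rng = np.random.default_rng(0)
p, n = 10, 500
omega_true = np.eye(p) + 0.4 * (np.eye(p, k=1) + np.eye(p, k=-1))
cov = np.linalg.inv(omega_true)
X = rng.multivariate_normal(np.zeros(p), (cov + cov.T) / 2, size=n)

chain = {(i, i + 1) for i in range(p - 1)}
superset = chain | {(0, 2), (3, 7)}   # contains the truth but is not equal

res_true = max_type_graph_test(X, chain)         # H0 holds
res_empty = max_type_graph_test(X, set())        # misses the chain edges
res_superset = max_type_graph_test(X, superset)  # tested entries are still zero
print(res_true, res_empty, res_superset)
```

The empty-graph hypothesis omits the nine chain edges, whose partial correlations are -0.4. Its statistic therefore lands far beyond the critical value. The superset hypothesis tests only entries that are truly zero, so its statistic behaves as it would under the null. This mirrors the blind spot of the direct test noted in the abstract, which the paper's consistency-empowered statistic is meant to address.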

Information

Preprint No. SS-2023-0385
Manuscript ID SS-2023-0385
Complete Authors: Thien-Minh Le, Ping-Shou Zhong, Chenlei Leng
Corresponding Author: Ping-Shou Zhong
Email: pszhong@uic.edu

Affiliation (footnote 1): Department of Mathematics, University of Tennessee at Chattanooga

Acknowledgments

The research was partially supported by NSF grants DMS-1462156 and FRG-…

The authors thank Editor Prof. Judy Wang, the Associate Editor, and the two referees for their constructive comments, which significantly improved the paper.

Supplementary Materials

All technical proofs, examples of structures satisfying condition (C1), additional simulation results, a data-driven tuning parameter selection procedure, and further details on the real data analysis are provided in the supplementary materials.

The source code and real data sets are available on GitHub at https://github.com/leminhthien2011/UniversalConsistentTestforGraphsofGGM
