Abstract
Analysis of clustered data has been conducted extensively in various studies.
Recent research has demonstrated that network-based heterogeneity analysis, which adopts a system perspective and incorporates the interconnections
among variables while accounting for heterogeneity between components,
can provide more informative results than approaches based on simpler
statistics.
Moreover, incorporating grouping strategies in analysis can better
delineate the sources of heterogeneity and enable more flexible modeling for clustered data. In this article, we introduce a novel approach called the grouped
heterogeneous Gaussian graphical models (Grouped-HGGM) for network analysis of high-dimensional clustered data. Our approach assumes that clusters can
be divided into distinct groups, and any heterogeneity across clusters is captured
through the cluster-wise mixture probabilities. Unlike most previous approaches
that assume that the number of components is known in advance, an appealing
feature of our method is the automatic determination of the number of components and sparse estimation using a fusion technique. Consistency properties
are rigorously established, and an effective computational algorithm is developed. Extensive simulations demonstrate the practical superiority of the proposed
approach over closely related alternatives. In an analysis of breast cancer data, the proposed approach identifies heterogeneity structures that differ from those found by the
alternatives.
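The grouped structure described above can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes that mixture components (each with its own mean and sparse precision matrix) are shared by all clusters, and that clusters in the same group share one vector of mixing probabilities. All names, dimensions, and parameter values (`group_probs`, `cluster_group`, `tridiagonal_precision`, and so on) are hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

p = 5               # number of variables (hypothetical)
K = 2               # number of mixture components (hypothetical)
n_per_cluster = 50  # observations per cluster (hypothetical)


def tridiagonal_precision(dim, off_diag):
    """Sparse precision matrix whose graph is a chain: only neighbors are linked."""
    theta = np.eye(dim)
    idx = np.arange(dim - 1)
    theta[idx, idx + 1] = off_diag
    theta[idx + 1, idx] = off_diag
    return theta


# Component-specific means and sparse precision matrices (assumed shared by all clusters).
means = [np.zeros(p), np.full(p, 2.0)]
precisions = [tridiagonal_precision(p, 0.4), tridiagonal_precision(p, -0.4)]
covariances = []
for theta in precisions:
    sigma = np.linalg.inv(theta)
    covariances.append((sigma + sigma.T) / 2)  # symmetrize against rounding error

# Assumption: one vector of mixing probabilities per group of clusters.
group_probs = np.array([[0.8, 0.2],
                        [0.3, 0.7]])

# Group membership of each cluster (six clusters split into two groups).
cluster_group = np.array([0, 0, 0, 1, 1, 1])


def simulate_cluster(group):
    """Draw component labels from the group's mixing probabilities, then the data."""
    labels = rng.choice(K, size=n_per_cluster, p=group_probs[group])
    x = np.stack([rng.multivariate_normal(means[k], covariances[k]) for k in labels])
    return x, labels


data = [simulate_cluster(g) for g in cluster_group]

# Empirical share of component 0 in each cluster; clusters in the same group
# should show similar shares, which is the heterogeneity the grouping captures.
shares = [float(np.mean(labels == 0)) for _, labels in data]
print(shares)
```

In this sketch the differences between clusters come only from the mixing probabilities, while the network structure (the zero pattern of each precision matrix) is a property of the components. An estimation procedure would have to recover the groups, the number of components, and the sparse precision matrices from `data` alone.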
Information
| Preprint No. | SS-2024-0258 |
|---|---|
| Manuscript ID | SS-2024-0258 |
| Complete Authors | Xin Zeng, Shuangge Ma, Qingzhao Zhang |
| Corresponding Authors | Qingzhao Zhang |
| Emails | zhangqingzhao@amss.ac.cn |
Xin Zeng, Department of Statistics and Data Science, School of Economics, Xiamen University, Xiamen, China
Acknowledgments
We thank the Editor, Associate Editor, and two reviewers for their careful review and insightful comments. This study is supported by the Humanities
and Social Science Foundation of the Ministry of Education of China
(24YJA910007), NIH (CA204120), and NSF (2209685).
Supplementary Materials
The online supplementary materials contain additional computational,
theoretical, and numerical results.