Abstract
Repeated measurements are common in many fields, where random variables are observed repeatedly on each of many subjects. Such data have an underlying hierarchical structure, and it is of interest to learn covariance/correlation at different levels. Most existing methods for sparse covariance/correlation matrix estimation assume independent samples. Ignoring the underlying hierarchical structure and the within-subject correlation may lead to erroneous scientific conclusions. In this paper, we propose to distinguish between the between-subject covariance structure and the within-subject covariance structure. In the presence of repeated measurements, this leads to the problem of sparse and positive-definite estimation of the between-subject and within-subject covariance matrices. Our estimators are solutions to convex optimization problems that can be solved efficiently. We establish estimation error rates for the proposed estimators and demonstrate their favorable performance through theoretical analysis and comprehensive simulation studies. We further apply our methods to construct between-subject and within-subject covariance graphs of clinical variables from hemodialysis patients.
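The distinction between between-subject and within-subject covariance can be made concrete with the classical moment estimators for a balanced multivariate one-way random-effects model. The Python sketch below is not the authors' estimator. It assumes the model y_ij = mu + b_i + e_ij with Cov(b_i) = Sigma_B and Cov(e_ij) = Sigma_W, the same number of measurements per subject, and plain off-diagonal soft-thresholding for sparsity. Unlike the convex-optimization estimators described in the abstract, this sketch does not guarantee a positive-definite result. The function names `moment_covariances`, `soft_threshold_offdiag`, and `covariance_graph_edges`, the threshold `lam=0.1`, and the simulated covariance matrices are all hypothetical choices made for illustration.

```python
import numpy as np


def moment_covariances(y):
    """Moment (one-way MANOVA) estimates of the between-subject (Sigma_B) and
    within-subject (Sigma_W) covariances for balanced data of shape (n, m, p):
    n subjects, m repeated measurements per subject, p variables."""
    n, m, _ = y.shape
    subject_means = y.mean(axis=1)            # (n, p)
    grand_mean = subject_means.mean(axis=0)   # (p,)

    # Within-subject scatter: deviations from each subject's own mean,
    # with n * (m - 1) degrees of freedom.
    resid = y - subject_means[:, None, :]
    sigma_w = np.einsum("ijk,ijl->kl", resid, resid) / (n * (m - 1))

    # Between-subject scatter of subject means: E[B / (n - 1)] = m * Sigma_B + Sigma_W.
    dev = subject_means - grand_mean
    b_scatter = m * dev.T @ dev
    sigma_b = (b_scatter / (n - 1) - sigma_w) / m
    return sigma_b, sigma_w


def soft_threshold_offdiag(s, lam):
    """Soft-threshold the off-diagonal entries of s at level lam; keep the diagonal."""
    out = np.sign(s) * np.maximum(np.abs(s) - lam, 0.0)
    np.fill_diagonal(out, np.diag(s))
    return out


def covariance_graph_edges(s):
    """Edges (i, j), i < j, of the covariance graph: pairs with a nonzero entry."""
    p = s.shape[0]
    return [(i, j) for i in range(p) for j in range(i + 1, p) if s[i, j] != 0]


# Simulated example with known Sigma_B and Sigma_W (illustrative values only).
rng = np.random.default_rng(0)
n, m, p = 2000, 5, 3
sigma_b_true = np.array([[1.0, 0.5, 0.0], [0.5, 1.0, 0.0], [0.0, 0.0, 1.0]])
sigma_w_true = np.array([[1.0, 0.0, -0.4], [0.0, 1.0, 0.0], [-0.4, 0.0, 1.0]])
b = rng.multivariate_normal(np.zeros(p), sigma_b_true, size=n)        # subject effects
e = rng.multivariate_normal(np.zeros(p), sigma_w_true, size=(n, m))   # within-subject noise
y = b[:, None, :] + e

sb, sw = moment_covariances(y)
sb_sparse = soft_threshold_offdiag(sb, lam=0.1)
sw_sparse = soft_threshold_offdiag(sw, lam=0.1)
print("between-subject edges:", covariance_graph_edges(sb_sparse))
print("within-subject edges:", covariance_graph_edges(sw_sparse))
print("min eigenvalues:", np.linalg.eigvalsh(sb_sparse).min(), np.linalg.eigvalsh(sw_sparse).min())
```

Because the between-subject moment estimate is a difference of two matrices, it can fail to be positive-definite in small samples, and thresholding can make this worse. Checking the smallest eigenvalue, as in the last line, shows whether an estimator with an explicit positive-definiteness constraint, such as those proposed in the paper, is needed.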
Information
| Preprint No. | SS-2024-0279 |
|---|---|
| Manuscript ID | SS-2024-0279 |
| Complete Authors | Sunpeng Duan, Guo Yu, Juntao Duan, Yuedong Wang |
| Corresponding Authors | Guo Yu |
| Emails | guoyu@ucsb.edu |
Acknowledgments
We thank Fresenius Medical Care North America for providing de-identified data and Dr. Hanjie Zhang for discussions on the real data analysis. We also thank the editor, associate editor, and two referees for constructive comments that substantially improved an earlier draft. We have no conflicts of interest to declare.
The R code that supports and reproduces the findings of this study is openly hosted in the GitHub repository https://github.com/sunpeng52/GGM. The hemodialysis data are available on the COVID RADx Data Hub. This research was partially supported by NIH grant R01DK130067.
Supplementary Materials
The online Supplementary Material includes proofs of the theoretical results, computational details, and additional data analyses.