Abstract

Repeated measurements are common in many fields, where random variables are observed repeatedly across different subjects. Such data have an underlying hierarchical structure, and it is of interest to learn the covariance/correlation structure at different levels. Most existing methods for sparse covariance/correlation matrix estimation assume independent samples. Ignoring the underlying hierarchical structure and the correlation within subjects may lead to erroneous scientific conclusions. In this paper, we propose to distinguish between the between-subject covariance structure and the within-subject covariance structure. In the presence of repeated measurements, this leads to the problem of sparse and positive-definite estimation of between-subject and within-subject covariance matrices. Our estimators are solutions to convex optimization problems that can be solved efficiently. We establish estimation error rates for the proposed estimators and demonstrate their favorable performance through theoretical analysis and comprehensive simulation studies. We further apply our methods to construct between-subject and within-subject covariance graphs of clinical variables from hemodialysis patients.
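To fix ideas, a generic instance of the class of convex programs referred to in the abstract — sparse, positive-definite covariance estimation via an ℓ1 penalty on off-diagonal entries and an eigenvalue floor, solved by ADMM — can be sketched as follows. This is a minimal illustration under assumed defaults (function name, `lam`, `eps`, `rho` are all hypothetical), not the paper's actual between-/within-subject estimator.

```python
import numpy as np

def sparse_pd_covariance(S, lam=0.1, eps=1e-4, rho=1.0, n_iter=500):
    """ADMM sketch for
        min_Theta 0.5*||Theta - S||_F^2 + lam*sum_{i!=j}|Theta_ij|
        s.t. Theta = Sigma, Sigma >= eps*I (eigenvalue floor).
    S is a symmetric sample covariance matrix."""
    p = S.shape[0]
    off = ~np.eye(p, dtype=bool)          # mask of off-diagonal entries
    Sigma = S.copy()
    U = np.zeros_like(S)                  # scaled dual variable
    for _ in range(n_iter):
        # Theta-step: average the data-fit and consensus terms,
        # then soft-threshold the off-diagonal entries.
        A = (S + rho * (Sigma - U)) / (1.0 + rho)
        Theta = A.copy()
        Theta[off] = np.sign(A[off]) * np.maximum(
            np.abs(A[off]) - lam / (1.0 + rho), 0.0)
        # Sigma-step: project Theta + U onto {Sigma : Sigma >= eps*I}
        # by clipping eigenvalues at eps.
        w, V = np.linalg.eigh(Theta + U)
        Sigma = (V * np.maximum(w, eps)) @ V.T
        # Dual update for the constraint Theta = Sigma.
        U = U + Theta - Sigma
    return Sigma
```

By construction the returned matrix is symmetric with all eigenvalues at least `eps`, so it is positive definite even when the input sample covariance is rank-deficient.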

Information

Preprint No.: SS-2024-0279
Manuscript ID: SS-2024-0279
Authors: Sunpeng Duan, Guo Yu, Juntao Duan, Yuedong Wang
Corresponding Author: Guo Yu
Email: guoyu@ucsb.edu


Acknowledgments

We thank Fresenius Medical Care North America for providing de-identified data and Dr. Hanjie Zhang for discussing the real data analysis. We also thank the editor, associate editor, and two referees for constructive comments that substantially improved an earlier draft. We have no conflicts of interest to declare.

The R code that supports and reproduces the findings of this study is openly hosted in the GitHub repository: https://github.com/sunpeng52/GGM. The hemodialysis data are available on the COVID RADx Data Hub. This research was partially supported by NIH grant R01DK130067.

Supplementary Materials

The online Supplementary Material includes proofs of the theoretical results, computational details, and additional data analyses.

