Abstract

Repeated measurements are common in many fields, where random variables are observed repeatedly across different subjects. Such data have an underlying hierarchical structure, and it is of interest to learn the covariance/correlation structure at different levels. Most existing methods for sparse covariance/correlation matrix estimation assume independent samples. Ignoring the underlying hierarchical structure and the correlation within subjects may lead to erroneous scientific conclusions. In this paper, we propose to distinguish between the between-subject covariance structure and the within-subject covariance structure. In the presence of repeated measurements, this leads to the problem of sparse and positive-definite estimation of between-subject and within-subject covariance matrices. Our estimators are solutions to convex optimization problems that can be solved efficiently. We establish estimation error rates for the proposed estimators and demonstrate their favorable performance through theoretical analysis and comprehensive simulation studies. We further apply our methods to construct between-subject and within-subject covariance graphs of clinical variables from hemodialysis patients.
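To fix ideas, a generic instance of the class of convex programs referred to in the abstract — sparse, positive-definite covariance estimation via an ℓ1 penalty on off-diagonal entries and an eigenvalue floor, solved by ADMM — can be sketched as follows. This is a minimal illustration under assumed defaults (function name, `lam`, `eps`, `rho` are all hypothetical), not the paper's actual between-/within-subject estimator.

```python
import numpy as np

def sparse_pd_covariance(S, lam=0.1, eps=1e-4, rho=1.0, n_iter=500):
    """ADMM sketch for
        min_Theta 0.5*||Theta - S||_F^2 + lam*sum_{i!=j}|Theta_ij|
        s.t. Theta = Sigma, Sigma >= eps*I (eigenvalue floor).
    S is a symmetric sample covariance matrix."""
    p = S.shape[0]
    off = ~np.eye(p, dtype=bool)          # mask of off-diagonal entries
    Sigma = S.copy()
    U = np.zeros_like(S)                  # scaled dual variable
    for _ in range(n_iter):
        # Theta-step: average the data-fit and consensus terms,
        # then soft-threshold the off-diagonal entries.
        A = (S + rho * (Sigma - U)) / (1.0 + rho)
        Theta = A.copy()
        Theta[off] = np.sign(A[off]) * np.maximum(
            np.abs(A[off]) - lam / (1.0 + rho), 0.0)
        # Sigma-step: project Theta + U onto {Sigma : Sigma >= eps*I}
        # by clipping eigenvalues at eps.
        w, V = np.linalg.eigh(Theta + U)
        Sigma = (V * np.maximum(w, eps)) @ V.T
        # Dual update for the constraint Theta = Sigma.
        U = U + Theta - Sigma
    return Sigma
```

By construction the returned matrix is symmetric with all eigenvalues at least `eps`, so it is positive definite even when the input sample covariance is rank-deficient.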

Information

Preprint No.: SS-2024-0279
Manuscript ID: SS-2024-0279
Authors: Sunpeng Duan, Guo Yu, Juntao Duan, Yuedong Wang
Corresponding Author: Guo Yu
Email: guoyu@ucsb.edu


Acknowledgments

We thank Fresenius Medical Care North America for providing de-identified data and Dr. Hanjie Zhang for discussing the real data analysis. We also thank the editor, associate editor, and two referees for constructive comments that substantially improved an earlier draft. We have no conflicts of interest to declare.

The R code that supports and reproduces the findings of this study is openly hosted in the GitHub repository: https://github.com/sunpeng52/GGM. The hemodialysis data are available on the COVID RADx Data Hub. This research was partially supported by NIH grant R01DK130067.

Supplementary Materials

The online Supplementary Material includes proofs of the theoretical results, computational details, and additional data analyses.

