Abstract

We propose a joint mean and correlation regression model for multivariate discrete and (semi-)continuous response data. The model simultaneously regresses the mean of each response against a set of covariates, and the correlations between responses against a set of similarity/distance measures. A set of joint estimating equations is formulated to construct an estimator of both the mean regression coefficients and the correlation regression parameters. Under a general setting where the number of responses can tend to infinity, the joint estimator is shown to be consistent and asymptotically normally distributed. Its rates of convergence differ between components because the mean regression coefficients are heterogeneous across responses. An iterative estimation procedure is developed to obtain parameter estimates in the required (constrained) parameter space. Simulations demonstrate the strong finite-sample performance of the proposed estimator for both point estimation and inference. We apply the proposed model to a count dataset of 38 Carabidae ground beetle species sampled throughout Scotland, together with information on the environmental conditions at each site and the traits of each species. Results show that the relationship between mean abundance and environmental covariates differs across the beetle species, and that beetle total length is important in driving the correlations between species.
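To make the two regressions concrete, here is a minimal two-stage sketch. It is not the authors' joint estimating equations or their iterative constrained algorithm. It assumes counts with a log link, log(mu_ij) = x_i' beta_j with a separate beta_j for each response j, and a correlation matrix that is linear in hypothetical similarity matrices, R(alpha) = I + sum_d alpha_d W_d, where each W_d is symmetric with a zero diagonal (for example, built from differences in a species trait). The functions `fit_beta_poisson`, `correlation_matrix` and `fit_alpha_moments` are illustrative names only. Each beta_j is fitted by Poisson Newton-Raphson, which corresponds to an independence working correlation. alpha is then estimated by least squares of the off-diagonal Pearson-residual correlations on the similarities. Finally, the code checks that R(alpha) is positive definite, which is a basic requirement on any correlation matrix and plausibly part of the constrained parameter space mentioned above.

```python
import numpy as np


def fit_beta_poisson(X, y, n_iter=50, tol=1e-10):
    """Fit one response's mean model log(mu) = X @ beta by Newton-Raphson.

    Assumes a Poisson variance and that column 0 of X is an intercept.
    Calling this once per response gives response-specific (heterogeneous) beta_j.
    """
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean() + 1e-8)        # start at the marginal log-mean
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        score = X.T @ (y - mu)               # Poisson score under a log link
        info = X.T @ (X * mu[:, None])       # Fisher information
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def correlation_matrix(alpha, W_list):
    """Hypothetical linear correlation regression R = I + sum_d alpha_d * W_d.

    Each W_d is a symmetric m x m similarity matrix with zero diagonal, so R
    keeps a unit diagonal. Also reports whether R is positive definite.
    """
    m = W_list[0].shape[0]
    R = np.eye(m)
    for a, W in zip(alpha, W_list):
        R = R + a * W
    is_pd = bool(np.linalg.eigvalsh(R).min() > 0)
    return R, is_pd


def fit_alpha_moments(Y, Mu, W_list):
    """Least-squares estimate of alpha from off-diagonal residual correlations.

    Y and Mu are n x m arrays (sites x responses); residuals are standardised
    by sqrt(Mu), i.e. a Poisson-type variance up to a common scale factor.
    """
    n, m = Y.shape
    resid = (Y - Mu) / np.sqrt(Mu)           # Pearson residuals
    S = resid.T @ resid / n                  # second-moment matrix
    d = np.sqrt(np.diag(S))
    C = S / np.outer(d, d)                   # rescale so any common dispersion cancels
    iu = np.triu_indices(m, k=1)             # response pairs j < k
    Z = np.column_stack([W[iu] for W in W_list])  # one column per similarity measure
    alpha, *_ = np.linalg.lstsq(Z, C[iu], rcond=None)
    return alpha
```

In the model described above, the mean and correlation parts are estimated jointly and alpha is kept inside its constrained space throughout. Running the two stages once, as here, only illustrates how heterogeneous mean coefficients and similarity-driven correlations fit together.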

Information

Preprint No.: SS-2024-0109
Manuscript ID: SS-2024-0109
Complete Authors: Zhi Yang Tho, Francis K. C. Hui, Tao Zou
Corresponding Authors: Zhi Yang Tho
Emails: zhiyang.tho@anu.edu.au

Affiliation: Zhi Yang Tho, The Australian National University, Canberra, ACT 2600, Australia.

Acknowledgments

Zhi Yang Tho was supported by an Australian Government Research Training Program scholarship. Francis K. C. Hui was supported by an Australian Research Council Discovery Project DP240100143. Tao Zou's research was supported by computational resources provided by the Australian Government through the National Computational Infrastructure (NCI), under the ANU Startup Allocation Scheme. The authors thank Alan Welsh for his helpful comments.

Supplementary Materials

The Supplementary Material includes all proofs and algorithms, as well as additional results for the simulation study and real data application.


Supplementary materials are available for download.