Abstract
We propose a joint mean and correlation regression model for multivariate discrete
and (semi-)continuous response data that simultaneously regresses the mean of each
response against a set of covariates, and the correlations between responses against a set
of similarity/distance measures. A set of joint estimating equations is formulated to construct an estimator of both the mean regression coefficients and the correlation regression
parameters. Under a general setting where the number of responses can tend to infinity, the
joint estimator is demonstrated to be consistent and asymptotically normally distributed,
with differing rates of convergence due to the mean regression coefficients being heterogeneous across responses. An iterative estimation procedure is developed to obtain parameter
estimates in the required (constrained) parameter space. Simulations demonstrate the
strong finite sample performance of the proposed estimator in terms of point estimation
and inference. We apply the proposed model to a count dataset of 38 Carabidae ground
beetle species sampled throughout Scotland, along with information about the environmental conditions of each site and the traits of each species. Results show that the relationship
between mean abundance and environmental covariates differs across the beetle species,
and that beetle total length is important in driving the correlations between species.
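To make the idea of regressing correlations on similarity measures concrete, the following is a minimal sketch and not the estimator proposed in the paper. It assumes a linear form R = I + sum_k rho_k W_k, where each W_k is a symmetric, zero-diagonal similarity matrix. The abstract does not state this form, so it is an assumption made purely for illustration, as are the function names, the trait values, and the exponential similarity. The sketch shows why the parameter space is constrained: only some values of rho yield a valid (positive definite) correlation matrix.

```python
import numpy as np

def correlation_from_similarities(rho, similarity_matrices):
    """Build R = I + sum_k rho_k * W_k (assumed linear form, for illustration only)."""
    m = similarity_matrices[0].shape[0]
    R = np.eye(m)
    for r, W in zip(rho, similarity_matrices):
        R = R + r * W
    return R

def is_valid_correlation(R, tol=1e-10):
    """Check unit diagonal, symmetry, and positive definiteness."""
    unit_diag = np.allclose(np.diag(R), 1.0)
    symmetric = np.allclose(R, R.T)
    pos_def = np.linalg.eigvalsh(R).min() > tol
    return bool(unit_diag and symmetric and pos_def)

# Hypothetical trait values (e.g. total length) for four species.
trait = np.array([5.0, 6.0, 9.0, 12.0])
dist = np.abs(trait[:, None] - trait[None, :])

# Hypothetical similarity: decays with trait distance, zero on the diagonal
# so that the diagonal of R stays equal to one.
W = np.exp(-dist)
np.fill_diagonal(W, 0.0)

R_ok = correlation_from_similarities([0.5], [W])   # small rho
R_bad = correlation_from_similarities([3.0], [W])  # rho too large

print(is_valid_correlation(R_ok), is_valid_correlation(R_bad))
```

In this toy setting, a large coefficient pushes an off-diagonal entry above one, so the resulting matrix is no longer a correlation matrix. An estimation procedure therefore has to keep its iterates inside the admissible region.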
Information
| Preprint No. | SS-2024-0109 |
|---|---|
| Manuscript ID | SS-2024-0109 |
| Complete Authors | Zhi Yang Tho, Francis K. C. Hui, Tao Zou |
| Corresponding Authors | Zhi Yang Tho |
| Emails | zhiyang.tho@anu.edu.au |
Zhi Yang Tho, The Australian National University, Canberra, ACT 2600, Australia.
Acknowledgments
Zhi Yang Tho was supported by an Australian Government Research Training
Program scholarship. Francis K. C. Hui was supported by Australian Research
Council Discovery Project DP240100143. Tao Zou’s research was supported
by computational resources provided by the Australian Government through the
National Computational Infrastructure (NCI), under the ANU Startup Allocation
Scheme. The authors thank Alan Welsh for his helpful comments.
Supplementary Materials
The Supplementary Material includes all proofs and algorithms, as well as additional results for the simulation study and real data application.