Asymptotic Results for Penalized Quasi-Likelihood Estimation in Generalized Linear Mixed Models

Ning, Xu; Hui, Francis; Welsh, Alan

doi:10.5705/ss.202023.0343

Abstract

Generalized Linear Mixed Models (GLMMs) are widely used for analysing clustered data. One well-established method of overcoming the integral in the marginal likelihood function for GLMMs is penalized quasi-likelihood (PQL) estimation, although to date there are few asymptotic distribution results relating to PQL estimation for GLMMs in the literature. In this paper, we establish large sample results for PQL estimators of the parameters and random effects in independent-cluster GLMMs, when both the number of clusters and the cluster sizes go to infinity. This is done under two distinct regimes: conditional on the random effects (essentially treating them as fixed effects) and unconditionally (treating the random effects as random). Under the conditional regime, we show the PQL estimators are asymptotically normal around the true fixed and random effects. Unconditionally, we prove that while the estimator of the fixed effects is asymptotically normally distributed, the correct asymptotic distribution of the so-called prediction gap of the random effects may in fact be a normal scale-mixture distribution under certain relative rates of growth. A simulation study is used to verify the finite sample performance of our theoretical results.
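
For orientation, the PQL criterion mentioned in the abstract is commonly written in the form sketched below. The notation is an assumption made here for illustration and is not taken from the paper: m clusters indexed by i, cluster sizes n_i, responses y_{ij}, covariate vectors x_{ij} and z_{ij}, fixed effects \beta, normally distributed random effects b_i with covariance matrix G, and conditional density f. The paper's own definition may differ, for example in how dispersion parameters or G are treated.

```latex
% A common form of the PQL objective (illustrative notation, not the paper's):
% conditional log-likelihood of the responses given the random effects,
% minus a quadratic penalty arising from the normal random effects density.
\[
  Q(\beta, b_1, \ldots, b_m)
  = \sum_{i=1}^{m} \sum_{j=1}^{n_i}
      \log f\!\left(y_{ij} \mid x_{ij}^{\top}\beta + z_{ij}^{\top} b_i\right)
    - \frac{1}{2} \sum_{i=1}^{m} b_i^{\top} G^{-1} b_i ,
  \qquad
  (\hat{\beta}, \hat{b}_1, \ldots, \hat{b}_m)
  = \operatorname*{arg\,max}_{\beta,\, b_1, \ldots, b_m} Q .
\]
```

In this notation, the asymptotic setting described in the abstract corresponds to letting both m and the n_i grow without bound.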

Key words and phrases: Asymptotic independence, Clustered data, Large sample distribution, Longitudinal data, Prediction

Information

Preprint No.	SS-2023-0343
Manuscript ID	SS-2023-0343
Complete Authors	Xu Ning, Francis Hui, Alan Welsh
Corresponding Authors	Xu Ning
Emails	nicksonnz@hotmail.com

Research School of Finance, Actuarial Studies and Statistics, The Australian National University, Canberra, ACT 0200, Australia.

Acknowledgments

Xu Ning was supported by the Australian Government Research Training Program Scholarship. Francis Hui and Alan Welsh were supported by an Australian Research Council Discovery Project DP230101908.

Supplementary Materials

The online Supplementary Material contains proofs of our theorems and extra simulation results.
