Statistica Sinica 31 (2021), 2073-2102
Abhik Ghosh and Magne Thoresen
Abstract: Applied research, including longitudinal and clustered studies in biomedicine, often requires analyzing ultrahigh-dimensional linear mixed-effects models, in which important fixed-effect variables must be selected from a large pool of available candidates. However, prior studies assume that all available covariates and random-effect components are independent of the model error, an assumption that is often violated in practice (endogeneity). In this study, we first investigate this important issue in ultrahigh-dimensional linear mixed-effects models, focusing particularly on selecting the fixed effects. We study the effects of different types of endogeneity on existing regularization methods, and prove that these methods are inconsistent. Then, we propose a new profiled focused generalized method-of-moments (PFGMM) approach to consistently select fixed effects under "error-covariate" endogeneity, that is, in the presence of a correlation between the model error and the covariates. The proposed method is proved to be oracle consistent with probability tending to one, and it also works well under most other types of endogeneity. Additionally, we propose and illustrate several consistent parameter estimators, including those of the variance components, along with variable selection using the PFGMM approach. Empirical simulations and an interesting real-data example further support the claimed utility of the proposed method.
Key words and phrases: Endogeneity, oracle variable selection, profiled focused generalized method of moments, ultrahigh-dimensional mixed-effects models.