Statistica Sinica 33 (2023), 27-53

HIGH-DIMENSIONAL FACTOR REGRESSION
FOR HETEROGENEOUS SUBPOPULATIONS

Peiyao Wang¹, Quefeng Li¹, Dinggang Shen²,³,⁴ and Yufeng Liu¹

¹University of North Carolina at Chapel Hill, ²ShanghaiTech University,
³Shanghai United Imaging Intelligence Co. and ⁴Korea University

Abstract: In modern scientific research, data heterogeneity is commonly observed owing to the abundance of complex data. We propose a factor regression model for data with heterogeneous subpopulations. The proposed model can be represented as a decomposition into heterogeneous and homogeneous terms. The heterogeneous term is driven by latent factors in different subpopulations. The homogeneous term captures common variation in the covariates and shares common regression coefficients across subpopulations. Our proposed model attains a good balance between a global model, which ignores the data heterogeneity, and a group-specific model, which fits each subgroup separately. We prove estimation and prediction consistency for our proposed estimators, and show that they have better convergence rates than those of the group-specific and global models. We show that the extra cost of estimating latent factors is asymptotically negligible and the minimax rate is still attainable. We further demonstrate the robustness of our proposed method by studying its prediction error under a misspecified group-specific model. Finally, we conduct simulation studies and analyze a data set from the Alzheimer's Disease Neuroimaging Initiative and an aggregated microarray data set to further demonstrate the competitiveness and interpretability of our proposed factor regression model.

Key words and phrases: Factor models, heterogeneity, penalized regression, prediction.
