Statistica Sinica 30 (2020), 631-651
Abstract: Block-wise missing data are becoming increasingly common in high-dimensional biomedical, social, psychological, and environmental studies. As a result, we need efficient dimension-reduction methods for extracting important information for prediction from such data. Existing dimension-reduction methods and feature combinations are ineffective for handling block-wise missing data. We propose a factor-model imputation approach that targets block-wise missing data, and use an imputed factor regression for dimension reduction and prediction. Specifically, we first perform screening to identify the important features. Then, we impute these features based on the factor model, and build a factor regression model to predict the response variable based on the imputed features. The proposed method utilizes the essential information from all observed data as a result of the factor structure of the model. Furthermore, the method remains efficient even when the proportion of block-wise missing data is high. We show that the imputed factor regression model and its predictions are consistent under regularity conditions. We compare the proposed method with existing approaches using simulation studies, after which we apply it to data from the Alzheimer's Disease Neuroimaging Initiative. Our numerical results confirm that the proposed method outperforms existing competitive approaches.
Key words and phrases: Alzheimer's disease, Alzheimer's Disease Neuroimaging Initiative, block-wise missing data, data imputation, dimension reduction, factor model, principal component.