Statistica Sinica 35 (2025), 431-456
Abstract: Blockwise missing data occur frequently when we integrate multisource or multimodality data, in which different sources or modalities contain complementary information. In this study, we consider a high-dimensional linear regression model with blockwise missing covariates and a partially observed response variable. Under this framework, we propose a computationally efficient estimator for the regression coefficient vector based on carefully constructed unbiased estimating equations and a blockwise imputation procedure, and obtain its rate of convergence. Furthermore, building on an innovative projected estimating equation technique that intrinsically corrects any bias in the initial estimator, we propose a nearly unbiased estimator for each individual regression coefficient, which is asymptotically normally distributed under mild conditions. Based on these debiased estimators, we construct asymptotically valid confidence intervals and statistical tests for each regression coefficient. The results of our numerical studies and an application to data from the Alzheimer's Disease Neuroimaging Initiative show that the proposed method outperforms existing methods and benefits more than they do from unsupervised samples, that is, samples whose response is not observed.
Key words and phrases: Blockwise imputation, data integration, projected estimating equation.
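
To make the data structure concrete, the following is a minimal sketch of the setting described in the abstract, not the authors' code or estimator. It simulates a linear model in which each sample lacks at most one whole block of covariates and the response is observed only for some samples. The function name `simulate_blockwise_missing`, the block sizes, the sparse coefficient vector, and the labeling fraction are all hypothetical choices made for illustration.

```python
import random


def simulate_blockwise_missing(n=300, block_sizes=(5, 5, 5), label_frac=0.5, seed=0):
    """Simulate a linear model with blockwise missing covariates.

    Missing entries are encoded as None. All settings are illustrative
    assumptions, not values taken from the paper.
    """
    rng = random.Random(seed)
    p = sum(block_sizes)

    # Column indices belonging to each source/modality block.
    blocks, start = [], 0
    for size in block_sizes:
        blocks.append(list(range(start, start + size)))
        start += size

    # Sparse coefficient vector: only the first three entries are nonzero.
    beta = [1.0 if j < 3 else 0.0 for j in range(p)]

    X_obs, y_obs, group = [], [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(p)]
        # The response is generated from the full covariate vector.
        y = sum(b * v for b, v in zip(beta, x)) + rng.gauss(0.0, 1.0)

        # Group 0 is fully observed; group g > 0 lacks the whole block g - 1.
        g = rng.randrange(len(blocks) + 1)
        if g > 0:
            for j in blocks[g - 1]:
                x[j] = None

        X_obs.append(x)
        # The response is kept only for a random subset of samples.
        y_obs.append(y if rng.random() < label_frac else None)
        group.append(g)
    return X_obs, y_obs, blocks, group


X_obs, y_obs, blocks, group = simulate_blockwise_missing()
n_complete = sum(1 for g, y in zip(group, y_obs) if g == 0 and y is not None)
print("samples with all covariates and the response observed:", n_complete)
```

Only the samples counted in `n_complete` would be usable in a complete-case analysis; the remaining rows are the partially observed and unlabeled samples that the proposed method is designed to exploit.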