
Statistica Sinica 35 (2025), 431-456

STATISTICAL INFERENCE FOR HIGH-DIMENSIONAL
LINEAR REGRESSION WITH BLOCKWISE MISSING DATA

Fei Xue, Rong Ma and Hongzhe Li*

Purdue University, Harvard University and University of Pennsylvania

Abstract: Blockwise missing data occur frequently when we integrate multisource or multimodality data, in which different sources or modalities contain complementary information. In this study, we consider a high-dimensional linear regression model with blockwise missing covariates and a partially observed response variable. Under this framework, we propose a computationally efficient estimator for the regression coefficient vector based on carefully constructed unbiased estimating equations and a blockwise imputation procedure, and obtain its rate of convergence. Furthermore, building on an innovative projected estimating equation technique that intrinsically corrects any bias in the initial estimator, we propose a nearly unbiased estimator for each individual regression coefficient, which is asymptotically normally distributed under mild conditions. Based on these debiased estimators, we construct asymptotically valid confidence intervals and statistical tests for each regression coefficient. The results of our numerical studies and an application to data from the Alzheimer's Disease Neuroimaging Initiative show that the proposed method outperforms existing methods, and benefits more from unsupervised samples than existing methods do.

Key words and phrases: Blockwise imputation, data integration, projected estimating equation.
