
Statistica Sinica 31 (2021), 1005-1026

A MODEL-AVERAGING METHOD
FOR HIGH-DIMENSIONAL REGRESSION
WITH MISSING RESPONSES AT RANDOM

Jinhan Xie¹, Xiaodong Yan² and Niansheng Tang¹

¹Yunnan University and ²Shandong University

Abstract: This study considers the ultrahigh-dimensional prediction problem in the presence of responses missing at random. A two-step model-averaging procedure is proposed to improve the prediction accuracy of the conditional mean of the response variable. The first step specifies several candidate models, each with low-dimensional predictors. To implement this step, a new feature-screening method is developed to distinguish between the active and inactive predictors. The method uses the multiple-imputation sure independence screening (MI-SIS) procedure, and candidate models are formed by grouping covariates whose MI-SIS values are of similar size. The second step develops a new criterion, weighted delete-one cross-validation (WDCV), to find the optimal weights for averaging the set of candidate models. Under some regularity conditions, we show that the proposed screening statistic enjoys the ranking consistency property, and that the WDCV criterion asymptotically achieves the lowest possible prediction loss. Simulation studies and an example demonstrate the proposed methodology.
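The two-step structure described in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure. As simplifying assumptions, complete-case absolute marginal correlation stands in for the MI-SIS statistic, and unweighted leave-one-out cross-validation on the observed cases stands in for WDCV, so the missingness weighting of the paper is not reproduced. All function names, the group sizes, and the simulated data are hypothetical.

```python
import numpy as np

def screen_and_group(X, y, observed, n_groups, group_size):
    """Step 1 (stand-in for MI-SIS): rank predictors by absolute marginal
    correlation on complete cases, then group neighbours in the ranking."""
    Xo, yo = X[observed], y[observed]
    Xc = Xo - Xo.mean(axis=0)
    yc = yo - yo.mean()
    corr = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    order = np.argsort(-corr)  # largest screening value first
    return [order[k * group_size:(k + 1) * group_size] for k in range(n_groups)]

def loo_predictions(Xg, y):
    """Leave-one-out OLS predictions via the hat-matrix identity
    y_i - e_i / (1 - h_ii), avoiding n separate refits."""
    Z = np.column_stack([np.ones(len(y)), Xg])
    H = Z @ np.linalg.pinv(Z.T @ Z) @ Z.T
    resid = y - H @ y
    return y - resid / (1.0 - np.diag(H))

def averaging_weights(P, y, n_iter=500):
    """Step 2 (stand-in for WDCV): minimise ||y - P w||^2 over the probability
    simplex with Frank-Wolfe steps and exact line search."""
    m = P.shape[1]
    w = np.full(m, 1.0 / m)
    for _ in range(n_iter):
        r = y - P @ w
        grad = -P.T @ r
        d = -w.copy()
        d[np.argmin(grad)] += 1.0  # direction towards the best vertex
        Pd = P @ d
        denom = Pd @ Pd
        if denom <= 1e-12:
            break
        w = w + np.clip((r @ Pd) / denom, 0.0, 1.0) * d
    return w

# Simulated example (assumed design): two active predictors among 50.
rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.standard_normal((n, p))
y_full = 3.0 * X[:, 0] + 3.0 * X[:, 1] + rng.standard_normal(n)
# Missing at random: the observation probability depends on X only.
observed = rng.random(n) < 1.0 / (1.0 + np.exp(-(1.0 + X[:, 2])))
y = np.where(observed, y_full, np.nan)

groups = screen_and_group(X, y, observed, n_groups=3, group_size=2)
Xo, yo = X[observed], y[observed]
P = np.column_stack([loo_predictions(Xo[:, g], yo) for g in groups])
weights = averaging_weights(P, yo)
print("candidate models:", [g.tolist() for g in groups])
print("averaging weights:", np.round(weights, 3))
```

Each column of `P` holds the delete-one predictions of one candidate model, so the weight vector is chosen to make the averaged out-of-sample prediction as close as possible to the observed responses. Restricting the weights to the simplex is a common convention in model averaging and is assumed here rather than taken from the abstract.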

Key words and phrases: High-dimensional data, missing at random, model averaging, multiple imputation, screening, weighted delete-one cross-validation.
