Statistica Sinica 33 (2023), 1169-1191
Linli Xia1,2 and Niansheng Tang1
Abstract: This study examines the feature screening problem for ultrahigh-dimensional data with responses missing at random. A two-step procedure is proposed to screen important features. The first step screens the covariates that are significantly associated with the missing indicators, using the fused mean-variance filter. The second step screens the important predictors associated with the response by combining the distance correlation with a nonparametric imputation technique. The proposed procedure has the following merits: (i) it is model free, because it does not depend on a specific model structure or distributional assumption; (ii) it avoids resampling from the conditional distribution of the missing values, because a kernel smoothing technique is adopted to implement the nonparametric conditional mean imputation; and (iii) it is not sensitive to a misspecification of the propensity score function, because it does not impose a specific model on the response probability. Under some regularity conditions, the sure screening property is established. A modified maximum ratio criterion is proposed to select the tuning parameter. Simulation studies are conducted to investigate the finite-sample performance of the proposed procedure. Finally, an example is used to illustrate the proposed methodologies.
Key words and phrases: Distance correlation, missing at random, nonparametric imputation, sure screening property, ultrahigh-dimensional data.
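To make the second step of the abstract concrete, the following is a minimal Python sketch of its two ingredients: kernel-smoothed conditional mean imputation of missing responses, followed by ranking predictors by sample distance correlation. It is an illustration under simplifying assumptions, not the authors' estimator. The function names (`nw_impute`, `distance_correlation`, `screen`), the Gaussian kernel, the single conditioning covariate `u`, the fixed `bandwidth`, and the direct imputation of the response itself are all hypothetical choices made here for brevity; the fused mean-variance filter of the first step and the modified maximum ratio criterion are not shown.

```python
import math


def nw_impute(y, observed, u, bandwidth):
    """Replace each missing y[i] with a Gaussian-kernel weighted mean of the
    observed responses, weighted by closeness in the covariate u.
    Assumes at least one response is observed."""
    filled = list(y)
    for i, is_observed in enumerate(observed):
        if is_observed:
            continue
        total, weight = 0.0, 0.0
        for j, donor_observed in enumerate(observed):
            if donor_observed:
                k = math.exp(-0.5 * ((u[i] - u[j]) / bandwidth) ** 2)
                total += k * y[j]
                weight += k
        filled[i] = total / weight
    return filled


def _centered_distances(v):
    """Double-centered matrix of pairwise absolute differences of a 1-D sample."""
    n = len(v)
    d = [[abs(a - b) for b in v] for a in v]
    row_mean = [sum(r) / n for r in d]  # equals the column means by symmetry
    grand_mean = sum(row_mean) / n
    return [[d[i][j] - row_mean[i] - row_mean[j] + grand_mean
             for j in range(n)] for i in range(n)]


def distance_correlation(x, y):
    """Sample distance correlation (V-statistic form) of two 1-D samples."""
    n = len(x)
    a, b = _centered_distances(x), _centered_distances(y)
    pairs = [(i, j) for i in range(n) for j in range(n)]
    dcov2 = sum(a[i][j] * b[i][j] for i, j in pairs) / n ** 2
    dvar_x = sum(a[i][j] ** 2 for i, j in pairs) / n ** 2
    dvar_y = sum(b[i][j] ** 2 for i, j in pairs) / n ** 2
    denom = math.sqrt(dvar_x * dvar_y)
    if denom == 0.0:
        return 0.0  # a constant sample carries no dependence information
    return math.sqrt(max(dcov2, 0.0) / denom)


def screen(features, y, d):
    """Return the names of the d features with the largest distance
    correlation with y; `features` maps a name to a 1-D sample."""
    scores = {name: distance_correlation(col, y) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:d]
```

A call such as `screen(features, nw_impute(y, observed, u, 1.0), d)` would then impute the missing responses before ranking the predictors. In practice the bandwidth and the number of retained features `d` would have to be chosen by a data-driven rule rather than fixed by hand as in this sketch.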