Statistica Sinica 28 (2018), 1351-1370
Abstract: In this paper we introduce a modified Blum-Kiefer-Rosenblatt correlation (MBKR for short) to rank the relative importance of each predictor in ultrahigh-dimensional regressions. We advocate using the MBKR for two reasons. First, it is nonnegative and equals zero if and only if the two random variables are independent, so the MBKR can detect nonlinear dependence. We illustrate that the sure independence screening procedure based on the MBKR (MBKR-SIS for short) is effective in detecting nonlinear effects, including interactions and heterogeneity, particularly when both continuous and discrete predictors are involved. Second, the MBKR is conceptually simple, easy to implement, and affine-invariant. It is free of tuning parameters and requires no iteration in estimation. It remains unchanged when order-preserving transformations are applied to the response or predictors, which makes the MBKR-SIS robust to extreme values and outliers in the observations. We show that, under mild conditions, the MBKR-SIS procedure has the sure screening and ranking consistency properties, guaranteeing that all important predictors are retained after screening with probability approaching one. We also propose an iterative screening procedure to detect important predictors that are marginally independent of the response variable. We demonstrate the merits of the MBKR-SIS procedure through simulations and an application to a dataset.
Key words and phrases: Blum-Kiefer-Rosenblatt correlation, feature screening, independence test, ranking consistency property, sure screening property.
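As a rough illustration of the rank-based screening idea summarized in the abstract, the following Python sketch scores each predictor with a classical Blum-Kiefer-Rosenblatt-type (Cramér-von Mises) statistic built from empirical distribution functions, then keeps the top-scoring predictors. The abstract does not define the MBKR itself, so the statistic used here is a stand-in assumption and not the authors' estimator. The function names `bkr_type_statistic` and `screen`, the simulated data, and the cutoff `d` are all hypothetical.

```python
import numpy as np

def bkr_type_statistic(x, y):
    """Empirical mean of (F_n(x_k, y_k) - F_{X,n}(x_k) * F_{Y,n}(y_k))^2.

    Assumption: a classical BKR/Hoeffding-type statistic used as a stand-in;
    it is NOT the modified version (MBKR) proposed in the paper.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # le_x[i, k] is True when x_i <= x_k (likewise for y).
    le_x = x[:, None] <= x[None, :]
    le_y = y[:, None] <= y[None, :]
    joint = (le_x & le_y).mean(axis=0)   # joint empirical CDF at each sample point
    fx = le_x.mean(axis=0)               # marginal empirical CDF of x
    fy = le_y.mean(axis=0)               # marginal empirical CDF of y
    return float(np.mean((joint - fx * fy) ** 2))

def screen(X, y, d):
    """Rank the columns of X by marginal dependence with y and keep the top d."""
    scores = np.array([bkr_type_statistic(X[:, j], y) for j in range(X.shape[1])])
    order = np.argsort(-scores)          # indices sorted by decreasing score
    return order[:d], scores

# Toy data (hypothetical): y depends on predictor 0 only through its square,
# so the linear correlation between y and predictor 0 is close to zero.
rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.standard_normal((n, p))
y = X[:, 0] ** 2 + 0.1 * rng.standard_normal(n)

top, scores = screen(X, y, d=5)
print("retained predictors:", top)

# The statistic depends on the data only through order comparisons, so
# strictly increasing transformations of either variable leave it unchanged.
s_raw = bkr_type_statistic(X[:, 0], y)
s_transformed = bkr_type_statistic(np.exp(X[:, 0]), y ** 3)
print("raw:", s_raw, "after monotone transforms:", s_transformed)
```

In this toy setup the quadratic signal in predictor 0 should place it among the retained predictors even though a linear correlation would miss it, and the two printed scores should agree, which mirrors the invariance to order-preserving transformations described in the abstract.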