Statistica Sinica 30 (2020), 175-192
Abstract: Ultrahigh-dimensional data are collected in many scientific fields where the predictor dimension is often much higher than the sample size. To effectively reduce the ultrahigh-dimensionality, many marginal screening approaches have been developed. However, existing screening methods may miss important predictors that are marginally independent of the response, or may select unimportant predictors owing to their high correlations with important predictors. Iterative screening procedures have been proposed to address this issue, but studying their theoretical properties is not straightforward. Penalized regressions are neither computationally efficient nor numerically stable when the predictors are ultrahigh-dimensional. To overcome these drawbacks, a forward regression (FR) approach has been developed for linear models. However, nonlinear dependence between the predictors and the response is often present in ultrahigh-dimensional problems. In this study, we extend the FR to develop a forward additive regression (FAR) method for selecting significant predictors in ultrahigh-dimensional nonparametric additive models. We establish the screening consistency of the FAR method and examine its finite-sample performance using Monte Carlo simulations. Our simulations indicate that, compared with marginal screening methods, the FAR is much more effective at identifying important predictors for additive models. When the predictors are highly correlated, the FAR even outperforms iterative marginal screenings, such as iterative nonparametric independence screening. We also apply the FAR method to a real-data analysis in genetic studies.
Key words and phrases: Additive models, forward regression, screening consistency, ultrahigh-dimensionality, variable selection.