Statistica Sinica 25 (2015), 975-992
Abstract: LASSO for variable selection in linear regression has been studied by many authors. To achieve asymptotic selection consistency, it is well known that LASSO requires a strong irrepresentable condition. Even adding a thresholding step after LASSO is still too conservative, especially when the number of explanatory variables p is much larger than the number of observations n. Another well-known method, sure independence screening (SIS), applies thresholding to an estimator of the marginal covariate effect vector and, therefore, is not selection consistent unless the zero components of the marginal covariate effect vector are asymptotically the same as the zero components of the regression effect vector. Since the weakness of LASSO stems from its use of the covariate sample covariance matrix, which is not well behaved when p is larger than n, we propose a regularized LASSO (RLASSO) method that replaces the covariate sample covariance matrix in LASSO by a regularized estimator of the covariate covariance matrix and adds a thresholding step. With a regularized estimator of the covariate covariance matrix, we can consistently estimate the regression effects; hence, our method also extends and improves the SIS method, which estimates marginal covariate effects. We establish selection consistency of RLASSO under the conditions that the regression effect vector is sparse and that the covariate covariance matrix or its inverse is sparse. Simulation results comparing the variable selection performance of RLASSO with that of various other methods are presented, along with a data example.
Key words and phrases: High-dimensional data, LASSO, regularization, selection consistency, sparsity, thresholding.
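The idea described in the abstract can be sketched in code: run LASSO-type coordinate descent on the quadratic form built from a regularized covariate covariance estimate (instead of the raw sample covariance X'X/n) and the marginal effect vector X'y/n, then hard-threshold the result. This is only a minimal illustration of the general recipe, not the paper's estimator: the entrywise soft-thresholding covariance regularizer and every tuning constant below are illustrative assumptions.

```python
import numpy as np

def rlasso(X, y, lam=0.1, cov_thresh=0.05, beta_thresh=0.5, n_iter=200):
    """Illustrative RLASSO-style sketch (assumed details, not the paper's):
    coordinate-descent LASSO with the covariate sample covariance replaced
    by a regularized (entrywise soft-thresholded) estimate, followed by a
    final hard-thresholding step for variable selection."""
    n, p = X.shape
    S = X.T @ X / n          # covariate sample covariance (X centered/scaled)
    rho = X.T @ y / n        # marginal covariate effect vector (as in SIS)
    # Regularize: soft-threshold off-diagonal entries, keep the diagonal.
    # (One common sparse-covariance regularizer; an assumption here.)
    Sig = np.sign(S) * np.maximum(np.abs(S) - cov_thresh, 0.0)
    np.fill_diagonal(Sig, np.diag(S))
    # Coordinate descent for min_b 0.5 b'Sig b - rho'b + lam * ||b||_1.
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            r = rho[j] - Sig[j] @ beta + Sig[j, j] * beta[j]
            beta[j] = np.sign(r) * max(abs(r) - lam, 0.0) / Sig[j, j]
    # Final hard-thresholding step: the selected support.
    selected = set(np.flatnonzero(np.abs(beta) > beta_thresh).tolist())
    return beta, selected

# Toy usage: sparse truth with two active covariates out of ten.
rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.standard_normal((n, p))
true_beta = np.zeros(p)
true_beta[:2] = [3.0, 2.0]
y = X @ true_beta + 0.5 * rng.standard_normal(n)
beta_hat, selected = rlasso(X, y)
```

With a strong signal and independent covariates, the thresholded estimate recovers the active set {0, 1}; in harder regimes (p much larger than n, correlated covariates) the choice of covariance regularizer and thresholds is exactly what the paper's theory addresses.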