Abstract
In the realm of high-dimensional linear regression, nonconvex penalised estimators have
enjoyed increasing popularity due to their much-acclaimed oracle property, which holds
under assumptions weaker than those typically required for convex penalised estimators to
enjoy the same property. However, the validity of the oracle property of nonconvex penalisation, and of the accompanying inference tools, is questionable in the presence of many weak
signals and/or a few moderate signals, which may incur substantial biases. To address
this issue, we first provide a more holistic assessment of the selection and convergence
properties of nonconvex penalised estimators from a local asymptotic perspective, under a
framework which accommodates existence of many weak signals and heavy tail conditions
on covariates and random errors. We then show that post-selection least squares estimation
has the beneficial effect of removing the bias incurred by nonconvex penalisation of moderate signals. Post-selection least squares estimators acquire convergence properties more
desirable than nonconvex penalised estimators and, in the case of multiple solutions to the
nonconvex optimisation program, are ratewise more robust against the choice of selected
sets. Empirical results obtained from large-scale simulation studies corroborate our theoretical findings. In particular, the post-selection least squares method is found to improve
on nonconvex penalised estimation, especially under heavy-tailed settings.

The work by Xiaoya Xu was supported by Shenzhen Polytechnic University [Project No. 6025310026K]. The work by Stephen M. S. Lee was supported by the General Research Fund, grant number 17307321.
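The two-stage procedure studied in the abstract can be sketched numerically. The snippet below is a minimal illustration, not the authors' implementation: it fits a minimax concave penalty (MCP) regression by naive coordinate descent on standardised synthetic data, takes the selected support, and refits ordinary least squares on the selected columns. The data-generating settings (`n`, `p`, the strong-signal values, `lam`, `gamma`) are illustrative choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def mcp_threshold(z, lam, gamma):
    """Univariate MCP thresholding (nearly unbiased: large |z| is left unshrunk)."""
    az = abs(z)
    if az <= lam:
        return 0.0
    if az <= gamma * lam:
        return np.sign(z) * (az - lam) / (1.0 - 1.0 / gamma)
    return z

def mcp_cd(X, y, lam, gamma=3.0, n_sweeps=300):
    """Naive coordinate descent for MCP-penalised least squares.
    Assumes the columns of X are standardised to mean 0, variance 1."""
    n, p = X.shape
    beta = np.zeros(p)
    r = y.copy()  # residual y - X @ beta, maintained incrementally
    for _ in range(n_sweeps):
        for j in range(p):
            z = beta[j] + X[:, j] @ r / n
            new = mcp_threshold(z, lam, gamma)
            if new != beta[j]:
                r += X[:, j] * (beta[j] - new)
                beta[j] = new
    return beta

# --- synthetic example (all settings here are illustrative) ---
n, p, s = 200, 50, 5
X = rng.standard_normal((n, p))
X = (X - X.mean(0)) / X.std(0)          # standardise columns
beta_true = np.zeros(p)
beta_true[:s] = 2.0                     # a few strong signals
y = X @ beta_true + 0.5 * rng.standard_normal(n)

# Stage 1: nonconvex penalised estimation and variable selection
beta_mcp = mcp_cd(X, y, lam=0.3)
S = np.flatnonzero(beta_mcp)            # selected set

# Stage 2: post-selection least squares refit on the selected variables
beta_pls = np.zeros(p)
if S.size:
    beta_pls[S] = np.linalg.lstsq(X[:, S], y, rcond=None)[0]
```

Because the OLS refit minimises the residual sum of squares over the selected support, while the penalised estimate (supported on the same set) is merely feasible for that problem, the refit never increases the in-sample residual; the paper's point is the stronger, distributional one, that refitting removes the penalisation bias on moderate signals.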
Information
| Preprint No. | SS-2024-0412 |
|---|---|
| Manuscript ID | SS-2024-0412 |
| Complete Authors | Xiaoya Xu, Stephen M. S. Lee |
| Corresponding Authors | Xiaoya Xu |
| Emails | xuxiaoya@connect.hku.hk |