Abstract
Selection methods for high-dimensional models are well developed, but inference after selection typically does not account for the model choice, which leads to an underestimation of the variability of the estimator. We propose a model-averaging procedure for high-dimensional regression models that allows inference even when the number of predictors is larger than the sample size. The proposed estimator is constructed from the debiased Lasso, with weights chosen to reduce the prediction risk. We derive the asymptotic distribution of the estimator within a high-dimensional framework and provide guarantees on the minimal prediction loss obtained with our choice of weights. In contrast to existing approaches, the proposed method combines the advantages of model averaging with the possibility of inference based on asymptotic normality. The estimator exhibits a smaller prediction risk than its competitors on a real, high-dimensional dataset and across various simulation studies, confirming our theoretical results.
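To illustrate the flavor of this construction, the sketch below builds a toy model-averaged estimator from debiased Lasso fits. It is an assumption-laden sketch, not the paper's MA-d procedure: the candidate models come from a small penalty grid, the debiasing step uses the pseudo-inverse of the sample covariance as a crude surrogate for a proper precision-matrix estimate, and the weights are chosen on a held-out split rather than by the risk criterion studied in the paper.

```python
# Minimal sketch of model averaging over debiased Lasso fits.
# All design choices below (penalty grid, pseudo-inverse surrogate for the
# precision matrix, validation-split weight selection) are illustrative
# assumptions, not the MA-d procedure of the paper.
import numpy as np
from scipy.optimize import minimize
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, s = 100, 200, 5                      # p > n: high-dimensional setting
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:s] = 2.0                             # sparse true signal
y = X @ beta + rng.standard_normal(n)

X_tr, X_val = X[:80], X[80:]               # hold out a split for weight selection
y_tr, y_val = y[:80], y[80:]

lambdas = np.geomspace(0.05, 1.0, 8)       # penalty grid indexing the candidates
Sigma_inv = np.linalg.pinv(X_tr.T @ X_tr / len(y_tr))  # crude precision surrogate

candidates = []
for lam in lambdas:
    b = Lasso(alpha=lam, fit_intercept=False).fit(X_tr, y_tr).coef_
    # one-step debiasing correction applied to each Lasso fit
    b_deb = b + Sigma_inv @ X_tr.T @ (y_tr - X_tr @ b) / len(y_tr)
    candidates.append(b_deb)
B = np.column_stack(candidates)            # p x K matrix of debiased estimators

def val_risk(w):
    """Held-out prediction error of the averaged estimator B @ w."""
    return np.mean((y_val - X_val @ (B @ w)) ** 2)

K = B.shape[1]
res = minimize(val_risk, np.full(K, 1.0 / K),
               bounds=[(0.0, 1.0)] * K,
               constraints={"type": "eq", "fun": lambda w: w.sum() - 1.0})
beta_ma = B @ res.x                        # model-averaged debiased estimator
print("validation risk of averaged fit:", val_risk(res.x))
```

Constraining the weights to the simplex keeps the averaged estimator a convex combination of the candidates, the standard device in the model-averaging literature for making risk guarantees on the weight choice tractable.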
Information
| Preprint No. | SS-2025-0211 |
|---|---|
| Manuscript ID | SS-2025-0211 |
| Authors | Lise Léonard, Eugen Pircalabelu, Rainer von Sachs |
| Affiliation | UCLouvain, Institute of Statistics, Biostatistics and Actuarial Sciences |
| Corresponding Author | Eugen Pircalabelu |
| Email | eugen.pircalabelu@uclouvain.be |
Acknowledgments
Computational resources have been provided by the supercomputing facilities of the Université catholique de Louvain (CISM/UCL) and the Consortium des Équipements de Calcul Intensif en Fédération Wallonie-Bruxelles (CÉCI), funded by the Fonds de la Recherche Scientifique de Belgique (F.R.S.-FNRS) under convention 2.5020.11 and by the Walloon Region.
Supplementary Materials
The Supplementary Material contains the proofs of Theorem 2 and Propositions 1, 2, and 3, bias and MSE metrics for the setting presented in Section 5, and additional simulations. Among these, we show that the MA-d estimator maintains good performance under different estimators of the noise level, and that the influence of the grid coarseness is negligible when the number of considered models is sufficiently large. Furthermore, we show that, in some sparse settings, the MA-d estimator can achieve better prediction loss than the Lasso and better coverage than the debiased Lasso. We also construct model-averaging competitors based on the Lasso and the debiased Lasso and compare their performance with that of the proposed method. Finally, we explore the performance of the MA-d estimator when the precision matrix is no longer sparse and when the signal decreases as the sample size increases.