Abstract

Selection methods for high-dimensional models are well developed, but they do not take into account the uncertainty induced by the choice of the model, which leads to an underestimation of the variability of the estimator. We propose a procedure for model averaging in high-dimensional regression models that allows inference even when the number of predictors is larger than the sample size. The proposed estimator is constructed from the debiased Lasso, and the weights are chosen to reduce the prediction risk. We derive the asymptotic distribution of the estimator within a high-dimensional framework and offer guarantees for the minimal prediction loss obtained using our choice of weights. In contrast to existing approaches, the proposed method combines the advantages of model averaging with the possibility of inference based on asymptotic normality. The estimator shows a smaller prediction risk than its competitors when applied to a real, high-dimensional dataset and across various simulation studies, confirming our theoretical results.
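
To fix ideas, the following is a minimal, self-contained sketch of the general recipe the abstract describes, not the authors' MA-d implementation: candidate models are debiased-Lasso fits computed along a grid of penalty levels, and weights on the simplex are chosen to minimize an empirical proxy for the prediction risk. The identity shortcut in the bias correction, the penalty grid, and the in-sample weight criterion are illustrative assumptions; the paper's estimator relies on a proper precision-matrix estimate and on a risk-based weight choice with accompanying guarantees.

    import numpy as np
    from scipy.optimize import minimize
    from sklearn.linear_model import Lasso

    def debiased_lasso(X, y, lam):
        # Lasso fit followed by a one-step bias correction.  For brevity the
        # precision-matrix estimate is replaced by the identity, which is only
        # adequate when X'X/n is close to the identity; a faithful version
        # would estimate it, e.g., via node-wise Lasso regressions.
        n = X.shape[0]
        beta = Lasso(alpha=lam, fit_intercept=False, max_iter=10_000).fit(X, y).coef_
        return beta + X.T @ (y - X @ beta) / n

    def simplex_weights(preds, y):
        # Weights on the simplex minimizing the squared error of the averaged
        # prediction -- a hypothetical stand-in for the paper's risk criterion.
        M = preds.shape[1]
        obj = lambda w: np.mean((y - preds @ w) ** 2)
        res = minimize(obj, np.full(M, 1.0 / M), bounds=[(0.0, 1.0)] * M,
                       constraints=({"type": "eq", "fun": lambda w: w.sum() - 1.0},))
        return res.x

    # Toy high-dimensional data: n = 100 observations, p = 200 predictors.
    rng = np.random.default_rng(0)
    n, p = 100, 200
    X = rng.standard_normal((n, p))
    beta_true = np.zeros(p)
    beta_true[:5] = 2.0
    y = X @ beta_true + rng.standard_normal(n)

    lams = np.geomspace(0.05, 1.0, num=8)             # grid of candidate models
    B = np.column_stack([debiased_lasso(X, y, l) for l in lams])
    w = simplex_weights(X @ B, y)                     # data-driven simplex weights
    beta_ma = B @ w                                   # model-averaged estimator

In a real application the in-sample criterion would be replaced by a cross-validated or otherwise unbiased risk estimate, since fitting the weights on the same data that produced the candidate fits tends to favour the least-penalized model.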

Information

Preprint No.: SS-2025-0211
Manuscript ID: SS-2025-0211
Authors: Lise Léonard, Eugen Pircalabelu, Rainer von Sachs
Corresponding Author: Eugen Pircalabelu
Email: eugen.pircalabelu@uclouvain.be
Affiliation: UCLouvain, Institute of Statistics, Biostatistics and Actuarial Sciences

Acknowledgments

Computational resources have been provided by the supercomputing facilities of the Université catholique de Louvain (CISM/UCL) and the Consortium des Équipements de Calcul Intensif en Fédération Wallonie-Bruxelles (CÉCI), funded by the Fonds de la Recherche Scientifique de Belgique (F.R.S.-FNRS) under convention 2.5020.11 and by the Walloon Region.

Supplementary Materials

The Supplementary Material contains the proofs of Theorem 2 and of Propositions 1, 2 and 3, bias and MSE metrics for the setting presented in Section 5, and additional simulations. Among these, we show that the MA-d estimator maintains good performance under different estimators of the noise level, and that the influence of the grid coarseness is negligible when the number of considered models is sufficiently large. Furthermore, we show that, in some sparse cases, the MA-d estimator can provide better prediction-loss performance than the Lasso and better coverage performance than the debiased Lasso. We also constructed MA competitors based on the Lasso and the debiased Lasso and compared their performance with that of the proposed method. Additionally, we explored the performance of the MA-d estimator when the precision matrix is no longer sparse and when the signal decreases as the sample size increases.

