Abstract

In the age of big data, model averaging has proven to be a powerful tool for data analysis, helping to mitigate the bias and overfitting that can arise from relying on a single model. However, outliers in large-scale datasets, such as those in image recognition and fraud detection, can severely degrade traditional model averaging methods built on least squares or maximum likelihood. To address this challenge, we propose a robust jackknife model averaging (RJMA) approach, in which the weights are selected by minimizing a leave-one-out cross-validation criterion. This framework accommodates situations where the dimensions of the candidate models increase with the sample size. We establish the asymptotic optimality of the RJMA estimator, demonstrating its ability to minimize the out-of-sample final prediction error. We also prove the consistency of the proposed weight estimator for the theoretically optimal weight vector. Furthermore, when one or more correct models are present in the candidate model set, we show that RJMA asymptotically assigns all weight to the correct models, yielding a consistent model averaging estimator. Additionally, we derive the influence function of the RJMA estimator and introduce the empirical prediction influence function to quantitatively evaluate its robustness. To illustrate the efficacy of the proposed methodology, we conduct numerical studies, including Monte Carlo simulations and a real data analysis, which confirm the practical applicability and robustness of the RJMA approach.
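The weight-selection scheme summarized above can be illustrated with a small toy sketch. This is not the authors' implementation: the Huber loss stands in for a generic robust (M-estimation) loss, the two nested linear candidate models and the grid search over the weight simplex are placeholder choices, and the leave-one-out predictions are computed by brute-force refitting.

```python
import numpy as np

def huber_loss(r, c=1.345):
    """Huber loss, a stand-in robust loss for this illustration."""
    a = np.abs(r)
    return np.where(a <= c, 0.5 * r**2, c * a - 0.5 * c**2)

def huber_fit(X, y, c=1.345, iters=50):
    """Huber M-estimation of a linear model via IRLS."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(iters):
        a = np.maximum(np.abs(y - X @ beta), 1e-8)
        w = np.where(a <= c, 1.0, c / a)          # Huber weights
        beta = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (w * y))
    return beta

rng = np.random.default_rng(0)
n = 60
x = rng.normal(size=(n, 3))
# heavy-tailed errors mimic an outlier-contaminated sample
y = 1.0 + x[:, 0] - 0.5 * x[:, 1] + rng.standard_t(df=2, size=n)

# two nested candidate models (intercept + first one / two covariates)
designs = [np.column_stack([np.ones(n), x[:, :1]]),
           np.column_stack([np.ones(n), x[:, :2]])]

# leave-one-out (jackknife) predictions for each candidate model
loo = np.zeros((n, len(designs)))
for m, X in enumerate(designs):
    for i in range(n):
        keep = np.arange(n) != i
        beta = huber_fit(X[keep], y[keep])
        loo[i, m] = X[i] @ beta

# select the weight on the simplex minimizing the robust LOO-CV criterion
grid = np.linspace(0.0, 1.0, 101)
crit = [huber_loss(y - (w * loo[:, 0] + (1 - w) * loo[:, 1])).sum()
        for w in grid]
w_hat = grid[int(np.argmin(crit))]   # weight assigned to the first model
```

With only two candidate models the simplex is an interval, so a grid search suffices; with more models the criterion would be minimized over the full simplex by constrained optimization.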

Information

Preprint No. SS-2025-0057
Manuscript ID: SS-2025-0057
Complete Authors: Kang You, Miaomiao Wang, Guohua Zou
Corresponding Author: Guohua Zou
Email: ghzou@amss.ac.cn


Acknowledgments

The authors thank the editor, the associate editor, and two referees for their careful reviews and helpful suggestions. Zou and Wang's work was supported by the National Natural Science Foundation of China (Grant Nos. 12531012, 12031016, 12426308 and 12401335). Zou's work was also supported by the Beijing Outstanding Young Scientist Program (Grant No. JWZQ20240101027). You's work was partially supported by the Engineering and Physical Sciences Research Council of the United Kingdom (Grant No. EP/X038297/1).

Supplementary Materials

The Supplementary Material contains the robustness property of the RJMA estimator, proofs of the theorems, and additional simulation studies.



Supplementary materials are available for download.