Distributed Focused Information Criterion and Focused Frequentist Model Averaging for Massive Data

Yifan Zhang, Xiaolin Chen and Yuzhan Xing

doi:10.5705/ss.202025.0060

Abstract

This article investigates the focused information criterion and focused frequentist model averaging estimators for linear regression models with massive distributed data under the local asymptotic framework. Three divide-and-conquer-type approaches (one one-shot method and two iterative methods) are employed to estimate the regression coefficient vector of each candidate model. We establish the limiting distributions of the corresponding estimators and upper bounds for their mean squared errors. These distributed estimators are then used to develop the distributed focused information criterion and distributed focused frequentist model averaging estimators for the focus parameter. We also rigorously derive the asymptotic distributions of the distributed estimators of the targeted parameter under each candidate model and of the distributed focused frequentist model averaging estimators with both fixed and data-driven weights, along with upper bounds for the mean squared errors of the model averaging estimators. Extensive simulation studies validate the proposed methods and the corresponding theory, and a real-world dataset is analyzed to demonstrate their practical application.
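As a rough illustration of the one-shot divide-and-conquer idea named in the abstract, the sketch below assumes the simplest possible scheme: each machine fits ordinary least squares on its own block, and the local coefficient estimates are averaged with equal weights. This is an assumption made for illustration and is not necessarily the estimator studied in the paper. The function names `local_ols` and `one_shot_average`, the single-covariate model, and the simulated data are all hypothetical.

```python
import random


def local_ols(xs, ys):
    """Closed-form OLS (intercept, slope) for one block of data."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return (my - slope * mx, slope)


def one_shot_average(xs, ys, n_machines):
    """Equal-weight average of local OLS estimates (assumed scheme).

    Blocks are contiguous and equally sized; any leftover rows are dropped,
    so the example below uses a sample size divisible by n_machines.
    """
    size = len(xs) // n_machines
    estimates = [
        local_ols(xs[k * size:(k + 1) * size], ys[k * size:(k + 1) * size])
        for k in range(n_machines)
    ]
    intercept = sum(e[0] for e in estimates) / n_machines
    slope = sum(e[1] for e in estimates) / n_machines
    return (intercept, slope)


# Simulated data (hypothetical): y = 1 + 2x + noise, split across 10 machines.
random.seed(0)
N, K = 20000, 10
xs = [random.gauss(0.0, 1.0) for _ in range(N)]
ys = [1.0 + 2.0 * x + random.gauss(0.0, 1.0) for x in xs]

beta_dc = one_shot_average(xs, ys, K)   # one round of communication
beta_full = local_ols(xs, ys)           # pooled-data benchmark
print(beta_dc, beta_full)
```

Only the local estimates travel to the central machine, so a single round of communication suffices. The iterative approaches mentioned in the abstract would add further rounds, which this sketch does not attempt to reproduce.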

Key words and phrases: Distributed focused information criterion; Distributed focused frequentist model averaging; Local asymptotic framework

Information

Preprint No.	SS-2025-0060
Manuscript ID	SS-2025-0060
Complete Authors	Yifan Zhang, Xiaolin Chen, Yuzhan Xing
Corresponding Authors	Xiaolin Chen
Emails	xlchen@amss.ac.cn

References

Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In 2nd International Symposium on Information Theory, pages 267–281. Akadémiai Kiadó, Budapest, Hungary.
Ando, T. and Li, K.-C. (2014). A model-averaging approach for high-dimensional regression. Journal of the American Statistical Association, 109(505):254–265.
Bayle, P., Fan, J., and Lou, Z. (2025). Communication-efficient distributed estimation and inference for Cox’s model. Journal of the American Statistical Association, (just-accepted):1–20.
Buckland, S. T., Burnham, K. P., and Augustin, N. H. (1997). Model selection: An integral part of inference. Biometrics, 53(2):603–618.
Claeskens, G. and Hjort, N. L. (2003). The focused information criterion. Journal of the American Statistical Association, 98(464):900–916.
Fan, J., Guo, Y., and Wang, K. (2023). Communication-efficient accurate statistical estimation. Journal of the American Statistical Association, 118(542):1000–1010.
Fang, F., Yin, X., and Zhang, Q. (2018). Divide and conquer algorithms for model averaging with massive data. Journal of Systems Science and Mathematical Sciences, 38(7):764.
Gao, Y., Liu, W., Wang, H., Wang, X., Yan, Y., and Zhang, R. (2022). A review of distributed statistical inference. Statistical Theory and Related Fields, 6(2):89–99.
Hansen, B. E. (2007). Least squares model averaging. Econometrica, 75(4):1175–1189.
Hansen, B. E. and Racine, J. S. (2012). Jackknife model averaging. Journal of Econometrics, 167(1):38–46.
Hjort, N. L. and Claeskens, G. (2003). Frequentist model average estimators. Journal of the American Statistical Association, 98(464):879–899.
Jordan, M. I., Lee, J. D., and Yang, Y. (2019). Communication-efficient distributed statistical inference. Journal of the American Statistical Association, 114(526):668–681.
Liang, H., Zou, G., Wan, A. T., and Zhang, X. (2011). Optimal weight choice for frequentist model average estimators. Journal of the American Statistical Association, 106(495):1053–1066.
Liu, C.-A. (2015). Distribution theory of the least squares averaging estimator. Journal of Econometrics, 186(1):142–159.
Mallows, C. (1973). Some comments on Cp. Technometrics, 15(4):661–675.
Peng, J., Li, Y., and Yang, Y. (2025). On optimality of Mallows model averaging. Journal of the American Statistical Association, 120(550):1152–1163.
Peng, J. and Yang, Y. (2022). On improvability of model selection by model averaging. Journal of Econometrics, 229(2):246–262.
Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6(2):461–464.
Shamir, O., Srebro, N., and Zhang, T. (2014). Communication-efficient distributed optimization using an approximate Newton-type method. In Proceedings of the 31st International Conference on Machine Learning, volume 32, pages 1000–1008.
Su, W., Yin, G., Zhang, J., and Zhao, X. (2022). Divide and conquer for accelerated failure time model with massive time-to-event data. Canadian Journal of Statistics, 51(2):400–419.
Wan, A. T., Zhang, X., and Zou, G. (2010). Least squares model averaging by Mallows criterion. Journal of Econometrics, 156(2):277–283.
Wang, Q., Du, J., and Sheng, Y. (2025). Distributed empirical likelihood inference with or without byzantine failures. Statistics and Computing, 35(5):1–20.
Xia, X., He, S., and Pang, N. (2025). Communication-efficient model averaging prediction for massive data with asymptotic optimality. Statistical Papers, 66(2):1–45.
Yang, Y. (2001). Adaptive regression by mixing. Journal of the American Statistical Association, 96(454):574–588.
Yu, D., Lian, H., Sun, Y., Zhang, X., and Hong, Y. (2024). Post-averaging inference for optimal model averaging estimator in generalized linear models. Econometric Reviews, 43(2-4):98–122.
Zhang, H., Liu, Z., and Zou, G. (2023). Least squares model averaging for distributed data. Journal of Machine Learning Research, 24(215):1–59.
Zhang, X. and Liang, H. (2011). Focused information criterion and model averaging for generalized additive partial linear models. The Annals of Statistics, 39(1):174–200.
Zhang, X. and Liu, C.-A. (2019). Inference after model averaging in linear regression models. Econometric Theory, 35(4):816–841.
Zhang, X. and Liu, C.-A. (2023). Model averaging prediction by K-fold cross-validation. Journal of Econometrics, 235(1):280–301.
Zhang, X., Zou, G., Liang, H., and Carroll, R. J. (2020). Parsimonious model averaging with a diverging number of parameters. Journal of the American Statistical Association, 115(530):972–984.
Zhang, Y., Duchi, J. C., and Wainwright, M. J. (2013). Communication-efficient algorithms for statistical optimization. Journal of Machine Learning Research, 14(68):3321–3363.
Zhou, L., She, X., and Song, P. X.-K. (2023). Distributed empirical likelihood approach to integrating unbalanced datasets. Statistica Sinica, 33(3):2209–2231.

Acknowledgments

Xiaolin Chen’s research is supported by the National Social Science Fund of China (25BTJ038).

Supplementary Materials

The online Supplementary Material contains discussions about our technical assumptions, proofs of the theorems, additional numerical simulation studies, and implementation details and further analysis of the real-world example.

Supplementary materials are available for download.
