Abstract
In complex systems, networks describe relationships between nodes through edges. Latent
space models are widely used for network tasks such as community detection and link prediction due
to their interpretability and visualization power. However, when the network size is small or the true
latent dimension is large, a single latent space model may suffer from high estimation error or model
misspecification. To address this, we propose Network Model Averaging (NetMA), which combines
multiple latent space models with different dimensions. The weights are estimated using a K-fold edge
cross-validation scheme that is specially designed for network data. Our method applies to both single-layer and multi-layer networks. We provide theoretical guarantees for NetMA. When all candidate
models are misspecified, NetMA still achieves asymptotically optimal prediction. When models with
large enough latent dimensions are included, NetMA assigns nearly all weights to them. We also prove
that the estimated weights converge to the optimal weights. Simulation studies show that NetMA
performs better than model selection and simple averaging. It even outperforms the “oracle” model
when the true latent dimension is large. Applications to mutual-following and virtual event networks
further highlight the strong performance of NetMA in link prediction.
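As an illustration only, the idea of weighting candidate models of different latent dimensions by K-fold edge cross-validation can be sketched as follows. This is not the NetMA estimator itself. The sketch assumes rank-d truncated SVD fits as stand-ins for latent space models, a squared-error criterion on held-out node pairs, and a Frank-Wolfe solver for weights constrained to the simplex. All function names and the simulated network are hypothetical.

```python
import numpy as np

def svd_candidates(A, mask, dims):
    """Assumed stand-in for latent space fits: rank-d SVD of the observed
    entries (held-out entries zero-filled, then rescaled by the observed share)."""
    p = mask.mean()
    U, s, Vt = np.linalg.svd(np.where(mask, A, 0.0) / p)
    return [np.clip((U[:, :d] * s[:d]) @ Vt[:d], 0.0, 1.0) for d in dims]

def edge_cv_predictions(A, dims, K=5, seed=0):
    """Split node pairs (i < j) into K folds; predict each held-out pair
    from candidates fitted without that fold."""
    n = A.shape[0]
    rng = np.random.default_rng(seed)
    iu, ju = np.triu_indices(n, k=1)
    folds = rng.permutation(iu.size) % K          # balanced fold labels
    preds = np.zeros((iu.size, len(dims)))
    for k in range(K):
        held = folds == k
        mask = np.ones((n, n), dtype=bool)
        mask[iu[held], ju[held]] = False          # hide held-out pairs
        mask[ju[held], iu[held]] = False          # keep the mask symmetric
        for m, P in enumerate(svd_candidates(A, mask, dims)):
            preds[held, m] = P[iu[held], ju[held]]
    return preds, A[iu, ju]

def simplex_weights(preds, y, n_iter=500):
    """Frank-Wolfe iterations for min_w ||y - preds @ w||^2 / N over the simplex."""
    M = preds.shape[1]
    w = np.full(M, 1.0 / M)
    for t in range(n_iter):
        grad = 2.0 * preds.T @ (preds @ w - y) / y.size
        j = np.argmin(grad)                       # best vertex of the simplex
        gamma = 2.0 / (t + 2.0)
        w = (1.0 - gamma) * w
        w[j] += gamma                             # convex step keeps w feasible
    return w

# Simulated undirected network from 2-dimensional latent positions (assumption).
rng = np.random.default_rng(1)
n, true_dim = 60, 2
Z = rng.normal(size=(n, true_dim))
P_true = 1.0 / (1.0 + np.exp(-(Z @ Z.T - 1.0)))
upper = np.triu(rng.random((n, n)) < P_true, k=1)
A = (upper | upper.T).astype(float)

dims = [1, 2, 3, 4]
preds, y = edge_cv_predictions(A, dims, K=5, seed=0)
weights = simplex_weights(preds, y)
print(dict(zip(dims, np.round(weights, 3))))
```

The averaged prediction for a node pair is then the weighted sum of the candidate predictions. Replacing the SVD stand-in with an actual latent space fit would leave the cross-validation and weighting steps unchanged.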
Key words and phrases: Asymptotic optimality, Consistency, Edge cross-validation, Network model
∗Yan Zhang and Jun Liao are co-first authors and contributed equally to this work.
Information
| Preprint No. | SS-2025-0266 |
|---|---|
| Manuscript ID | SS-2025-0266 |
| Complete Authors | Yan Zhang, Jun Liao, Xinyan Fan, Kuangnan Fang, Yuhong Yang |
| Corresponding Authors | Kuangnan Fang |
| Emails | xmufkn@xmu.edu.cn |
Acknowledgments
Jun Liao’s work was partially supported by the Humanities and Social Science Foundation of
the Ministry of Education of China (24YJC910004) and the National Natural Science Foundation of China (12001534). Xinyan Fan’s research was supported by the National Natural
Science Foundation of China (72571272, 12201626), the MOE Project of Key Research Institute of Humanities and Social Sciences (22JJD110001), and the Public Computing Cloud,
Renmin University of China. Kuangnan Fang’s research was supported by the National Natural Science Foundation of China (12571313, 72233002) and the National Statistical Science
Research Projects (2025LD002).
Supplementary Materials
The supplementary material provides additional details including theoretical proofs, assumption verifications, algorithmic procedures, and extended simulation and empirical results.