Abstract
Community network research has received extensive attention, and the study of dynamic
random block networks is one of the emerging directions of network analysis. This paper first proposes
a mixed membership autoregressive network model with a first-order autoregressive structure, then
discusses the estimation of the membership matrix and community detection, and gives an AR-1
mixed spectral clustering algorithm for this model to estimate the membership matrix for community
detection.
In addition, an empirical eigenvalue threshold estimator is introduced to estimate the
number of communities. Simulation results show that our method shows stronger generalization ability
than previous methods for both random block community detection problems and mixed member
community detection problems. Under different settings, the explicit error rate of our method does
not exceed that of previous methods, and in many cases our method performs better. Application to
real data proves the effectiveness of our method.
Information
| Preprint No. | SS-2024-0435 |
|---|---|
| Manuscript ID | SS-2024-0435 |
| Complete Authors | Tianyi Sun, Bo Zhang, Baisuo Jin, Yuehua Wu |
| Corresponding Authors | Bo Zhang |
| Emails | zhangbo890301@outlook.com |
References
- Abbe, E. (2017). Community detection and stochastic block models: recent developments. The Journal of Machine Learning Research 18(1), 6446–6531.
- Airoldi, E. M., D. Blei, S. Fienberg, and E. Xing (2008). Mixed membership stochastic blockmodels. Advances in neural information processing systems 21.
- Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In 2nd International Symposium on Information Theory, pp. 267–281. Akad´emiai Kiad´o Location Budapest, Hungary.
- Amini, A. A., A. Chen, P. J. Bickel, and E. Levina (2013). Pseudo-likelihood methods for community detection in large sparse networks. The Annals of Statistics, 2097–2122.
- Anandkumar, A., R. Ge, D. Hsu, and S. M. Kakade (2014). A tensor approach to learning mixed membership community models. The Journal of Machine Learning Research 15(1), 2239–2312.
- Bai, Z., K. P. Choi, and Y. Fujikoshi (2018). Consistency of aic and bic in estimating the number of significant components in high-dimensional principal component analysis. The Annals of Statistics 46(3), 1050–1076.
- Barab´asi, A.-L. and Z. N. Oltvai (2004). Network biology: understanding the cell’s functional organization. Nature reviews genetics 5(2), 101–113.
- Barbieri, K., O. M. Keshk, and B. M. Pollins (2009). Trading data: Evaluating our assumptions and coding rules. Conflict Management and Peace Science 26(5), 471–491.
- Ben-Daya, M., E. Hassini, and Z. Bahroun (2019). Internet of things and supply chain management: a literature review. International journal of production research 57(15-16), 4719–4742.
- Bhattacharjee, M., M. Banerjee, and G. Michailidis (2020). Change point estimation in a dynamic stochastic block model. The Journal of Machine Learning Research 21(1), 4330–4388.
- Bickel, P., D. Choi, X. Chang, and H. Zhang (2013). Asymptotic normality of maximum likelihood and its variational approximation for stochastic blockmodels. The Annals of Statistics 41(4), 1922–1943.
- Cai, T. T. and X. Li (2015). Robust and computationally feasible community detection in the presence of arbitrary outlier nodes. The Annals of Statistics, 1027–1059.
- Chatterjee, S. (2015). Matrix estimation by universal singular value thresholding. The Annals of Statistics 43(1), 177–214.
- Chen, Y., X. Li, and J. Xu (2018). Convexified modularity maximization for degree-corrected stochastic block models. The Annals of Statistics 46(4), 1573.
- Donnat, C. and S. Holmes (2018). Tracking network dynamics: A survey using graph distances. The Annals of Applied Statistics 12(2), 971–1012.
- Durante, D. and D. B. Dunson (2016). Locally adaptive dynamic networks. The Annals of Applied Statistics, 2203–2232.
- Easley, D., J. Kleinberg, et al. (2010). Networks, crowds, and markets: Reasoning about a highly connected world, Volume 1. Cambridge university press Cambridge.
- Girvan, M. and M. E. Newman (2002). Community structure in social and biological networks. Proceedings of the national academy of sciences 99(12), 7821–7826.
- Goldenberg, A., A. X. Zheng, S. E. Fienberg, E. M. Airoldi, et al. (2010). A survey of statistical network models. Foundations and Trends® in Machine Learning 2(2), 129–233.
- Hagen, L. and A. B. Kahng (1992). New spectral methods for ratio cut partitioning and clustering. IEEE transactions on computer-aided design of integrated circuits and systems 11(9), 1074–1085.
- Handcock, M. S., A. E. Raftery, and J. M. Tantrum (2007). Model-based clustering for social networks. Journal of the Royal Statistical Society: Series A (Statistics in Society) 170(2), 301–354.
- Hoff, P. (2007). Modeling homophily and stochastic equivalence in symmetric relational data. Advances in neural information processing systems 20.
- Holland, P. W., K. B. Laskey, and S. Leinhardt (1983). Stochastic blockmodels: First steps. Social networks 5(2), 109–137.
- Holland, P. W. and S. Leinhardt (1981). An exponential family of probability distributions for directed graphs. Journal of the american Statistical association 76(373), 33–50.
- Jiang, B., J. Li, and Q. Yao (2023). Autoregressive networks. Journal of Machine Learning Research 24(227), 1–69.
- Jin, J. (2015). Fast community detection by score. The Annals of Statistics 43(1), 57–89.
- Karrer, B. and M. E. Newman (2011). Stochastic blockmodels and community structure in networks. Physical review E 83(1), 016107.
- Kaufmann, E., T. Bonald, and M. Lelarge (2018). A spectral algorithm with additive clustering for the recovery of overlapping communities in networks. Theoretical Computer Science 742, 3–26.
- Keshk, O. M. (2017). Correlates of war project trade data set codebook, version 4.0 1870-2014 katherine barbieri, university of south carolina. Science 26(5), 471–491.
- Kolaczyk, E. D. and G. Cs´ardi (2014). Statistical analysis of network data with R, Volume 65. Springer.
- Krivitsky, P. N. and M. S. Handcock (2014). A separable model for dynamic networks. Journal of the Royal Statistical Society Series B: Statistical Methodology 76(1), 29–46.
- Lee, E., F. Karimi, C. Wagner, H.-H. Jo, M. Strohmaier, and M. Galesic (2019). Homophily and minority-group size explain perception biases in social networks. Nature human behaviour 3(10), 1078–1087.
- Lei, J. (2021). Network representation using graph root distributions. The Annals of Statistics 49(2).
- Lei, J. and A. Rinaldo (2015). Consistency of spectral clustering in stochastic block models. The Annals of Statistics, 215–237.
- Ludkin, M., I. Eckley, and P. Neal (2018). Dynamic stochastic block models: parameter estimation and detection of changes in community structure. Statistics and Computing 28, 1201–1213.
- Ma, S., L. Su, and Y. Zhang (2021). Determining the number of communities in degree-corrected stochastic block models. The Journal of Machine Learning Research 22(1), 3217–3279.
- Majid, M., S. Habib, A. R. Javed, M. Rizwan, G. Srivastava, T. R. Gadekallu, and J. C.-W. Lin (2022). Applications of wireless sensor networks and internet of things frameworks in the industry revolution 4.0: A systematic literature review. Sensors 22(6), 2087.
- Mao, X., P. Sarkar, and D. Chakrabarti (2017). On mixed memberships and symmetric nonnegative matrix factorizations. In International Conference on Machine Learning, pp. 2324–2333. PMLR.
- Mao, X., P. Sarkar, and D. Chakrabarti (2021). Estimating mixed memberships with sharp eigenvector deviations. Journal of the American Statistical Association 116(536), 1928–1940.
- Mastrandrea, R., J. Fournet, and A. Barrat (2015). Contact patterns in a high school: a comparison between data collected using wearable sensors, contact diaries and friendship surveys. PloS one 10(9), e0136497.
- Matias, C. and V. Miele (2017). Statistical clustering of temporal networks through a dynamic stochastic block model. Journal of the Royal Statistical Society Series B: Statistical Methodology 79(4), 1119–1141.
- Newman, M. E. (2003). The structure and function of complex networks. SIAM review 45(2), 167–256.
- Newman, M. E. (2004). Coauthorship networks and patterns of scientific collaboration. Proceedings of the national academy of sciences 101(suppl 1), 5200–5205.
- Pensky, M. (2019). Dynamic network models and graphon estimation. The Annals of Statistics 47(4).
- Rohe, K., S. Chatterjee, and B. Yu (2011). Spectral clustering and the high-dimensional stochastic blockmodel. The Annals of Statistics, 1878–1915.
- Schwarz, G. (1978). Estimating the dimension of a model. The annals of statistics, 461–464.
- Shi, J. and J. Malik (2000). Normalized cuts and image segmentation. IEEE Transactions on pattern analysis and machine intelligence 22(8), 888–905.
- Tian, H., B. Zhang, R. Jiang, and X. Han (2024+). A new preferential model with homophily for recommender systems. Statistica Sinica. DOI:10.5705/ss.202022.0136.
- Von Luxburg, U. (2007). A tutorial on spectral clustering. Statistics and computing 17, 395–416.
- Xu, J. (2018). Rates of convergence of spectral methods for graphon estimation. In International Conference on Machine Learning, pp. 5433–5442. PMLR.
- Zhang, W., B. Jin, and Z. Bai (2021). Learning block structures in u-statistic-based matrices. Biometrika 108(4), 933–946.
- Zhang, Y., E. Levina, and J. Zhu (2020). Detecting overlapping communities in networks using spectral methods. SIAM Journal on Mathematics of Data Science 2(2), 265–283.
- Zhao, Y., E. Levina, and J. Zhu (2012). Consistency of community detection in networks under degree-corrected stochastic block models. The Annals of Statistics 40(4), 2266–2292. Acknowledgements Bo Zhang (Corresponding author) is partially supported by National Key R&D Program of China (2022YFA1008000) and National Natural Science Foundation of China ( 12471268 & 12001517 & 72091212). Baisuo Jin is partially supported by the National Natural Science Foundation (Grants 12231017,72293573) Yuehua Wu is supported by the Natural Science and Engineering Research Council of Canada (No. RGPIN2023-05655). Tianyi Sun, Department of Statistics and Finance, University of Science and Technology of China, Hefei, China.
Acknowledgments
Bo Zhang (Corresponding author) is partially supported by National Key R&D Program of China (2022YFA1008000)
and National Natural Science Foundation of China ( 12471268 & 12001517 & 72091212).
Baisuo Jin is partially supported by the National Natural Science Foundation (Grants 12231017,72293573)
Yuehua Wu is supported by the Natural Science and Engineering Research Council of Canada (No. RGPIN-
2023-05655).
Tianyi Sun, Department of Statistics and Finance, University of Science and Technology of China, Hefei, China.
Bo Zhang, Department of Statistics and Finance, University of Science and Technology of China, Hefei, China.
Baisuo Jin, School of Mathemtical Sciences, Xinjiang Normal University; Department of Statistics and Finance,
Yuehua Wu, Department of Mathematics and Statistics, York University, Toronto, Canada
Supplementary Materials
The supplementary material offers some additional remarks to the proposed model, the supplement to the real data analysis, and details of the proofs of all propositions and theorems.