Statistica Sinica 35 (2025), 1835-1857

SCALABLE COMMUNITY DETECTION IN MASSIVE
NETWORKS USING AGGREGATED RELATIONAL DATA

Timothy Jones1, Owen G. Ward2, Yiran Jiang3,
John Paisley1 and Tian Zheng*1

1Columbia University, 2Simon Fraser University and
3Purdue University

Abstract: The mixed membership stochastic blockmodel (MMSB) is a popular Bayesian network model for community detection. Fitting such models quickly becomes computationally infeasible as the number of nodes grows into the hundreds of thousands or millions. In this paper, we propose a novel mini-batch strategy based on aggregated relational data that leverages nodal information to fit the MMSB to massive networks. We describe a scalable inference method that can utilise the nodal information that often accompanies real-world networks. Conditioning on this extra information leads to a model that admits a parallel stochastic variational inference algorithm, utilising stochastic gradients of bipartite graphs formed from aggregated network ties between node subpopulations. We apply our method to a citation network with over two million nodes and 25 million edges, capturing explainable structure in this network. Our method recovers parameters and achieves better convergence on simulated networks generated according to the MMSB.

Key words and phrases: Aggregated relational data, community detection, mixed-membership, network data.
