Statistica Sinica 24 (2014), 1771-1786

THE HIGHEST DIMENSIONAL STOCHASTIC BLOCKMODEL
WITH A REGULARIZED ESTIMATOR
Karl Rohe, Tai Qin and Haoyang Fan
University of Wisconsin, Madison

Abstract: In the high-dimensional Stochastic Blockmodel for a random network, the number of clusters (or blocks) K grows with the number of nodes N. Two previous studies have examined the statistical estimation performance of spectral clustering and the maximum likelihood estimator under the high-dimensional model; neither of these results allows K to grow faster than N^{1/2}. We study a model where, ignoring log terms, K can grow proportionally to N. Since the number of clusters must be smaller than the number of nodes, no reasonable model allows K to grow faster; thus, our asymptotic results are the “highest” dimensional. To push the asymptotic setting to this extreme, we make additional assumptions that are motivated by empirical observations in physical anthropology (Dunbar (1992)) and in an in-depth study of massive empirical networks (Leskovec (2008)). We develop a regularized maximum likelihood estimator that leverages these insights and prove that, under certain conditions, the proportion of nodes that the regularized estimator misclusters converges to zero. We thus introduce and demonstrate the advantages of statistical regularization in a parametric form for network analysis.

Key words and phrases: Consistency, high dimensional, stochastic block model, regularization, clustering.