Abstract
Research on network data with nodal covariates has received increasing attention, yet
few studies have focused on nonlinear patterns among nodal covariates. In this work, we propose a
model-free framework that leverages network information to achieve nonlinear dimension reduction
of nodal covariates. An efficient regularization-based estimation procedure is proposed, and the
asymptotic properties of the estimated projection directions are studied. For the downstream task of
community detection, we propose a two-step algorithm with theoretical guarantees. We also
draw connections between our method and three existing kernel methods. Extensive simulations
and a real data analysis support the advantages of the proposed method.
Key words and phrases: Nonlinear dimension reduction, Community detection, Degree-corrected stochastic block model
Information
| Preprint No. | SS-2025-0362 |
|---|---|
| Manuscript ID | SS-2025-0362 |
| Complete Authors | Zhonghan Wang, Shenbin Zheng, Junlong Zhao |
| Corresponding Authors | Junlong Zhao |
| Emails | zhaojunlong928@126.com |
Acknowledgments
Junlong Zhao’s research was supported in part by the National Natural Science Foundation
of China grants No. 12371288 and No. 12131006, and by the Fundamental Research Funds for the
Central Universities.
Supplementary Materials
The Supplementary Materials include proofs of all the theoretical results in the main text,
additional simulations, discussions and theoretical results.