Network Assisted Approximate Factor Model Estimation

Yuzhou Zhao, Xinyan Fan and Bo Zhang

doi:10.5705/ss.202024.0170

Abstract

Factor models are powerful tools for uncovering patterns of similarity or co-movement among individuals, and they have been successfully applied in finance and biology. However, the classical approximate factor model encounters limitations when the sample size is small. To overcome this challenge, we leverage auxiliary network information and propose a novel joint quasi-maximum likelihood estimation method, which uses the network information flexibly and allows for network heterogeneity. The theoretical properties of the resulting estimators are rigorously established. We obtain a new convergence rate, which is faster than that of classical maximum likelihood estimators when the sample size is small. Extensive numerical studies are conducted to evaluate the performance of the proposed methods.

Key words and phrases: approximate factor model, high dimensionality, latent space model, network structure, penalized maximum likelihood

Information

Preprint No.	SS-2024-0170
Manuscript ID	SS-2024-0170
Complete Authors	Yuzhou Zhao, Xinyan Fan, Bo Zhang
Corresponding Authors	Xinyan Fan
Emails	1031820039@qq.com

References

Anton, M. and C. Polk (2014). Connected stocks. The Journal of Finance 69(3), 1099–1127.
Bai, J. (2003). Inferential theory for factor models of large dimensions. Econometrica 71(1), 135–171.
Bai, J. and K. Li (2012). Statistical analysis of factor models of high dimension. The Annals of Statistics 40(1), 436–465.
Bai, J. and K. Li (2016). Maximum likelihood estimation and inference for approximate factor models of high dimension. Review of Economics and Statistics 98(2), 298–309.
Bai, J. and Y. Liao (2016). Efficient estimation of approximate factor models via penalized maximum likelihood. Journal of Econometrics 191(1), 1–18.
Bai, J. and S. Ng (2002). Determining the number of factors in approximate factor models. Econometrica 70(1), 191–221.
Bien, J. and R. J. Tibshirani (2011). Sparse estimation of a covariance matrix. Biometrika 98(4), 807–820.
Bradley, R. C. (2005). Basic properties of strong mixing conditions. A survey and some open questions. Probability Surveys 2, 107–144.
Chamberlain, G. and M. Rothschild (1983). Arbitrage, factor structure, and mean-variance analysis on large asset markets. Econometrica: Journal of the Econometric Society, 1281–1304.
Fama, E. F. and K. R. French (1992). The cross-section of expected stock returns. The Journal of Finance 47(2), 427–465.
Fan, J., Y. Liao, and M. Mincheva (2011). High dimensional covariance matrix estimation in approximate factor models. The Annals of Statistics 39(6), 3320–3356.
Fan, J., Y. Liao, and M. Mincheva (2013). Large covariance estimation by thresholding principal orthogonal complements. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 75(4), 603–680.
Fan, J., Y. Liao, and W. Wang (2016). Projected principal component analysis in factor models. The Annals of Statistics 44(1), 219–254.
Fountalis, I., A. Bracco, and C. Dovrolis (2014). Spatio-temporal network analysis for studying climate patterns. Climate Dynamics 42, 879–899.
Hoff, P. D., A. E. Raftery, and M. S. Handcock (2002). Latent space approaches to social network analysis. Journal of the American Statistical Association 97(460), 1090–1098.
Huang, J. and L. Yang (2010). Correlation matrix with block structure and efficient sampling methods. Journal of Computational Finance 14(1), 81–94.
Krivitsky, P. N., M. S. Handcock, A. E. Raftery, and P. D. Hoff (2009). Representing degree distributions, clustering, and homophily in social networks with latent cluster random effects models. Social Networks 31(3), 204–213.
Lin, F. and Z. Qiu (2024). Pairs trading strategy and connected stocks: Evidence from China. Available at SSRN 4790701.
Linton, O. and G. Connor (2000). Semiparametric estimation of a characteristic-based factor model of stock returns. Technical report, Financial Markets Group.
Mayrink, V. D. and J. E. Lucas (2013). Sparse latent factor models with interactions: Analysis of gene expression data. The Annals of Applied Statistics, 799–822.
McPherson, M., L. Smith-Lovin, and J. M. Cook (2001). Birds of a feather: Homophily in social networks. Annual Review of Sociology 27(1), 415–444.
Merlevède, F., M. Peligrad, and E. Rio (2011). A Bernstein type inequality and moderate deviations for weakly dependent sequences. Probability Theory and Related Fields 151(3), 435–474.
Rubin-Delanchy, P., J. Cape, M. Tang, and C. E. Priebe (2022). A statistical interpretation of spectral embedding: the generalised random dot product graph. Journal of the Royal Statistical Society Series B: Statistical Methodology 84(4), 1446–1473.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological) 58(1), 267–288.
Von Ferber, C., T. Holovatch, Y. Holovatch, and V. Palchykov (2009). Public transport networks: empirical analysis and modeling. The European Physical Journal B 68, 261–275.
Wong, S. L., L. V. Zhang, A. H. Tong, Z. Li, D. S. Goldberg, O. D. King, G. Lesage, M. Vidal, B. Andrews, H. Bussey, et al. (2004). Combining biological networks to predict genetic interactions. Proceedings of the National Academy of Sciences 101(44), 15682–15687.
Xie, W.-J., M.-X. Li, Z.-Q. Jiang, Q.-Z. Tan, B. Podobnik, W.-X. Zhou, and H. E. Stanley (2016). Skill complementarity enhances heterophily in collaboration networks. Scientific Reports 6(1), 18727.
Xue, L., S. Ma, and H. Zou (2012). Positive-definite ℓ1-penalized estimation of large covariance matrices. Journal of the American Statistical Association 107(500), 1480–1491.
Yi, H., Q. Zhang, C. Lin, and S. Ma (2022). Information-incorporated Gaussian graphical model for gene expression data. Biometrics 78(2), 512–523.
Yu, L., Y. He, X. Zhang, and J. Zhu (2020). Network-assisted estimation for large-dimensional factor model with guaranteed convergence rate improvement. arXiv preprint arXiv:2001.10955.
Zhang, X., G. Xu, and J. Zhu (2022). Joint latent space models for network data with high-dimensional node variables. Biometrika 109(3), 707–720.
Zhang, X., S. Xue, and J. Zhu (2020). A flexible latent space model for multilayer networks. In International Conference on Machine Learning, pp. 11288–11297. PMLR.
Zou, T., W. Lan, H. Wang, and C.-L. Tsai (2017). Covariance regression analysis. Journal of the American Statistical Association 112(517), 266–281.

Acknowledgments

This work is supported by the National Natural Science Foundation of China (72271232, 71873137, 12201626), the MOE Project of Key Research Institute of Humanities and Social Sciences (22JJD110001), and the Public Computing Cloud of Renmin University of China.

Supplementary Materials

The Supplementary Material consists of ten sections (S.1–S.10). Section S.1 provides a more general form of Theorem 3. Section S.2 introduces notation and lemmas that are used to prove the theoretical properties in Section 3. Sections S.3–S.7 present the proofs of Theorem 1, Theorem 2, Theorems S.1 and 3, Theorem 4, and Proposition 1, respectively. Section S.8 provides additional algorithmic details. Section S.9 details the comparison methods. Section S.10 presents additional simulation results.

Supplementary materials are available for download.
