Sparse Factor Model for High Dimensional Time Series

Xiaoran Wu, Baojun Dou and Rongmao Zhang

doi:10.5705/ss.202023.0219

Abstract

Factor models have been extensively employed in high dimensional time series. However, little is known about the case of a sparse loading matrix. This paper introduces a sparse factor model with an easy-to-implement estimation method, aiming to enhance interpretability and relax the constraints on the dimension p of the time series. In particular, it is shown that under weak conditions, the loading space can be consistently estimated with a convergence rate related to the sparseness of each column in the loading matrix and the eigenvalues used to recover the latent factor and loading matrix. In addition, a randomized sequential test is introduced to determine the number of sparse factors. Simulations and real data analyses on sea surface air pressure and stock portfolios are also provided to illustrate the performance of the proposed method.

Key words and phrases: High dimensional time series, α-mixing, Orthogonal projection, Sparse factor model

Information

Preprint No.	SS-2023-0219
Manuscript ID	SS-2023-0219
Complete Authors	Xiaoran Wu, Baojun Dou, Rongmao Zhang
Corresponding Authors	Rongmao Zhang
Emails	rmzhang@zju.edu.cn

References

Ahn, S. C. and Horenstein, A. R. (2013). Eigenvalue ratio test for the number of factors. Econometrica, 81(3):1203–1227.
Ando, T. and Bai, J. (2016). Panel data models with grouped factor structure under unknown group membership. Journal of Applied Econometrics, 31(1):163–191.
Bai, J. (2003). Inferential theory for factor models of large dimensions. Econometrica, 71(1):135–171.
Bai, J. (2009). Panel data models with interactive fixed effects. Econometrica, 77(4):1229–1279.
Bai, J. and Ng, S. (2002). Determining the number of factors in approximate factor models. Econometrica, 70(1):191–221.
Bai, J. and Ng, S. (2023). Approximate factor models with weaker loadings. Journal of Econometrics, 235(2):1893–1916.
Baik, J. and Silverstein, J. W. (2006). Eigenvalues of large sample covariance matrices of spiked population models. Journal of Multivariate Analysis, 97(6):1382–1408.
Cai, T. T., Ma, Z., and Wu, Y. (2013). Sparse PCA: Optimal rates and adaptive estimation. The Annals of Statistics, 41(6):3074–3110.
Chang, J., Chen, C., Qiao, X., and Yao, Q. (2024). An autocovariance-based learning framework for high-dimensional functional time series. Journal of Econometrics, 239(2):105385.
Chang, J., Guo, B., and Yao, Q. (2015). High dimensional stochastic regression with latent factors, endogeneity and nonlinearity. Journal of Econometrics, 189(2):297–312.
Chang, J., Guo, B., and Yao, Q. (2018). Principal component analysis for second-order stationary vector time series. The Annals of Statistics, 46(5):2094–2124.
Chang, J., He, J., Yang, L., and Yao, Q. (2023). Modelling matrix time series via a tensor CP-decomposition. Journal of the Royal Statistical Society Series B: Statistical Methodology, 85(1):127–148.
Chudik, A., Pesaran, M. H., and Tosetti, E. (2011). Weak and strong cross section dependence and estimation of large panels. The Econometrics Journal, 14(1):C45–C90.
Fan, J., Liao, Y., and Mincheva, M. (2013). Large covariance estimation by thresholding principal orthogonal complements. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 75(4):603–680.
Hallin, M. and Liška, R. (2007). Determining the number of factors in the general dynamic factor model. Journal of the American Statistical Association, 102(478):603–617.
Johnstone, I. M. and Lu, A. Y. (2009). On consistency and sparsity for principal components analysis in high dimensions. Journal of the American Statistical Association, 104(486):682–693.
Jolliffe, I. T., Trendafilov, N. T., and Uddin, M. (2003). A modified principal component technique based on the lasso. Journal of Computational and Graphical Statistics, 12(3):531–547.
Lam, C. and Yao, Q. (2012). Factor modeling for high-dimensional time series: Inference for the number of factors. The Annals of Statistics, 40(2):694–726.
Lam, C., Yao, Q., and Bathia, N. (2011). Estimation of latent factors for high-dimensional time series. Biometrika, 98(4):901–918.
Ma, Z. (2013). Sparse principal component analysis and iterative thresholding. The Annals of Statistics, 41(2):772–801.
Mackey, L. (2008). Deflation methods for sparse PCA. In Advances in Neural Information Processing Systems, volume 21.
Nadler, B. (2008). Finite sample approximation results for principal component analysis: A matrix perturbation approach. The Annals of Statistics, 36(6):2791–2817.
Onatski, A. (2009). Testing hypotheses about the number of factors in large factor models. Econometrica, 77(5):1447–1479.
Onatski, A. (2010). Determining the number of factors from empirical distribution of eigenvalues. The Review of Economics and Statistics, 92(4):1004–1016.
Pan, J. and Yao, Q. (2008). Modelling multiple time series via common factors. Biometrika, 95(2):365–379.
Paul, D. (2007). Asymptotics of sample eigenstructure for a large dimensional spiked covariance model. Statistica Sinica, 17(4):1617–1642.
Pelger, M. and Xiong, R. (2022). Interpretable sparse proximate factors for large dimensions. Journal of Business & Economic Statistics, 40(4):1642–1664.
Trapani, L. (2018). A randomized sequential procedure to determine the number of factors. Journal of the American Statistical Association, 113(523):1341–1349.
Uematsu, Y. and Yamagata, T. (2023). Estimation of sparsity-induced weak factor models. Journal of Business & Economic Statistics, 41(1):213–227.
Vu, V. and Lei, J. (2012). Minimax rates of estimation for sparse PCA in high dimensions. In Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, volume 22 of Proceedings of Machine Learning Research, pages 1278–1286. PMLR.
Wang, D., Liu, X., and Chen, R. (2019). Factor models for matrix-valued high-dimensional time series. Journal of Econometrics, 208(1):231–248.
White, P. A. (1958). The computation of eigenvalues and eigenvectors of a matrix. Journal of the Society for Industrial and Applied Mathematics, 6(4):393–437.
Witten, D. M., Tibshirani, R., and Hastie, T. (2009). A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics, 10(3):515–534.
Zhang, B., Pan, G., Yao, Q., and Zhou, W. (2023). Factor modeling for clustering high-dimensional time series. Journal of the American Statistical Association, 0(0):1–12.
Zou, H., Hastie, T., and Tibshirani, R. (2006). Sparse principal component analysis. Journal of Computational and Graphical Statistics, 15(2):265–286.

Acknowledgments

We would like to thank the Co-Editor, Associate Editor and two anonymous referees for their critical comments and thoughtful suggestions, which led to a much improved version of this paper. This research was supported in part by grants from NSFC, China (Nos. 12171427, U21A20426) and the National Key R&D Program of China (2024YFA1013502).

Supplementary Materials

The supplementary materials contain some technical lemmas, the proofs of Theorems 1-4 of the main article, and some detailed tables and figures of the simulation results discussed in the main paper.

Supplementary materials are available for download.
