Abstract
As one of the most powerful tools for examining the association between functional
covariates and a response, the functional regression model has been widely adopted
in interdisciplinary studies. A functional linear regression model usually assumes only a limited number of functional covariates. In high-dimensional functional linear regression models, however, the functional covariates may be correlated, which poses significant challenges for statistical inference and
functional variable selection. In this article, a novel functional factor augmentation
structure (fFAS) is introduced for multivariate functional series, and a multivariate
functional factor augmentation selection model (fFASM) is further proposed to address
the issues that arise in variable selection with correlated functional covariates. Theoretical justifications for the proposed fFAS are provided, and statistical inference
results for the proposed fFASM are established. Numerical investigations support the
strong performance of the proposed fFASM in terms of estimation accuracy and
selection consistency.
Key words and phrases: correlated functional covariates, functional factor augmentation structure, functional variable selection, factor augmentation regression
Information
| Preprint No. | SS-2025-0205 |
|---|---|
| Manuscript ID | SS-2025-0205 |
| Complete Authors | Hanteng Ma, Ziliang Shen, Xingdong Feng, Xin Liu |
| Corresponding Authors | Xin Liu |
| Emails | liu.xin@mail.shufe.edu.cn |
References
- Ahn, S. C. & Horenstein, A. R. (2013), ‘Eigenvalue ratio test for the number of factors’, Econometrica 81(3), 1203–1227.
- Akaike, H. (1974), ‘A new look at the statistical model identification’, IEEE Transactions on Automatic Control 19(6), 716–723.
- Aneiros, G., Horová, I., Hušková, M. & Vieu, P. (2022), ‘On functional data analysis and related topics’, Journal of Multivariate Analysis 189, 104861.
- Aneiros, G., Novo, S. & Vieu, P. (2022), ‘Variable selection in functional regression models: A review’, Journal of Multivariate Analysis 188, 104871.
- Bai, J. (2003), ‘Inferential theory for factor models of large dimensions’, Econometrica 71(1), 135–171.
- Bai, J. & Ng, S. (2002), ‘Determining the number of factors in approximate factor models’, Econometrica 70(1), 191–221.
- Cardot, H., Ferraty, F. & Sarda, P. (2003), ‘Spline estimators for the functional linear model’, Statistica Sinica pp. 571–591.
- Castellanos, L., Vu, V. Q., Perel, S., Schwartz, A. B. & Kass, R. E. (2015), ‘A multivariate Gaussian process factor model for hand shape during reach-to-grasp movements’, Statistica Sinica 25(1), 5.
- Centofanti, F., Lepore, A., Menafoglio, A., Palumbo, B. & Vantini, S. (2021), ‘Functional regression control chart’, Technometrics 63(3), 281–294.
- Chen, C., Guo, S. & Qiao, X. (2022), ‘Functional linear regression: dependence and error contamination’, Journal of Business & Economic Statistics 40(1), 444–457.
- Chen, D., Hall, P. & Müller, H.-G. (2011), ‘Single and multiple index functional regression models with nonparametric link’, The Annals of Statistics 39(3), 1720–1747.
- Chen, L., Wang, W. & Wu, W. B. (2021), ‘Dynamic semiparametric factor model with structural breaks’, Journal of Business & Economic Statistics 39(3), 757–771.
- Chiou, J.-M. & Müller, H.-G. (2016), ‘A pairwise interaction model for multivariate functional and longitudinal data’, Biometrika 103(2), 377–396.
- Cuevas, A. (2014), ‘A partial overview of the theory of statistics with functional data’, Journal of Statistical Planning and Inference 147, 1–23.
- Fan, J., Ke, Y. & Wang, K. (2020), ‘Factor-adjusted regularized model selection’, Journal of Econometrics 216(1), 71–85.
- Fan, J. & Li, R. (2001), ‘Variable selection via nonconcave penalized likelihood and its oracle properties’, Journal of the American Statistical Association 96(456), 1348–1360.
- Fan, J., Liao, Y. & Mincheva, M. (2013), ‘Large covariance estimation by thresholding principal orthogonal complements’, Journal of the Royal Statistical Society Series B: Statistical Methodology 75(4), 603–680.
- Fang, L., Zhao, H., Wang, P., Yu, M., Yan, J., Cheng, W. & Chen, P. (2015), ‘Feature selection method based on mutual information and class separability for dimension reduction in multidimensional time series for clinical data’, Biomedical Signal Processing and Control 21, 82–89.
- Gonzalez-Vidal, A., Jimenez, F. & Gomez-Skarmeta, A. F. (2019), ‘A methodology for energy multivariate time series forecasting in smart buildings based on feature selection’, Energy and Buildings 196, 71–82.
- Hall, P. & Horowitz, J. L. (2007), ‘Methodology and convergence rates for functional linear regression’, The Annals of Statistics 35(1), 70–91.
- Hays, S., Shen, H. & Huang, J. Z. (2012), ‘Functional dynamic factor models with application to yield curve forecasting’, The Annals of Applied Statistics pp. 870–894.
- Hörmann, S., Kidziński, L. & Hallin, M. (2015), ‘Dynamic functional principal components’, Journal of the Royal Statistical Society Series B: Statistical Methodology 77(2), 319–348.
- Hörmann, S. & Kokoszka, P. (2010), ‘Weakly dependent functional data’, The Annals of Statistics 38(3), 1845–1884.
- Htun, H. H., Biehl, M. & Petkov, N. (2023), ‘Survey of feature selection and extraction techniques for stock market prediction’, Financial Innovation 9(1), 26.
- Jiménez, F., Palma, J., Sánchez, G., Marín, D., Francisco Palacios, M. & Lucía López, M. (2020), ‘Feature selection based multivariate time series forecasting: An application to antibiotic resistance outbreaks prediction’, Artificial Intelligence in Medicine 104, 101818.
- Kong, D., Xue, K., Yao, F. & Zhang, H. H. (2016), ‘Partially functional linear regression in high dimensions’, Biometrika 103(1), 147–159.
- Lee, J. D., Sun, Y. & Taylor, J. E. (2015), ‘On model selection consistency of regularized
- Li, Y. & Hsing, T. (2007), ‘On rates of convergence in functional linear regression’, Journal of Multivariate Analysis 98(9), 1782–1804.
- Lin, Z. & Wang, J.-L. (2022), ‘Mean and covariance estimation for functional snippets’, Journal of the American Statistical Association 117(537), 348–360. PMID: 35757778.
- Lin, Z. & Yao, F. (2020), ‘Functional regression on the manifold with contamination’, Biometrika 108(1), 167–181.
- Matsui, H. & Konishi, S. (2011), ‘Variable selection for functional regression models via the l1 regularization’, Computational Statistics & Data Analysis 55(12), 3304–3310.
- Merlevède, F., Peligrad, M. & Rio, E. (2011), ‘A Bernstein type inequality and moderate deviations for weakly dependent sequences’, Probability Theory and Related Fields 151, 435–474.
- Müller, H.-G. & Yao, F. (2008), ‘Functional additive models’, Journal of the American Statistical Association 103(484), 1534–1544.
- Nti, K. O., Adekoya, A. & Weyori, B. (2019), ‘Random forest based feature selection of macroeconomic variables for stock market prediction’, American Journal of Applied Sciences 16(7), 200–212.
- Peng, R. D., Dominici, F., Pastor-Barriuso, R., Zeger, S. L. & Samet, J. M. (2005), ‘Seasonal analyses of air pollution and mortality in 100 US cities’, American Journal of Epidemiology 161(6), 585–594.
- Petersen, A. (2024), ‘Mean and covariance estimation for discretely observed high-dimensional functional data: Rates of convergence and division of observational regimes’,
- Ramsay, J. & Silverman, B. (2005), Functional Data Analysis, Springer Series in Statistics, Springer.
- Schwarz, G. (1978), ‘Estimating the dimension of a model’, The Annals of Statistics 6(2), 461–464.
- Tibshirani, R. (1996), ‘Regression shrinkage and selection via the lasso’, Journal of the Royal Statistical Society Series B: Statistical Methodology 58(1), 267–288.
- Yang, S. & Ling, N. (2024), ‘Robust estimation of functional factor models with functional pairwise spatial signs’, Computational Statistics pp. 1–24.
- Yao, F., Müller, H.-G. & Wang, J.-L. (2005), ‘Functional linear regression analysis for longitudinal data’, The Annals of Statistics 33(6), 2873–2903.
- Yuan, M. & Cai, T. T. (2010), ‘A reproducing kernel Hilbert space approach to functional linear regression’, The Annals of Statistics 38(6), 3412–3444.
- Zhang, C.-H. (2010), ‘Nearly unbiased variable selection under minimax concave penalty’, The Annals of Statistics 38(2), 894–942.
- Lin, Z., Lopes, M. E. & Müller, H.-G. (2023), ‘High-dimensional MANOVA via bootstrapping and its application to functional and sparse count data’, Journal of the American Statistical Association 118(541), 177–191.
Supplementary Materials
It is found that the proposed fFASM method performs markedly better than Lasso and
grLasso, achieving the highest average out-of-sample R² while maintaining the smallest
average model size. Conversely, the joint MFPCA approach yields a negative out-of-sample
R², indicating that fusing all 60 functional covariates into a global dynamic system causes
severe overfitting on this dataset. This highlights the need for the effective variable
selection that the proposed method provides.
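To make the negative out-of-sample R² concrete, the sketch below uses one common definition: one minus the ratio of the test-set squared prediction error to the squared error of a constant benchmark. The benchmark here is the training-sample mean, which is an assumption; the source does not state which definition was used. The function name `out_of_sample_r2` and the toy numbers are hypothetical and are not taken from the study.

```python
def out_of_sample_r2(y_test, y_pred, y_train_mean):
    """Return 1 - SSE/SST on held-out data.

    SST is measured against the training-sample mean (an assumed
    benchmark), so the value drops below zero whenever the model
    predicts worse than that constant.
    """
    sse = sum((y - p) ** 2 for y, p in zip(y_test, y_pred))
    sst = sum((y - y_train_mean) ** 2 for y in y_test)
    return 1.0 - sse / sst


# Toy held-out responses and an assumed training mean (illustrative only).
y_test = [1.0, 2.0, 3.0, 4.0]
train_mean = 2.5

# Predictions close to the truth give a value near 1 (about 0.98 here).
good = out_of_sample_r2(y_test, [1.1, 1.9, 3.2, 3.8], train_mean)

# Predictions that reverse the ordering are worse than the constant
# benchmark, so the value is negative (-3.0 here).
bad = out_of_sample_r2(y_test, [4.0, 3.0, 2.0, 1.0], train_mean)

print(good, bad)
```

Under this definition, a negative value means the fitted model predicts the held-out responses less accurately than a constant forecast, which is consistent with reading the joint MFPCA result as overfitting.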
Footnote 2: https://data.stats.gov.cn/
In Table 2 of the Supplementary Material, all three variable-selection methods identify
“Cement” as the most frequently selected functional covariate across the 200 repetitions,
which is expected because cement is the most critical raw material for house construction.
Similarly, the functional covariate “Aluminum Materials” ranks among the top five.
In addition, the fFASM and Lasso methods select the functional covariates “Medium
Tractors” and “Engine”, which are closely related to the construction machinery used in
the house building industry. In contrast, grLasso selects variables such as “Mechanized
Paper”, “Hydropower Generation”, and “Phosphate Rock”, which intuitively have limited
direct connection with house construction. This structural over-penalization may explain
why grLasso yields a larger average model size (2.61) but a noticeably lower average
out-of-sample R² (0.420).
| Method | Average out-of-sample R² | Average model size |
|---|---|---|
| fFASM (Proposed) | 0.647 | 1.56 |
| Lasso | 0.583 | 2.46 |
| grLasso | 0.420 | 2.61 |