Abstract
High-dimensional functional data have become increasingly prevalent in
modern applications such as high-frequency financial data and neuroimaging data
analysis. We investigate a class of high-dimensional linear regression models, where
each predictor is a random element in an infinite-dimensional function space, and
the number of functional predictors p can potentially be ultra-high. Assuming that
each of the unknown coefficient functions belongs to some reproducing kernel Hilbert
space (RKHS), we regularize the fitting of the model by imposing a group elastic-net
type of penalty on the RKHS norms of the coefficient functions. We show that our
loss function is Gâteaux sub-differentiable and that our functional elastic-net estimator exists and is unique in the product RKHS. Under suitable sparsity assumptions and a
functional version of the irrepresentable condition, we derive a non-asymptotic tail
bound for variable selection consistency of our method.
Allowing the number of true functional predictors q to diverge with the sample size, we also show that a post-selection refined estimator can achieve the oracle minimax optimal prediction rate.
The proposed methods are illustrated through simulation studies and a real-data
application from the Human Connectome Project.
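For intuition, the group elastic-net penalty on RKHS norms described above can be sketched in a finite-dimensional proxy: after basis truncation, each coefficient function reduces to a vector whose Euclidean norm stands in for its RKHS norm. The function names and the proximal-update setup below are illustrative assumptions for exposition, not the authors' implementation.

```python
import numpy as np

def group_elastic_net_penalty(B, lam1, lam2):
    """Finite-dimensional proxy for the functional group elastic-net penalty.

    Column B[:, j] plays the role of the j-th coefficient function; its
    Euclidean norm stands in for the RKHS norm (illustrative assumption).
    Penalty: lam1 * sum_j ||beta_j|| + lam2 * sum_j ||beta_j||^2.
    """
    norms = np.linalg.norm(B, axis=0)
    return lam1 * norms.sum() + lam2 * (norms ** 2).sum()

def group_prox(b, lam1, lam2, step):
    """Proximal step for one group under the penalty above.

    Soft-thresholds the group norm (lasso part, which zeroes out whole
    groups and drives variable selection), then applies ridge shrinkage.
    """
    nb = np.linalg.norm(b)
    if nb <= step * lam1:
        return np.zeros_like(b)  # entire group set to zero: deselected
    return (1.0 - step * lam1 / nb) * b / (1.0 + 2.0 * step * lam2)
```

The closed-form group proximal update is what makes blockwise (groupwise) descent schemes for such penalties tractable; in the functional setting the Euclidean norm is replaced by the RKHS norm of each coefficient function.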
Key words and phrases: Elastic-net penalty; Functional linear regression; Minimax optimality; Model selection consistency; Reproducing kernel Hilbert space; Sparsity
Information
| Preprint No. | SS-2025-0151 |
|---|---|
| Manuscript ID | SS-2025-0151 |
| Complete Authors | Xingche Guo, Yehua Li, Tailen Hsing |
| Corresponding Authors | Yehua Li |
| Emails | yehuali@ucr.edu |
Acknowledgments
The authors thank the editor, the associate editor, and two anonymous referees
for their many helpful and constructive comments, which led to significant
improvements to our paper.
Supplementary Materials
The online Supplementary Material contains technical proofs, substantiating
examples for the technical assumptions, and additional simulation results.