Abstract
An asymptotic theory is established for linear functionals of the predictive function given by
kernel ridge regression, when the reproducing kernel Hilbert space is equivalent to a Sobolev space. The
theory covers a wide variety of linear functionals, including point evaluations, evaluation of derivatives,
and $L_2$ inner products. We establish upper and lower bounds for these estimates and show their asymptotic normality under mild conditions, which enables
uncertainty quantification for a wide range of frequently used plug-in estimators. The theory also implies
that the minimax $L_\infty$ error of kernel ridge regression can be attained with $\lambda \sim n^{-1}\log n$.
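For concreteness, the objects studied above can be sketched as follows; the notation ($\hat{f}_\lambda$ for the estimator, $T$ for a linear functional, $\mathcal{H}$ for the RKHS) is assumed here for illustration and is not taken verbatim from the paper.

```latex
% Kernel ridge regression over an RKHS \mathcal{H} equivalent to a Sobolev space:
\[
  \hat{f}_\lambda
    = \arg\min_{f \in \mathcal{H}}
      \frac{1}{n} \sum_{i=1}^{n} \bigl( y_i - f(x_i) \bigr)^2
      + \lambda \, \lVert f \rVert_{\mathcal{H}}^2 .
\]

% A linear functional T, e.g. a point evaluation T(f) = f(x_0), a derivative
% evaluation T(f) = f'(x_0), or an L_2 inner product T(f) = \int f g, is then
% estimated by the plug-in rule:
\[
  \widehat{T(f_0)} = T\bigl( \hat{f}_\lambda \bigr).
\]
```

The asymptotic normality results concern estimators of exactly this plug-in form, with the regularization level $\lambda$ chosen, for instance, of order $n^{-1}\log n$ for the $L_\infty$ guarantee quoted above.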
Information
| Preprint No. | SS-2024-0256 |
|---|---|
| Manuscript ID | SS-2024-0256 |
| Authors | Rui Tuo, Lu Zou |
| Corresponding Author | Rui Tuo |
| Email | ruituo@tamu.edu |
Supplementary Materials
The Supplementary Materials contain additional convergence results, details on the function
spaces, a discussion of a key assumption, all technical proofs, an extended literature review,
and further figures from the numerical studies.