Abstract
Gaussian-distributed, sparsely sampled longitudinal data can be represented as Gaussian distributions of their functional principal component scores, conditional on the available data. Since these conditional distributions reflect the entire information available about these scores, and therefore about the unknown trajectories that constitute the realizations of the stochastic process that generates the functional data, they are referred to as predictive distributions. This motivates a deeper investigation of the convergence of the predicted functional principal component scores given noisy longitudinal observations towards the true but unobservable scores as the designs transition from sparse (longitudinal) to dense (functional), and of the shrinkage of the predictive distributions towards a point mass located at the true score as the number of observations per subject increases. Our study is motivated by the theoretically and practically relevant challenge that point predictions in the sparse sampling regime are not consistent for the true functional principal component scores. Our proposal is to change the perspective towards a focus on predictive distributions, which can be consistently estimated. The emphasis is thus shifted to uncertainty quantification. This approach is also demonstrated for the case of sparsely sampled longitudinal predictors in functional linear models, where again one does not have consistent point predictors. Theoretical justification is provided through the asymptotic rates of convergence for the 2-Wasserstein metric between true and estimated predictive distributions. The application of the predictive distribution approach for functional principal component analysis is illustrated for longitudinal data from the Baltimore Longitudinal Study of Aging.
Key words and phrases: Functional Data Analysis, Functional Principal Components, Functional Regression, Longitudinal Data, Sparse Design, Sparse-to-Dense, Uncertainty Quantification, Wasserstein Metric
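The shrinkage described in the abstract can be illustrated with a minimal numerical sketch. It is not the authors' implementation. It assumes an oracle setting in which the mean function, eigenfunctions, eigenvalues, and noise variance are all known (the paper instead estimates them), and a process with only two components. The function names, the Fourier eigenfunctions, and all parameter values are hypothetical choices made for illustration. The sketch applies standard Gaussian conditioning to obtain the predictive distribution of the scores and uses the closed-form 2-Wasserstein distance between Gaussian measures to track how far that distribution is from a point mass at the true scores.

```python
import numpy as np


def predictive_distribution(y, t, mu, phi, lam, sigma2):
    """Gaussian conditional law of the FPC scores given noisy observations.

    Assumed model (oracle, all components known):
      y_j = mu(t_j) + sum_k xi_k * phi_k(t_j) + eps_j,
      xi_k ~ N(0, lam_k), eps_j ~ N(0, sigma2), all independent.
    """
    Phi = phi(t)                                  # n x K matrix of eigenfunction values
    Lam = np.diag(lam)                            # K x K prior covariance of the scores
    Sigma_y = Phi @ Lam @ Phi.T + sigma2 * np.eye(len(t))
    A = np.linalg.solve(Sigma_y, Phi @ Lam)       # Sigma_y^{-1} Phi Lam, without explicit inverse
    mean = A.T @ (y - mu(t))                      # conditional mean of the scores
    cov = Lam - Lam @ Phi.T @ A                   # conditional covariance of the scores
    return mean, cov


def psd_sqrt(S):
    """Symmetric square root of a positive semi-definite matrix."""
    w, V = np.linalg.eigh((S + S.T) / 2)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T


def w2_gaussian(m1, S1, m2, S2):
    """2-Wasserstein distance between N(m1, S1) and N(m2, S2)."""
    r = psd_sqrt(S1)
    cross = psd_sqrt(r @ S2 @ r)
    w2sq = np.sum((m1 - m2) ** 2) + np.trace(S1 + S2 - 2.0 * cross)
    return np.sqrt(max(w2sq, 0.0))


def shrinkage_demo(ns=(2, 10, 100, 1000), seed=0):
    """Distance from the predictive distribution to a point mass at the true scores."""
    rng = np.random.default_rng(seed)
    lam = np.array([4.0, 1.0])                    # hypothetical eigenvalues
    sigma2 = 0.25                                 # hypothetical noise variance
    mu = lambda t: np.zeros_like(t)               # hypothetical (zero) mean function
    phi = lambda t: np.sqrt(2.0) * np.column_stack(
        [np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)]
    )                                             # orthonormal on [0, 1]
    xi = rng.normal(0.0, np.sqrt(lam))            # true, unobservable scores
    out = []
    for n in ns:
        t = rng.uniform(0.0, 1.0, n)              # random design with n observations
        y = mu(t) + phi(t) @ xi + rng.normal(0.0, np.sqrt(sigma2), n)
        mean, cov = predictive_distribution(y, t, mu, phi, lam, sigma2)
        # A point mass at xi is the degenerate Gaussian N(xi, 0).
        out.append((n, w2_gaussian(mean, cov, xi, np.zeros_like(cov))))
    return out


for n, w2 in shrinkage_demo():
    print(f"n = {n:4d}: W2 to point mass at true scores = {w2:.4f}")
```

With few observations per subject the predictive distribution stays spread out, so a single point prediction hides substantial uncertainty. As the number of observations grows, the printed distance should fall towards zero, which corresponds to the sparse-to-dense transition discussed above.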
Information
| Preprint No. | SS-2024-0253 |
|---|---|
| Manuscript ID | SS-2024-0253 |
| Complete Authors | Álvaro Gajardo, Xiongtao Dai, Hans-Georg Müller |
| Corresponding Authors | Xiongtao Dai |
| Emails | xiongtao.dai@hotmail.com |
Acknowledgments
The research of XD has been supported by NSF grant DMS-2329879 and the research of HGM by
NSF grant DMS-2310450. We express our thanks to the reviewers for helpful comments that led
to numerous improvements.
Supplementary Materials
Additional simulation results, proofs, and auxiliary results are available in the Supplement.