Abstract

Gaussian distributed sparsely sampled longitudinal data can be represented as Gaussian distributions of their functional principal component scores, conditional on the available data. Since these conditional distributions reflect the entire information available about these scores and therefore about the unknown trajectories that constitute the realizations of the stochastic process that generates the functional data, they are referred to as predictive distributions. This motivates a deeper investigation of the convergence of the predicted functional principal component scores given noisy longitudinal observations towards the true but unobservable scores as the designs transition from sparse (longitudinal) to dense (functional), and of the shrinkage of the predictive distributions towards a point mass located at the true score as the number of observations per subject increases. Our study is motivated by the theoretically and practically relevant challenge that point predictions in the sparse sampling regime are not consistent for the true functional principal component scores. Our proposal is to change the perspective towards a focus on predictive distributions, which can be consistently estimated. The emphasis is thus shifted to uncertainty quantification. This approach is also demonstrated for the case of sparsely sampled longitudinal predictors in functional linear models, where again one does not have consistent point predictors. Theoretical justification is provided through the asymptotic rates of convergence for the 2-Wasserstein metric between true and estimated predictive distributions. The application of the predictive distribution approach for functional principal component analysis is illustrated for longitudinal data from the Baltimore Longitudinal Study of Aging.

Key words and phrases: Functional Data Analysis, Functional Principal Components, Functional Regression, Longitudinal Data, Sparse Design, Sparse-to-Dense, Uncertainty Quantification, Wasserstein Metric
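
The shrinkage of the predictive distribution described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes that the mean function (taken to be zero), the eigenfunctions, the eigenvalues and the noise variance are known, whereas in practice they must be estimated, and it uses a hypothetical cosine basis and hypothetical parameter values. Under these assumptions the scores given the noisy observations are Gaussian, and the 2-Wasserstein distance between a Gaussian N(m, S) and a point mass at the true score xi is sqrt(||m - xi||^2 + tr(S)).

```python
import numpy as np


def cosine_basis(t, K):
    """Hypothetical orthonormal basis on [0, 1]: phi_k(t) = sqrt(2) cos(k pi t)."""
    k = np.arange(1, K + 1)
    return np.sqrt(2.0) * np.cos(np.pi * np.outer(t, k))  # shape (n, K)


def predictive_distribution(y, Phi, lam, sigma2):
    """Mean and covariance of the scores given centered noisy observations y.

    Assumes the Gaussian model y = Phi @ xi + eps with xi ~ N(0, diag(lam))
    and eps ~ N(0, sigma2 * I), all population quantities known.
    """
    Lam = np.diag(lam)
    Sigma_y = Phi @ Lam @ Phi.T + sigma2 * np.eye(len(y))
    # gain = Lam Phi^T Sigma_y^{-1}; Sigma_y and Lam are symmetric
    gain = np.linalg.solve(Sigma_y, Phi @ Lam).T
    mean = gain @ y
    cov = Lam - gain @ Phi @ Lam
    return mean, cov


def w2_to_point_mass(mean, cov, xi):
    """2-Wasserstein distance between N(mean, cov) and a point mass at xi."""
    return float(np.sqrt(max(np.sum((mean - xi) ** 2) + np.trace(cov), 0.0)))


def shrinkage_demo(ns=(2, 10, 100, 1000), seed=0):
    """Track the predictive distribution of one subject as observations accumulate."""
    rng = np.random.default_rng(seed)
    lam = np.array([4.0, 2.0, 1.0])  # hypothetical eigenvalues
    sigma2 = 0.25                    # hypothetical noise variance
    xi = rng.normal(size=lam.size) * np.sqrt(lam)  # true, unobservable scores
    n_max = max(ns)
    t_all = rng.uniform(0.0, 1.0, size=n_max)      # random design on [0, 1]
    Phi_all = cosine_basis(t_all, lam.size)
    y_all = Phi_all @ xi + rng.normal(scale=np.sqrt(sigma2), size=n_max)
    results = []
    for n in ns:
        # Nested designs: the first n observations of the same subject
        mean, cov = predictive_distribution(y_all[:n], Phi_all[:n], lam, sigma2)
        results.append((n, float(np.trace(cov)), w2_to_point_mass(mean, cov, xi)))
    return results


if __name__ == "__main__":
    for n, tr, w2 in shrinkage_demo():
        print(f"n={n:5d}  trace(cov)={tr:.5f}  W2 to true score={w2:.5f}")
```

Because the designs are nested, the trace of the predictive covariance decreases as observations are added, so the sparse-to-dense transition appears as a predictive distribution that contracts towards the true score.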

Information

Preprint No.: SS-2024-0253
Manuscript ID: SS-2024-0253
Complete Authors: Álvaro Gajardo, Xiongtao Dai, Hans-Georg Müller
Corresponding Author: Xiongtao Dai
Email: xiongtao.dai@hotmail.com

Acknowledgments

The research of XD has been supported by NSF grant DMS-2329879 and the research of HGM by NSF grant DMS-2310450. We express our thanks to the reviewers for helpful comments that led to numerous improvements.

Supplementary Materials

Additional simulation results, proofs, and auxiliary results are available in the Supplement.

