Abstract

Traditional clustering methods are mainly developed for multivariate data and often struggle to capture the continuity and dynamics of functional data, thus neglecting key features. To address this issue, we propose a two-stage functional data spectral clustering (TSFDSC) method that can effectively cluster both dense and sparse functional data using the graph Laplacian operator. Dense functional data are projected onto a set of pre-determined basis functions, and the resulting coefficients are then clustered by spectral clustering; for sparse functional data, functional principal component scores are used in place of the basis coefficients. By leveraging low-dimensional representations from spectral embedding, TSFDSC achieves computational efficiency. We establish asymptotic properties for the proposed procedure. Simulation studies demonstrate that TSFDSC outperforms several commonly used functional clustering methods under various settings. The effectiveness of TSFDSC is further illustrated by two real-data applications.
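
The dense-data branch described in the abstract can be illustrated with a minimal sketch. It assumes a Fourier basis, a Gaussian similarity with a median-heuristic bandwidth, the normalized symmetric graph Laplacian, and a plain k-means step on the spectral embedding. These choices, every function name, and the simulated curves are illustrative assumptions, not the authors' implementation; the sparse branch based on functional principal component scores is not covered.

```python
import numpy as np

def fourier_basis(t, n_harmonics):
    """Design matrix with columns 1, sin(2*pi*k*t), cos(2*pi*k*t), k = 1..n_harmonics."""
    cols = [np.ones_like(t)]
    for k in range(1, n_harmonics + 1):
        cols.append(np.sin(2 * np.pi * k * t))
        cols.append(np.cos(2 * np.pi * k * t))
    return np.column_stack(cols)

def basis_coefficients(curves, t, n_harmonics=3):
    """Stage 1: least-squares projection of each curve (one per row) onto the basis."""
    B = fourier_basis(t, n_harmonics)
    coef, *_ = np.linalg.lstsq(B, curves.T, rcond=None)
    return coef.T  # shape (n_curves, n_basis)

def kmeans(X, k, n_iter=100):
    """Lloyd's algorithm with deterministic farthest-point initialisation."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def spectral_cluster(features, k, bandwidth=None):
    """Stage 2: normalised spectral clustering of the coefficient vectors."""
    sq = ((features[:, None, :] - features[None, :, :]) ** 2).sum(axis=2)
    if bandwidth is None:
        # Median heuristic for the kernel bandwidth (an assumption of this sketch).
        bandwidth = np.sqrt(np.median(sq[sq > 0]))
    W = np.exp(-sq / (2 * bandwidth ** 2))   # Gaussian similarity graph
    np.fill_diagonal(W, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    # Normalised symmetric Laplacian: I - D^{-1/2} W D^{-1/2}
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)              # eigenvalues in ascending order
    U = vecs[:, :k]                          # eigenvectors of the k smallest eigenvalues
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    return kmeans(U, k)

def demo(seed=0):
    """Two simulated groups of densely observed noisy curves on [0, 1]."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, 50)
    truth = np.repeat([0, 1], 20)
    means = np.vstack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)])
    curves = means[truth] + 0.2 * rng.standard_normal((40, t.size))
    coef = basis_coefficients(curves, t)
    labels = spectral_cluster(coef, k=2)
    return labels, truth

if __name__ == "__main__":
    labels, truth = demo()
    agreement = (labels == truth).mean()
    print("agreement up to label switching:", max(agreement, 1 - agreement))
```

Clustering the low-dimensional coefficient vectors rather than the raw discretized curves is what keeps the similarity graph cheap to build in this sketch; the basis, its size, and the kernel bandwidth would all need to be tuned for real data.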

Information

Preprint No.: SS-2025-0153
Manuscript ID: SS-2025-0153
Complete Authors: Yang Ren, Jinguan Lin, Peijun Sang, Jiangyan Wang
Corresponding Author: Jiangyan Wang
Email: wangjiangyan2007@126.com

Acknowledgments

This work was supported by the National Natural Science Foundation of China (12271255, 12371267), the Qing Lan Project of Jiangsu Province, the Postgraduate Research and Practice Innovation Program of Jiangsu Province (KYCX25_2447), the National Statistical Science Research Project of China (2024LY059), the project of the Joint Lab for Statistics and Finance, and the Jiangsu Provincial Key Discipline Construction Project (Statistics).

Supplementary Materials

The supplementary file contains the proofs of Theorems 2.1 and 2.2 and more extensive numerical results.


Supplementary materials are available for download.