Abstract
A nonlinear structured latent factor model captures the relationship between observed and latent variables nonlinearly, offering greater flexibility than a linear factor model. Functional data describe features that vary continuously over time or space and are widely used across many fields. This paper proposes a nonlinear structured latent factor model for functional data. We allow the latent factors to be correlated across time points to account for the dependence in functional data. We study the structured identifiability of the latent factors to ensure their uniqueness, which allows these factors to have a physical interpretation. A Gaussian process (GP) prior is used to estimate the unknown nonlinear link functions. To improve computational efficiency, we develop an algorithm based on the nearest neighbor Gaussian process (NNGP). We establish the consistency of the latent factors and the unknown parameters, as well as the posterior consistency of the unknown link functions. Simulation studies demonstrate the finite-sample performance and flexibility of the proposed model, as well as the substantial savings in computation time achieved by NNGP compared with a full GP. The method is applied to analyse gait data collected in our laboratory for the early detection of neurodegenerative diseases.
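The source of NNGP's speed-up can be illustrated with a minimal sketch. Assuming a zero-mean GP on one-dimensional time points with a squared-exponential kernel and Gaussian noise (hypothetical choices, not the paper's specification), a Vecchia/NNGP-type approximation replaces the joint density by a product of conditionals, each conditioning on at most `m` nearest earlier points. This reduces the cost from O(n^3) for the exact likelihood to O(n m^3). The functions `sq_exp_kernel`, `exact_gp_loglik` and `nngp_loglik` are illustrative names only; this is not the authors' algorithm, which embeds NNGP inside the nonlinear latent factor model.

```python
import numpy as np


def sq_exp_kernel(t1, t2, variance=1.0, lengthscale=0.2):
    """Squared-exponential covariance matrix between 1-D input arrays t1 and t2."""
    d = t1[:, None] - t2[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)


def exact_gp_loglik(y, t, noise=1e-2, **kern):
    """Exact zero-mean GP log-likelihood; the Cholesky factorization costs O(n^3)."""
    n = len(t)
    K = sq_exp_kernel(t, t, **kern) + noise * np.eye(n)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L, y)  # alpha = L^{-1} y, so y' K^{-1} y = alpha' alpha
    # log det K = 2 * sum(log diag L)
    return float(-0.5 * alpha @ alpha - np.log(np.diag(L)).sum()
                 - 0.5 * n * np.log(2 * np.pi))


def nngp_loglik(y, t, m=10, noise=1e-2, **kern):
    """NNGP/Vecchia-style approximation p(y) ~ prod_i p(y_i | y_N(i)), where N(i)
    holds at most m nearest neighbours of t_i among the points preceding it in
    time order. Each term needs an m x m solve, so the total cost is O(n m^3).
    Assumption: the approximation is applied directly to the noisy observations."""
    order = np.argsort(t)
    t, y = t[order], y[order]
    ll = 0.0
    for i in range(len(t)):
        # With sorted 1-D inputs, the m nearest earlier points are the m
        # immediately preceding ones.
        nb = np.arange(max(0, i - m), i)
        k_ii = sq_exp_kernel(t[i:i + 1], t[i:i + 1], **kern)[0, 0] + noise
        if nb.size == 0:
            mean, var = 0.0, k_ii
        else:
            K_nn = sq_exp_kernel(t[nb], t[nb], **kern) + noise * np.eye(nb.size)
            k_in = sq_exp_kernel(t[i:i + 1], t[nb], **kern)[0]
            w = np.linalg.solve(K_nn, k_in)  # kriging weights on the neighbours
            mean, var = w @ y[nb], k_ii - w @ k_in
        # Gaussian log-density of the conditional p(y_i | y_N(i))
        ll += -0.5 * (np.log(2 * np.pi * var) + (y[i] - mean) ** 2 / var)
    return float(ll)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 500
    t = np.sort(rng.uniform(0.0, 1.0, n))
    K = sq_exp_kernel(t, t) + 1e-2 * np.eye(n)
    y = np.linalg.cholesky(K) @ rng.standard_normal(n)  # one simulated curve
    print("exact GP  :", exact_gp_loglik(y, t))
    print("NNGP m=10 :", nngp_loglik(y, t, m=10))
```

With `m = n - 1` every point conditions on all of its predecessors, so the product of conditionals is the exact chain-rule factorization and the two functions agree; smaller `m` trades some accuracy for a cost that grows linearly in `n`.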
Xiaorui Wang and Yimang Zhang contributed equally.
Information
| Preprint No. | SS-2025-0148 |
|---|---|
| Manuscript ID | SS-2025-0148 |
| Complete Authors | Xiaorui Wang, Yimang Zhang, Jian Qing Shi |
| Corresponding Authors | Jian Qing Shi |
| Emails | shijq@sustech.edu.cn |
Acknowledgments
The authors would like to thank two anonymous reviewers, an associate editor, and the editor for constructive comments and helpful suggestions. This publication is supported by the National Key R&D Program of China (2023YFA1011400), the National Natural Science Foundation of China (No. 12271239), and the Shenzhen Fundamental Research Program JCYJ20220818100602005 (No. 20220111). Wang's work is also supported by the Postdoctoral Fellowship Program of CPSF under Grant Number GZB20240292 and the China Postdoctoral Science Foundation (No. 2024T170376).
Supplementary Materials
The online Supplementary Materials include all the technical proofs.