Abstract
Motivated by imaging genetics, this paper introduces a semi-nonparametric
varying coefficients modeling framework to reveal varying associations between
genetic markers and imaging responses. We conduct a comprehensive theoretical
analysis of the estimation and inference procedures for these models. Using the
kernel machine method, we estimate the unknown varying coefficient functions
and derive the corresponding representer theorem. We also establish the
theoretical properties of the estimated functions, including their rate of
convergence, Bahadur representation, pointwise limit distributions, and
confidence intervals. Additionally, we propose test statistics under a linear
mixed effects model framework to assess the significance of all varying
coefficients while accounting for within-subject dependence. The efficacy of
the proposed methodology is demonstrated through simulation studies and an
application to data from the Alzheimer’s Disease Neuroimaging Initiative study.
Information
| Field | Value |
|---|---|
| Preprint No. | SS-2024-0118 |
| Manuscript ID | SS-2024-0118 |
| Complete Authors | Ting Li, Yang Yu, Xiao Wang, J.S. Marron, Hongtu Zhu |
| Corresponding Author | Hongtu Zhu |
| Email | htzhu@email.unc.edu |
Acknowledgments
Dr. Zhu’s work was partially supported by the Gillings Innovation Laboratory
on generative AI and by grants from the National Institute on Aging (NIA) of the
National Institutes of Health (NIH), including 1R01AG085581 and RF1AG082938;
the National Institute of Mental Health (NIMH) grant 1R01MH136055; and the
NIH grants R01AR082684 and 1OT2OD038045-01. The content is solely the
responsibility of the authors and does not necessarily represent the official views
of the National Institutes of Health.
Supplementary Materials
Additional simulation results, additional real data analysis, and details of
all the proofs can be found in the supplementary material.