Abstract

We consider the complex data modeling problem motivated by the zero-inflated and overdispersed data from microbiome studies. Analyzing how microbiome abundance is associated with human biological features, such as BMI, is of great importance for host health. Methods based on parametric distributional assumptions, such as zero-inflated Poisson and zero-inflated negative binomial regression, have been widely used to model such data, yet the parametric assumptions are restrictive and hard to verify in real-world applications. We relax the parametric assumptions and propose a semiparametric single-index quantile regression model. The model is flexible enough to include a wide range of possible association functions and adapts to the varying zero proportions across subjects, thereby avoiding the strong parametric distributional assumptions of most existing zero-inflated data modeling approaches. We establish the asymptotic properties of the index coefficient estimator and of the quantile regression curve estimator. Through extensive simulation studies, we demonstrate the superior performance of the proposed method in terms of model fitting.

Information

Preprint No.: SS-2024-0104
Manuscript ID: SS-2024-0104
Complete Authors: Zirui Wang, Tianying Wang
Corresponding Author: Tianying Wang
Email: tianyingw0905@outlook.com

Acknowledgments

We thank the editor, associate editor, and two referees for their valuable comments and constructive suggestions.

Supplementary Materials

The online Supplementary Material contains the proofs of the theorems and additional simulation and application results.
