Abstract
We propose a new method of statistical inference, called the method of limits
(MoL), which may be viewed as an extension of the method of moments. The method is
motivated by the need to analyze count data in genome-wide association studies (GWAS),
where statistical inference with existing methods is hindered by computational challenges.
We establish consistency and asymptotic normality of the MoL estimator of heritability
from GWAS data, which is seen as an advantage over the existing PQLseq method.
Furthermore, we derive a consistent estimator of the proportion of causal SNPs. In our
simulation studies, MoL also shows advantages over PQLseq in both statistical and
computational efficiency, as measured by average statistical efficiency (ASE). We also
illustrate the usefulness of MoL by applying it to UK Biobank count data to infer the
heritability of weekly champagne consumption and weekly red wine consumption.
Information
| Preprint No. | SS-2024-0092 |
|---|---|
| Manuscript ID | SS-2024-0092 |
| Complete Authors | Jiming Jiang, Leqi Xu, Yiliang Zhang, Hongyu Zhao |
| Corresponding Authors | Jiming Jiang |
| Emails | jimjiang@ucdavis.edu |
References
- Booth, J. G. and Hobert, J. P. (1999), Maximum generalized linear mixed model likelihood with an automated Monte Carlo EM algorithm, J. Roy. Statist. Soc. B 61, 265–285.
- Breslow, N. E. and Clayton, D. G. (1993), Approximate inference in generalized linear mixed models, J. Amer. Statist. Assoc. 88, 9–25.
- Bycroft, C., Freeman, C., Petkova, D., Band, G., Elliott, L. T., et al. (2018), The UK Biobank resource with deep phenotyping and genomic data, Nature 562, 203–209.
- Dao, C., Jiang, J., Paul, D., and Zhao, H. (2021), Variance estimation and confidence intervals from high-dimensional genome-wide association studies through misspecified mixed model analysis, J. Stat. Plan. Inference 220, 15–23.
- Golan, D., Lander, E. S., and Rosset, S. (2014), Measuring missing heritability: Inferring the contribution of common variants, PNAS 111, E5272–E5281.
- Jiang, J. (1998), Consistent estimators in generalized linear mixed models, J. Amer. Statist. Assoc. 93, 720–729.
- Jiang, J. and Nguyen, T. (2021), Linear and Generalized Linear Mixed Models and Their Applications, 2nd ed., Springer, New York.
- Jiang, J. (2022), Large Sample Techniques for Statistics, 2nd ed., Springer, New York.
- Jiang, J., Li, C., Paul, D., Yang, C., and Zhao, H. (2016), On high-dimensional misspecified mixed model analysis in genome-wide association study, Ann. Statist. 44, 2127–2160.
- Little, R. J. A. and Rubin, D. B. (2002), Statistical Analysis with Missing Data, 2nd ed., Wiley, New York.
- Sudlow, C., Gallacher, J., Allen, N., Beral, V., Burton, P., et al. (2015), UK Biobank: An open access resource for identifying the causes of a wide range of complex diseases of middle and old age, PLOS Medicine 12, e1001779.
- Sun, S., Zhu, J., Mozaffari, S., Ober, C., Chen, M., and Zhou, X. (2019), Heritability estimation and differential analysis of count data with generalized linear mixed models in genomic sequencing studies, Bioinformatics 35, 487–496.
- Yang, J., Benyamin, B., McEvoy, B. P., Gordon, S., Henders, A. K., Nyholt, D. R., Madden, P. A., Heath, A. C., Martin, N. G., Montgomery, G. W., et al. (2010), Common SNPs explain a large proportion of the heritability for human height, Nature Genetics 42, 565–569.
Acknowledgments
The research of Jiming Jiang is partially supported by NSF grants DMS-1713120,
DMS-1914465 and DMS-2210569. The research of Hongyu Zhao is partially
supported by DMS-1713120 and NIH R01 GM134005. The research was conducted
using the UKBB resource under approved data requests (access ref: 29900).
Supplementary Materials
The Supplementary Material contains proofs of the main theoretical results.
The code for simulations and real data analysis is available at
https://github.com/LeqiXu/MoL_analysis.