Statistica Sinica 23 (2013), 1463-1477



Jianxin Shi and Peng Li

National Cancer Institute

Abstract: Copy number variations (CNVs) are a major source of genetic variation in humans. In large-scale genome-wide association studies (GWAS), CNVs have been detected from the intensity data generated by SNP genotyping arrays and then tested for association. This strategy lacks statistical power for detecting associations with short CNVs. In this article, we propose methods for testing the association for each probe, based on a Hidden Markov Model that leverages information from nearby probes in the same CNV region. Our methods do not require specifying CNV regions, are convenient for genome-wide scan data, and work for both population-based and family-based studies. Through simulation studies, we found that loss of efficiency due to CNV calling uncertainty was very small even for short CNVs covering as few as four probes in case-control studies. The efficiency loss was larger for short CNVs in family studies. We applied our methods to a large family-based GWAS of autism in 831 trios, and identified a genomic region on chromosome 17q22 harboring deletions that may contribute to the disease risk. Our methods are computationally efficient, requiring only two hours to analyze the genome-wide intensity data of all trios using a single Linux core.

Key words and phrases: Copy number variation, family-based study, genome-wide association study, Hidden Markov Model, TDT.
