
Statistica Sinica 23 (2013), 1441-1461



Nicholas A. Johnson$^1$, Stephanie J. London$^2$, Isabelle Romieu$^3$,
Wing H. Wong$^1$, Hua Tang$^1$

$^1$Stanford University, $^2$National Institute of Environmental Health Sciences,
and $^3$International Agency for Research on Cancer, France

Abstract: A haplotype, the sequence of alleles along a single chromosome, has important applications in phenotype-genotype association studies and in population genetics analyses. Because haplotypes cannot be experimentally assayed in diploid organisms in a high-throughput fashion, numerous statistical methods have been developed to reconstruct probable haplotypes from genotype data. These methods focus primarily on accurate phasing of a short genomic region with a small number of markers, and their error rate increases rapidly for longer regions. Here we introduce a new phasing algorithm that aims to improve long-range phasing accuracy. Using datasets from multiple populations, we found that it reduces long-range phasing errors by up to $50\%$ compared to current state-of-the-art methods. In addition to inferring the most likely haplotypes, the algorithm produces confidence measures, allowing downstream analyses to account for the uncertainty associated with some haplotypes. We anticipate that it offers a powerful tool for analyzing the large-scale data generated in genome-wide association studies (GWAS).

Key words and phrases: Expectation maximization, graphical model, haplotype, phasing.
