Nicholas A Johnson, Stephanie J. London, Isabelle Romieu, Wing H. Wong and Hua Tang (2013). Accurate construction of long range haplotype in unrelated individuals. Statistica Sinica, Vol. 23, No. 4, 1441-1461.

Statistica Sinica 23 (2013), 1441-1461

ACCURATE CONSTRUCTION OF LONG RANGE HAPLOTYPE IN UNRELATED INDIVIDUALS

Nicholas A Johnson, Stephanie J. London, Isabelle Romieu, Wing H. Wong and Hua Tang

Stanford University, National Institute of Environmental Health Sciences, and International Agency for Research on Cancer, France

Abstract: A haplotype, the sequence of alleles along a single chromosome, has important applications in phenotype-genotype association studies and in population genetic analyses. Because haplotypes cannot be experimentally assayed in diploid organisms in a high-throughput fashion, numerous statistical methods have been developed to reconstruct probable haplotypes from genotype data. These methods focus primarily on accurately phasing a short genomic region with a small number of markers, and their error rates increase rapidly for longer regions. Here we introduce a new phasing algorithm that aims to improve long-range phasing accuracy. Using datasets from multiple populations, we found that the algorithm reduces long-range phasing errors by up to $50\%$ compared to current state-of-the-art methods. In addition to inferring the most likely haplotypes, the algorithm produces confidence measures, allowing downstream analyses to account for the uncertainty associated with some haplotypes. We anticipate that it offers a powerful tool for analyzing the large-scale data generated in genome-wide association studies (GWAS).

Key words and phrases: Expectation maximization, graphical model, haplotype, phasing.