Abstract: Haplotypes, the sequences of alleles along single chromosomes, have important applications in phenotype-genotype association studies and in population genetics analyses. Because haplotypes cannot be experimentally assayed in diploid organisms in a high-throughput fashion, numerous statistical methods have been developed to reconstruct probable haplotypes from genotype data. These methods focus primarily on accurate phasing of short genomic regions with a small number of markers, and their error rates increase rapidly for longer regions. Here we introduce a new phasing algorithm, , which aims to improve long-range phasing accuracy. Using datasets from multiple populations, we found that reduces long-range phasing errors by up to compared to the current state-of-the-art methods. In addition to inferring the most likely haplotypes, produces confidence measures, allowing downstream analyses to account for the uncertainty associated with some haplotypes. We anticipate that offers a powerful tool for analyzing the large-scale data generated in genome-wide association studies (GWAS).
Key words and phrases: Expectation maximization, graphical model, haplotype, phasing.