

Statistica Sinica 22 (2012), 1041-1074





ON MODEL SELECTION STRATEGIES TO IDENTIFY GENES UNDERLYING BINARY TRAITS USING GENOME-WIDE ASSOCIATION DATA


Zheyang Wu and Hongyu Zhao


Worcester Polytechnic Institute and Yale University


Abstract: For more fruitful discoveries of disease genes in genome-wide association studies, it is important to know whether joint analysis of multiple markers is more powerful than the commonly used single-marker analysis, especially in the presence of gene-gene interactions. The existing literature offers different, even conflicting, arguments about the power of the common model selection strategies: marginal search, exhaustive search, and forward search. Here we analytically calculate the power of these strategies, and of two-stage screen search, to detect binary trait loci. Our approach incorporates linkage disequilibrium, random genotypes, and correlations among test statistics, which are critical characteristics of model selection that are often ignored for simplicity in the existing literature. We derive analytical results for the power of the methods to find all the associated markers and for the power to find at least one associated marker. We also consider two widely applied types of error control: discovery number control and Bonferroni type I error rate control. After demonstrating the accuracy of our analytical results by simulations, we apply them to investigate the relative performance of various model selection methods in a broad genetic model space. Our results show substantial differences in power calculation and power comparison between the selection methods for binary traits and those for quantitative traits. Our analytical study provides rapid computation as well as insights into the statistical mechanism of capturing genetic signals under different genetic models, including gene-gene interactions. We have developed an R package to implement our analytical methods. Although we focus on genetic association analysis, our results on the power of model selection procedures are general and applicable to other studies.
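
The two error controls named in the abstract can be illustrated with a minimal sketch. It assumes a common reading of the terms, not the paper's formal definitions: Bonferroni control keeps every marker whose p-value is at most alpha divided by the number of markers tested, and discovery number control keeps a fixed number of top-ranked markers. The function names and the p-values below are hypothetical and are not taken from the paper or its R package.

```python
def bonferroni_select(p_values, alpha=0.05):
    """Indices of markers whose p-value passes the Bonferroni threshold.

    Assumption: one test per marker, so the threshold is alpha / m.
    """
    threshold = alpha / len(p_values)
    return [i for i, p in enumerate(p_values) if p <= threshold]


def discovery_number_select(p_values, n_discoveries):
    """Indices of the n_discoveries markers with the smallest p-values.

    Assumption: "discovery number control" means reporting a fixed
    number of top-ranked markers regardless of their p-values.
    """
    ranked = sorted(range(len(p_values)), key=lambda i: p_values[i])
    return sorted(ranked[:n_discoveries])


# Hypothetical marginal (single-marker) p-values for five markers.
p_values = [0.2, 1e-9, 0.03, 4e-7, 0.5]

bonferroni_hits = bonferroni_select(p_values, alpha=0.05)
top_three = discovery_number_select(p_values, n_discoveries=3)
print(bonferroni_hits, top_three)
```

Under these assumptions the two rules can disagree: the Bonferroni rule returns however many markers clear the threshold, while the discovery number rule always returns the requested count, even if some of those markers are weakly supported.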



Key words and phrases: Gene-gene interaction, genome-wide association studies, model selection, random predictors, statistical power.
