
Statistica Sinica 32 (2022), 613-633

HYPOTHESIS TESTING IN HIGH-DIMENSIONAL
INSTRUMENTAL VARIABLES REGRESSION
WITH AN APPLICATION TO GENOMICS DATA

Jiarui Lu and Hongzhe Li

University of Pennsylvania

Abstract: Gene expression and phenotype association can be affected by potential unmeasured confounders from multiple sources, leading to biased estimates of the associations. Because genetic variants largely explain gene expression variations, they can be used as instrumental variables (IVs) when studying the association between gene expressions and phenotypes in a high-dimensional IV regression framework. Because the dimensions of both genetic variants and gene expressions are often larger than the sample size, statistical inference (e.g., hypothesis testing) for such high-dimensional IV models is nontrivial and has not been investigated in the literature. The problem is made more challenging because the IVs (e.g., genetic variants) have to be selected from a large set of genetic variants. This study considers the problem of hypothesis testing for sparse IV regression models, and presents methods for testing a single regression coefficient and for simultaneously testing multiple coefficients, where the test statistic for each single coefficient is constructed based on an inverse regression. A multiple testing procedure is developed for selecting variables, and is shown to control the false discovery rate. Simulations are conducted to evaluate the performance of our proposed methods. Lastly, we apply the proposed methods to a yeast data set to identify genes that are associated with growth in the presence of hydrogen peroxide.

Key words and phrases: Debiased estimation, FDR control, genetical genomics, inverse regression, multiple testing.
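The multiple testing step combines one test statistic per coefficient into a variable selection rule with false discovery rate (FDR) control. The abstract does not state the exact thresholding rule. The sketch below therefore substitutes the standard Benjamini-Hochberg step-up procedure, applied to two-sided normal p-values. It is a generic illustration, not the authors' procedure, and it assumes each per-coefficient statistic is approximately N(0, 1) under its null hypothesis. The function names `two_sided_normal_pvalue`, `benjamini_hochberg` and `select_coefficients` are hypothetical.

```python
import math
from typing import List, Sequence


def two_sided_normal_pvalue(z: float) -> float:
    """Two-sided p-value 2 * (1 - Phi(|z|)) for a standard normal statistic."""
    # For x >= 0, erfc(x / sqrt(2)) equals 2 * (1 - Phi(x)).
    return math.erfc(abs(z) / math.sqrt(2.0))


def benjamini_hochberg(pvalues: Sequence[float], alpha: float = 0.1) -> List[int]:
    """Indices rejected by the Benjamini-Hochberg step-up rule at level alpha."""
    m = len(pvalues)
    if m == 0:
        return []
    # Sort hypotheses by increasing p-value.
    order = sorted(range(m), key=lambda j: pvalues[j])
    # Find the largest rank k (1-based) with p_(k) <= k * alpha / m.
    k_max = 0
    for rank, j in enumerate(order, start=1):
        if pvalues[j] <= rank * alpha / m:
            k_max = rank
    # Reject the k_max smallest p-values; return their original indices.
    return sorted(order[:k_max])


def select_coefficients(z_stats: Sequence[float], alpha: float = 0.1) -> List[int]:
    """Hypothetical selection step: normal p-values followed by BH at level alpha."""
    pvals = [two_sided_normal_pvalue(z) for z in z_stats]
    return benjamini_hochberg(pvals, alpha)


if __name__ == "__main__":
    # Illustrative statistics for six coefficients (made-up numbers, not paper data).
    z = [5.0, 0.3, -4.2, 1.1, 0.0, 3.5]
    print(select_coefficients(z, alpha=0.1))
```

With these illustrative statistics, the three coefficients with large |z| (indices 0, 2 and 5) are selected at alpha = 0.1. Working through p-values keeps the sketch independent of how each statistic was built. Any asymptotically normal per-coefficient statistic, such as one derived from an inverse regression, could be passed in under the stated assumption.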
