
Statistica Sinica 36 (2026), 415-437

AN EMPIRICAL BAYES REGRESSION FOR
MULTI-TISSUE GENE EXPRESSION PREDICTION

Fei Xue and Hongzhe Li*

Purdue University and University of Pennsylvania

Abstract: The Genotype-Tissue Expression (GTEx) project collects samples from multiple human tissues to study the relationship between genetic variation, in particular single nucleotide polymorphisms (SNPs), and gene expression in each tissue. However, most existing expression quantitative trait loci (eQTL) analyses use information from only a single tissue. In this paper, we develop a multi-tissue method that improves prediction of gene expression based on cis-SNPs by borrowing information across tissues. Specifically, we propose an empirical Bayes regression model for SNP-expression association using data from multiple tissues. To allow the effects of SNPs to vary greatly among tissues, we use as the prior a mixture of a multivariate Gaussian distribution and a Dirac mass at zero. We show that the proposed estimator of the cis-SNP effects on gene expression asymptotically achieves the minimum Bayes risk among all estimators. Analyses of the GTEx data show that the proposed method predicts gene expression from cis-SNPs more accurately on testing sets than existing methods.
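
As an illustrative sketch of the prior described in the abstract, one possible way to write it is given below. The notation is assumed here and is not taken from the paper: $\boldsymbol{\beta}_j \in \mathbb{R}^K$ stands for the effects of cis-SNP $j$ across $K$ tissues, $\pi$ for a mixing proportion, $\delta_{\mathbf{0}}$ for the Dirac mass at zero, and $\boldsymbol{\mu}$, $\boldsymbol{\Sigma}$ for the mean and covariance of the Gaussian component.

```latex
% Hypothetical notation; the paper's exact parameterization may differ.
\[
  \boldsymbol{\beta}_j \;\sim\; \pi\,\delta_{\mathbf{0}}
  \;+\; (1-\pi)\,N_K(\boldsymbol{\mu}, \boldsymbol{\Sigma}),
  \qquad 0 \le \pi \le 1 .
\]
```

Under this reading, the point mass accommodates SNPs with no effect, while the Gaussian component lets non-zero effects differ, and be correlated, across tissues.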

Key words and phrases: Bayes risk, data integration, missing data, mixture model.

