
Statistica Sinica 31 (2021), 1571-1592

SIMULTANEOUS ESTIMATION OF NORMAL MEANS
WITH SIDE INFORMATION

Sihai Dave Zhao

University of Illinois at Urbana-Champaign

Abstract: Conducting integrative analyses of multiple data sets is an important strategy in data analysis. It is becoming increasingly popular in genomics, which enjoys a wealth of publicly available data sets that can be compared, contrasted, and combined in order to extract novel scientific insights. This study examines a stylized example of data integration for a classical statistical problem: leveraging side information to estimate a vector of normal means. We formulate this task as a compound decision problem, derive an oracle integrative decision rule, and propose a data-driven estimate of this rule based on minimizing an unbiased estimate of its risk. The data-driven rule is shown to asymptotically achieve the minimum possible risk among all separable decision rules, and it can outperform existing methods in terms of numerical properties. The proposed procedure leads naturally to an integrative high-dimensional classification procedure, which is illustrated by combining data from two independent gene expression profiling studies.

Key words and phrases: Compound decision problem, data integration, Gaussian sequence problem, integrative genomics, nonparametric empirical Bayes.
