Statistica Sinica 27 (2017), 1125-1153
Abstract: Over the last decade, large-scale multiple testing has found itself at the forefront of modern data analysis. Data are often correlated, so that the observed test statistic used for detecting a non-null case, or signal, at each location in a dataset carries some information about the chances of a true signal at other locations. Brown et al. (2014) proposed, in the neuroimaging context, a Bayesian multiple testing model that accounts for the dependence of each volume element on the behavior of its neighbors through a conditional autoregressive (CAR) model. Here, we propose a generalized CAR model that allows for the inclusion of points with no neighbors at all, something that is not possible under conventional CAR models. We also consider neighborhoods based on criteria other than physical location, such as genetic pathways in microarray studies that are determined from existing biological knowledge. This provides a unified framework for the simultaneous modeling of dependent and independent cases, resulting in stronger Bayesian learning in the posterior. We justify the selected prior distribution and prove that the resulting posterior distribution is proper. We illustrate the utility of the proposed model by applying it to both simulated and real microarray data in which the genes exhibit dependence determined either by physical adjacency on a chromosome or by predefined gene pathways.
Key words and phrases: Conditional autoregressive model, enrichment, microarray, multiple testing, significance analysis of microarrays, spike-and-slab prior.