Statistica Sinica 28 (2018), 293-317
Abstract: Detecting candidate genetic variants in genomic studies often encounters confounding problems, particularly when the data are ultrahigh dimensional. Confounding covariates, such as age and gender, can not only reduce statistical power but also introduce spurious genetic associations. How to control for confounders in ultrahigh dimensional data analysis is a critical and challenging issue. In this paper, we propose a novel sure independence screening method based on conditional distance correlation under the ultrahigh dimensional model setting. Our proposal accomplishes the adjustment by conditioning on the confounding variables. Because conditional distance correlation is model-free, our method does not need any parametric modeling assumptions and is thus quite flexible. In addition, it is applicable to data with multivariate responses. We show that under some mild technical conditions, the proposed method enjoys the sure screening property even when the dimensionality is of an exponential order of the sample size. The simulation studies and a data analysis demonstrate that the proposed procedure has competitive performance.
Key words and phrases: Confounding, feature screening, model free, multivariate response, ultrahigh dimension.