Statistica Sinica 30 (2020), 925-953
Abstract: Advances in technology have produced abundant high-dimensional data, making it possible to integrate multiple related studies. For variable selection, screening methods based on marginal correlations offer a significant computational advantage, which has made them promising alternatives to the popular regularization methods. However, these screening methods have so far been limited to single studies. In this study, we consider a general framework for variable screening across multiple related studies. Within this framework, we propose a novel two-step screening procedure for high-dimensional regression analysis, based on a self-normalized estimator. Compared with the one-step procedure and rank-based sure independence screening (SIS) procedures, the proposed procedure greatly reduces the false negative rate while maintaining a low false positive rate. Theoretically, we show that our procedure possesses the sure screening property under weaker assumptions on the signal strengths, and that it allows the number of features to grow exponentially with the sample size. In addition, we relax the commonly used normality assumption to allow sub-Gaussian distributions. Simulations and a real transcriptomic application illustrate the advantage of our method over the rank-based SIS method.
Key words and phrases: Multiple studies, partial faithfulness, self-normalized estimator, sure screening property, variable selection.
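As background on what "variable screening based on marginal correlations" means in a single study, the following is a minimal Python sketch of standard SIS in the style of Fan and Lv (2008): rank features by the absolute marginal correlation with the response and keep the top d. It is NOT the two-step self-normalized procedure proposed in this paper. The function name `marginal_correlation_screen`, the cutoff d = floor(n / log n), and the simulated design are illustrative assumptions.

```python
import numpy as np


def marginal_correlation_screen(X, y, d):
    """Keep the d features with the largest |marginal Pearson correlation| with y.

    Illustrative single-study SIS-style screening (assumption: Fan and Lv style
    baseline); this is not the paper's two-step self-normalized procedure.
    Assumes no feature column is constant (otherwise the correlation is undefined).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Center the columns and the response so inner products become covariances.
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # Marginal Pearson correlation of each feature with the response.
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    corr = (Xc.T @ yc) / denom
    # Feature indices sorted by decreasing |correlation|; keep the first d.
    order = np.argsort(-np.abs(corr))
    return order[:d], corr


if __name__ == "__main__":
    # Hypothetical simulated study: n samples, p >> n features, 5 true signals.
    rng = np.random.default_rng(0)
    n, p = 200, 2000
    X = rng.standard_normal((n, p))
    beta = np.zeros(p)
    beta[:5] = 2.0
    y = X @ beta + rng.standard_normal(n)
    d = int(n / np.log(n))  # a common SIS-style cutoff, used here as an assumption
    selected, _ = marginal_correlation_screen(X, y, d)
    print("kept", len(selected), "features; true signals retained:",
          set(range(5)).issubset(set(selected.tolist())))
```

Screening of this kind only reduces p to a manageable d; a regularized fit is typically run on the retained features afterwards. Applying it separately to each study ignores information shared across studies, which is the limitation the multi-study framework above addresses.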