Statistica Sinica 32 (2022), 109-130
Hangjin Jiang1,2, Xingqiu Zhao3, Ronald C.W. Ma2 and Xiaodan Fan2
Abstract: We consider variable screening in high-dimensional binary classification. First, we propose nonparametric test statistics for the two-sample distribution comparison problem. These test statistics combine the merits of the chi-squared and Kolmogorov–Smirnov statistics, and provide new insights into testing the equality of the unspecified distributions underlying two independent samples. Based on these statistics, we propose a marginal screening procedure and a pairwise joint screening procedure for detecting important variables in high-dimensional binary classification. Both screening procedures have the consistent screening property, which is stronger than the sure screening property of most existing methods. The marginal screening procedure is much more powerful than other methods over a broad range of cases, and the pairwise joint screening procedure provides a way of detecting variables that have a joint effect but no marginal effect. Extensive simulations and a real-data application show the effectiveness and advantages of the proposed methods.
Key words and phrases: Binary classification, consistency, nonparametric test, two-sample distribution comparison, variable screening.
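To make the idea of marginal screening by two-sample comparison concrete, the following is a minimal sketch under stated assumptions. It does not implement the test statistics proposed in this paper; the classical two-sample Kolmogorov–Smirnov statistic is used as a stand-in. The function names `ks_statistic` and `marginal_screen`, the parameter `keep`, and the toy data are hypothetical and serve only as illustration.

```python
import random


def ks_statistic(x, y):
    """Classical two-sample Kolmogorov-Smirnov statistic:
    the largest absolute gap between the two empirical CDFs."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        t = min(xs[i], ys[j])
        # Step past every observation equal to t in both samples (handles ties).
        while i < n and xs[i] == t:
            i += 1
        while j < m and ys[j] == t:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def marginal_screen(X, labels, keep):
    """Score each variable by comparing its class-0 and class-1 samples,
    then return the indices of the `keep` highest-scoring variables."""
    p = len(X[0])
    scores = []
    for k in range(p):
        x0 = [row[k] for row, c in zip(X, labels) if c == 0]
        x1 = [row[k] for row, c in zip(X, labels) if c == 1]
        scores.append(ks_statistic(x0, x1))
    order = sorted(range(p), key=lambda k: scores[k], reverse=True)
    return order[:keep], scores


# Toy data (an assumption for illustration): 200 observations, 50 variables,
# where only variable 0 differs in distribution between the two classes.
rng = random.Random(0)
n_obs, n_vars = 200, 50
labels = [i % 2 for i in range(n_obs)]
X = [[rng.gauss(0.0, 1.0) for _ in range(n_vars)] for _ in range(n_obs)]
for row, c in zip(X, labels):
    if c == 1:
        row[0] += 2.0  # mean shift in class 1 for variable 0 only

selected, scores = marginal_screen(X, labels, keep=3)
```

Each variable is scored separately, so a variable that matters only jointly with another variable would receive a low score here; that limitation is what a pairwise joint screening procedure is meant to address.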