
Statistica Sinica 32 (2022), 109-130


Hangjin Jiang¹,², Xingqiu Zhao³, Ronald C.W. Ma² and Xiaodan Fan²

¹Zhejiang University, ²The Chinese University of Hong Kong and ³The Hong Kong Polytechnic University

Abstract: We consider variable screening in high-dimensional binary classification. First, we propose nonparametric test statistics for two-sample distribution comparison. These test statistics combine the merits of the chi-squared and Kolmogorov-Smirnov statistics, and provide new insights into testing the equality of the unspecified distributions underlying two independent samples. Based on these statistics, we propose a marginal screening procedure and a pairwise joint screening procedure for detecting important variables in high-dimensional binary classification. Both screening procedures have the consistent screening property, which is stronger than the sure screening property of most existing methods. The marginal screening procedure is much more powerful than other methods over a broad range of cases, and the pairwise joint screening procedure provides a way of detecting variables that have a joint effect but no marginal effect. Extensive simulations and a real-data application show the effectiveness and advantages of the proposed methods.

Key words and phrases: Binary classification, consistency, nonparametric test, two-sample distribution comparison, variable screening.
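
The general idea of marginal screening in binary classification can be illustrated with a minimal sketch. Note that it does not implement the statistics proposed in the paper, which are not specified in this abstract; as a stand-in it uses the classical two-sample Kolmogorov-Smirnov statistic, computed separately for each variable between the two classes. The names `ks_statistic`, `marginal_screen`, and `top_k`, and the simulated data, are hypothetical and chosen only for illustration.

```python
import random


def ks_statistic(x, y):
    """Classical two-sample KS statistic: max over t of |F_x(t) - F_y(t)|.

    Stand-in only; this is NOT the statistic proposed in the paper.
    """
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        t = min(xs[i], ys[j])
        # Advance both pointers past every observation <= t,
        # so i/n and j/m are the two empirical CDFs evaluated at t.
        while i < n and xs[i] <= t:
            i += 1
        while j < m and ys[j] <= t:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def marginal_screen(X, y, top_k):
    """Score each column by the KS statistic between class 0 and class 1.

    X is a list of rows, y a list of 0/1 labels. Returns the indices of
    the top_k highest-scoring columns and the full list of scores.
    """
    p = len(X[0])
    scores = []
    for col in range(p):
        x0 = [row[col] for row, label in zip(X, y) if label == 0]
        x1 = [row[col] for row, label in zip(X, y) if label == 1]
        scores.append(ks_statistic(x0, x1))
    ranked = sorted(range(p), key=lambda c: scores[c], reverse=True)
    return ranked[:top_k], scores


# Simulated example (hypothetical data): 200 samples, 50 variables,
# where only variable 0 has a different distribution in class 1.
random.seed(0)
labels = [k % 2 for k in range(200)]
data = [
    [random.gauss(2.0 * lab if col == 0 else 0.0, 1.0) for col in range(50)]
    for lab in labels
]
selected, _ = marginal_screen(data, labels, top_k=5)
print("Selected variable indices:", selected)
```

A purely marginal ranking of this kind cannot detect variables that matter only jointly, which is the gap the pairwise joint screening procedure mentioned in the abstract is meant to address.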
