Statistica Sinica 32 (2022), 1515-1540
Zhao Ren, Sungkyu Jung and Xingye Qiao
Abstract: Set classification aims to classify a set of observations as a whole, as opposed to classifying individual observations separately. To formally understand the unfamiliar concept of binary set classification, we first investigate the optimal decision rule under the normal distribution, which uses the empirical covariance of the set to be classified. We show that the number of observations in the set plays a critical role in bounding the Bayes risk. Under this framework, we further propose new methods of set classification. For the case where only a few parameters of the model drive the difference between two classes, we propose a computationally efficient approach to parameter estimation using linear programming, leading to the Covariance-engaged LInear Programming Set (CLIPS) classifier. Its theoretical properties are investigated for both the independent case and various (short-range and long-range dependent) time series structures among the observations within each set. The convergence rates of the estimation errors and the risk of the CLIPS classifier are established to show that having multiple observations in a set leads to faster convergence rates than in the standard classification situation in which there is only one observation in the set. The applicable domains in which the CLIPS classifier outperforms its competitors are highlighted in a comprehensive simulation study. Finally, we illustrate the usefulness of the proposed methods in classifying real image data in histopathology.
Key words and phrases: Bayes risk, ℓ1-minimization, quadratic discriminant analysis, set classification, sparsity.