Statistica Sinica 28 (2018), 1-25
Abstract: Stability is an important aspect of a classification procedure, as unstable predictions can reduce users' trust in a classification system and harm the reproducibility of scientific conclusions. We introduce a measure of classification instability, the decision boundary instability (DBI), and combine it with the generalization error (GE) as a criterion for selecting the most accurate and stable classifier. To this end, we propose a two-stage algorithm: (i) select the subset of classifiers whose estimated GEs are not significantly different from the minimal estimated GE among all candidate classifiers; (ii) take the optimal classifier to be the one achieving the minimal DBI within the subset selected in stage (i). This selection principle applies to both linear and nonlinear classifiers. Large-margin classifiers serve as a prototypical example to illustrate the idea. Our selection method is shown to be consistent in the sense that the optimal classifier simultaneously achieves the minimal GE and the minimal DBI. Simulations and real-data examples further demonstrate the advantage of our method over alternative approaches.
Key words and phrases: Asymptotic normality, large-margin, model selection, selection consistency, stability.
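The two-stage selection procedure described in the abstract can be sketched in code. This is a hypothetical illustration, not the paper's implementation: the function name `select_classifier`, the use of a normal-approximation cutoff `z`, and the assumption that each candidate comes with an estimated GE, a standard error for that estimate, and an estimated DBI are all illustrative choices.

```python
import math

def select_classifier(ge_hat, ge_se, dbi, z=1.96):
    """Two-stage selection sketch (illustrative, not the paper's code).

    ge_hat: estimated generalization errors of the candidate classifiers
    ge_se:  standard errors of those GE estimates
    dbi:    estimated decision boundary instabilities

    Stage (i):  keep every classifier whose estimated GE is not
                significantly above the minimal estimated GE
                (here via a simple z-test on the difference).
    Stage (ii): among the retained classifiers, return the index of
                the one with minimal DBI.
    """
    # Stage (i): index of the classifier with minimal estimated GE
    k_min = min(range(len(ge_hat)), key=lambda k: ge_hat[k])
    # Retain classifiers whose GE difference from the minimum is within
    # z standard errors of that difference (assumed significance test)
    keep = [
        k for k in range(len(ge_hat))
        if ge_hat[k] - ge_hat[k_min]
           <= z * math.sqrt(ge_se[k] ** 2 + ge_se[k_min] ** 2)
    ]
    # Stage (ii): minimal DBI within the retained subset
    return min(keep, key=lambda k: dbi[k])
```

For example, with GEs (0.20, 0.21, 0.30), equal standard errors 0.01, and DBIs (0.5, 0.1, 0.2), stage (i) retains the first two classifiers (the third's GE is significantly larger), and stage (ii) picks the second, whose DBI is smallest among the retained set.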