Abstract: In multicategory classification, an estimated generalization error is often used to quantify a classifier's generalization ability. As a result, the quality of estimation of the generalization error becomes crucial in tuning and combining classifiers. This article proposes an estimation methodology for the generalization error that permits a treatment of both fixed and random inputs, in contrast to the conditional classification error commonly used in the statistics literature. In particular, we derive a novel data perturbation technique that jointly perturbs both inputs and outputs to estimate the generalization error. We show that the proposed technique yields optimal tuning and combination, as measured by generalization. We also demonstrate via simulation that it outperforms cross-validation for both fixed and random designs, in the context of margin classification. The results support the utility of the proposed methodology.
Key words and phrases: Averaging, logistic, margins, penalization, support vector.
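As background for the comparison the abstract mentions, the sketch below shows the standard cross-validation baseline: a K-fold estimate of the misclassification (generalization) error for a simple classifier on synthetic two-class data. This is an illustrative caricature only; it does not implement the paper's joint input/output perturbation estimator, and the helper names (`make_data`, `nearest_centroid_fit`, `cv_error`) are hypothetical.

```python
# Illustrative sketch: K-fold cross-validation estimate of generalization
# (misclassification) error. NOT the paper's data perturbation method.
import random

def make_data(n, seed=0):
    # Two well-separated Gaussian classes in the plane.
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        y = rng.choice([0, 1])
        mu = 1.5 if y == 1 else -1.5
        data.append(((rng.gauss(mu, 1.0), rng.gauss(mu, 1.0)), y))
    return data

def nearest_centroid_fit(train):
    # Class centroids serve as a minimal "fitted classifier".
    sums = {0: [0.0, 0.0], 1: [0.0, 0.0]}
    counts = {0: 0, 1: 0}
    for (x1, x2), y in train:
        sums[y][0] += x1
        sums[y][1] += x2
        counts[y] += 1
    return {y: (sums[y][0] / counts[y], sums[y][1] / counts[y])
            for y in (0, 1)}

def predict(centroids, x):
    # Assign to the class with the nearest centroid.
    dists = {y: (x[0] - c[0]) ** 2 + (x[1] - c[1]) ** 2
             for y, c in centroids.items()}
    return min(dists, key=dists.get)

def cv_error(data, k=5):
    # Average held-out misclassification rate over k folds.
    folds = [data[i::k] for i in range(k)]
    errs = []
    for i in range(k):
        test = folds[i]
        train = [p for j in range(k) if j != i for p in folds[j]]
        fit = nearest_centroid_fit(train)
        errs.append(sum(predict(fit, x) != y for x, y in test) / len(test))
    return sum(errs) / k

if __name__ == "__main__":
    print(cv_error(make_data(200)))
```

The paper's proposed estimator differs in that it perturbs both the inputs and the outputs of the training data rather than holding out folds; this sketch only fixes the baseline being compared against.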