
Statistica Sinica 30 (2020), 511-530

AN ANALYSIS OF THE COST OF HYPERPARAMETER
SELECTION VIA SPLIT-SAMPLE VALIDATION,
WITH APPLICATIONS TO PENALIZED REGRESSION
Jean Feng and Noah Simon
University of Washington

Abstract: In a regression setting, a model estimation procedure constructs a model from training data for a given set of hyperparameters. The optimal hyperparameters that minimize the generalization error of the model are usually unknown. Thus, in practice, they are often estimated using split-sample validation. However, how the generalization error of the selected model grows with the number of hyperparameters to be estimated remains an open question. To address this, we establish finite-sample oracle inequalities for selection based on a single training/test split and on cross-validation. We show that if the model estimation procedures are smoothly parameterized by the hyperparameters, the error incurred from tuning the hyperparameters shrinks at a near-parametric rate. Hence, for semiparametric and nonparametric model estimation procedures with a fixed number of hyperparameters, this additional error is negligible. For parametric model estimation procedures, adding a hyperparameter is roughly equivalent to adding a parameter to the model itself. In addition, we specialize these ideas to penalized regression problems with multiple penalty parameters. We establish that the fitted models are Lipschitz in the penalty parameters and, thus, that our oracle inequalities apply. This result encourages the development of regularization methods with many penalty parameters.
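To make the setup concrete, the following is a minimal sketch of hyperparameter selection via a single training/test split, illustrated with ridge regression (one penalty parameter). All names and the data-generating process here are illustrative assumptions, not the paper's procedure or results.

```python
import numpy as np

# Simulated regression data with a few nonzero coefficients (illustrative).
rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = [2.0, -1.0, 0.5]
y = X @ beta + rng.standard_normal(n)

# Split-sample validation: fit on the training half,
# select the penalty parameter on the held-out half.
n_train = n // 2
X_tr, y_tr = X[:n_train], y[:n_train]
X_te, y_te = X[n_train:], y[n_train:]

def ridge_fit(X, y, lam):
    """Closed-form ridge estimate: (X'X + lam * I)^{-1} X'y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

lambdas = np.logspace(-3, 2, 20)
test_errs = []
for lam in lambdas:
    b_hat = ridge_fit(X_tr, y_tr, lam)
    test_errs.append(np.mean((y_te - X_te @ b_hat) ** 2))

# The selected model is the one minimizing held-out error.
best_lam = lambdas[int(np.argmin(test_errs))]
print("selected lambda:", best_lam)
```

With several penalty parameters, the same selection step would search over a multi-dimensional grid; the paper's oracle inequalities bound how the resulting generalization error grows with the number of such tuned parameters.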

Key words and phrases: Cross-validation, regression, regularization.
