Statistica Sinica 25 (2015), 1637-1658

CONFIDENCE SETS FOR MODEL SELECTION BY F-TESTING

Davide Ferrari and Yuhong Yang

University of Melbourne and University of Minnesota

Abstract: We introduce the notion of a variable selection confidence set (VSCS) for linear regression based on F-testing. Our method identifies the most important variables in a principled way that goes beyond simply trusting the single winner of a model selection criterion. The VSCS extends the usual notion of a confidence interval to the variable selection problem: a VSCS is a set of regression models that contains the true model with a given level of confidence. The size of the VSCS properly reflects the model selection uncertainty, but without specific assumptions on the true model the VSCS is typically rather large unless the number of predictors is small. As a solution, we advocate paying special attention to the set of lower boundary models (LBMs), which are the most parsimonious models that are not statistically significantly inferior to the full model at a given confidence level. The LBMs lead naturally to measures of variable importance and of the co-appearance importance of predictors.

Key words and phrases: Confidence set, linear regression, model selection, variable selection.
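
The construction summarized in the abstract can be made concrete as follows. This is only a sketch, in notation introduced here for illustration rather than taken from the text above. It assumes a Gaussian linear model with $n$ observations, a full model $f$ with $p < n$ coefficients, candidate models $\gamma$ that are subsets of the full model with $p_\gamma$ coefficients, residual sums of squares $\mathrm{RSS}_\gamma$ and $\mathrm{RSS}_f$, and $F_{\alpha;\,a,\,b}$ denoting the upper-$\alpha$ quantile of the F distribution with $(a, b)$ degrees of freedom. Reading "most parsimonious" as minimal under set inclusion is likewise an assumption.

```latex
% Requires amsmath (for \text). All symbols are illustrative assumptions.

% F-statistic comparing a candidate model \gamma with the full model f
\[
  F_\gamma
  = \frac{(\mathrm{RSS}_\gamma - \mathrm{RSS}_f)/(p - p_\gamma)}
         {\mathrm{RSS}_f/(n - p)},
  \qquad p_\gamma < p .
\]

% VSCS at confidence level 1 - \alpha:
% the full model together with every candidate the F-test does not reject
\[
  \widehat{\Gamma}_\alpha
  = \{ f \} \cup
    \bigl\{ \gamma : F_\gamma \le F_{\alpha;\, p - p_\gamma,\, n - p} \bigr\} .
\]

% Lower boundary models: members of the VSCS with no proper submodel in the VSCS
\[
  \mathrm{LBM}_\alpha
  = \bigl\{ \gamma \in \widehat{\Gamma}_\alpha :
      \text{no } \gamma' \subsetneq \gamma \text{ belongs to } \widehat{\Gamma}_\alpha
    \bigr\} .
\]
```

Under these assumptions, if the true model $\gamma^*$ is a proper submodel of the full model, then $F_{\gamma^*}$ follows an F distribution with $(p - p_{\gamma^*},\, n - p)$ degrees of freedom, so $\gamma^*$ is retained with probability exactly $1 - \alpha$. If the true model is the full model itself, it is always retained.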