
Statistica Sinica 30 (2020), 1485-1516

RANKING-BASED VARIABLE SELECTION
FOR HIGH-DIMENSIONAL DATA
Rafal Baranowski, Yining Chen and Piotr Fryzlewicz
London School of Economics and Political Science

Abstract: We propose a ranking-based variable selection (RBVS) technique that identifies important variables influencing the response in high-dimensional data. RBVS uses subsampling to identify the covariates that appear nonspuriously at the top of a chosen variable ranking. We study the conditions under which such a set is unique, and show that it can be recovered successfully from the data by our procedure. Unlike many existing high-dimensional variable selection techniques, among all relevant variables, RBVS distinguishes between important and unimportant variables, and aims to recover only the important ones. Moreover, RBVS does not require model restrictions on the relationship between the response and the covariates, and, thus, is widely applicable in both parametric and nonparametric contexts. Lastly, we illustrate the good practical performance of the proposed technique by means of a comparative simulation study. The RBVS algorithm is implemented in rbvs, a publicly available R package.
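The subsampling idea described in the abstract can be illustrated with a minimal sketch. This is not the RBVS algorithm from the paper or the interface of the rbvs R package. It only assumes one possible ranking (absolute sample correlation with the response) and estimates how often each set of k covariates occupies the top of that ranking across random subsamples. The function name `top_k_set_frequencies` and all of its parameters are hypothetical, chosen for illustration.

```python
from collections import Counter

import numpy as np


def top_k_set_frequencies(X, y, k, n_subsamples=50, subsample_size=None, seed=0):
    """Estimate how often each k-subset of covariates tops the ranking.

    Assumption: covariates are ranked by absolute sample correlation with y.
    Any other ranking measure could be substituted here.
    """
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    m = subsample_size if subsample_size is not None else n // 2
    counts = Counter()
    for _ in range(n_subsamples):
        # Draw a subsample of m observations without replacement.
        idx = rng.choice(n, size=m, replace=False)
        Xc = X[idx] - X[idx].mean(axis=0)
        yc = y[idx] - y[idx].mean()
        # Absolute Pearson correlation of each covariate with the response.
        score = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
        # Record the (unordered) set of the k highest-ranked covariates.
        top = frozenset(np.argsort(-score)[:k].tolist())
        counts[top] += 1
    return {s: c / n_subsamples for s, c in counts.items()}


# Synthetic example: only covariates 0 and 1 influence the response.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 50))
y = 3.0 * X[:, 0] - 3.0 * X[:, 1] + rng.standard_normal(200)

freqs = top_k_set_frequencies(X, y, k=2)
best_set, best_freq = max(freqs.items(), key=lambda item: item[1])
print(sorted(best_set), best_freq)
```

In this synthetic setting, a set that tops the ranking in nearly every subsample is a candidate for the "nonspurious" top set, whereas sets containing noise covariates would be expected to appear only occasionally. How the number of selected variables is chosen is left out of this sketch.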

Key words and phrases: Bootstrap, stability selection, subset selection, variable screening.
