Statistica Sinica 32 (2022), 695-718

PERFORMANCE ASSESSMENT OF
HIGH-DIMENSIONAL VARIABLE IDENTIFICATION

Yanjia Yu¹, Yi Yang² and Yuhong Yang¹

¹University of Minnesota and ²McGill University

Abstract: Because model selection is ubiquitous in data analysis, the reproducibility of statistical results requires that we be able to evaluate the reliability of the employed model selection method, regardless of how good the selected model appears to be. Instability measures have been proposed for evaluating model selection uncertainty. However, low instability does not necessarily indicate that the selected model is trustworthy, because low instability can also arise when a method tends to select an overly parsimonious model. F- and G-measures have become increasingly popular for assessing variable selection performance in theoretical studies and simulations. However, they are not computable in practice, because they depend on the unknown set of truly relevant variables. In this work, we propose a method for estimating F- and G-measures and prove that the resulting estimators are uniformly consistent. This gives data analysts a valuable tool for comparing different variable selection methods on the data at hand. Extensive simulations show that our approach performs very well in finite samples. Lastly, we apply our methods to several microarray gene expression data sets, with intriguing results.

Key words and phrases: F-measure, G-measure, gene expression, model averaging, reproducibility, variable selection performance.
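Why F- and G-measures cannot be computed directly becomes clear from their usual definitions. Both compare the selected variable set Ŝ with the true set S*: F is the harmonic mean of precision |Ŝ ∩ S*|/|Ŝ| and recall |Ŝ ∩ S*|/|S*|, and G is their geometric mean. Since S* is unknown for real data, neither can be evaluated as written. The minimal Python sketch below computes both measures when S* is known. The function names are hypothetical, as is the convention for empty sets. The final helper, `averaged_measure`, replaces the unknown S* with a weighted average over candidate models. It only illustrates the idea suggested by the keyword "model averaging". It is not the estimator proposed in the paper, and it assumes the weights are supplied from elsewhere.

```python
import math
from typing import Callable, Iterable, Sequence, Set

def f_measure(selected: Iterable[int], true_set: Iterable[int]) -> float:
    """Harmonic mean of precision and recall: 2|S & T| / (|S| + |T|)."""
    s, t = set(selected), set(true_set)
    if not s and not t:
        # Assumed convention: selecting nothing when nothing is relevant is perfect.
        return 1.0
    return 2 * len(s & t) / (len(s) + len(t))

def g_measure(selected: Iterable[int], true_set: Iterable[int]) -> float:
    """Geometric mean of precision and recall: |S & T| / sqrt(|S| * |T|)."""
    s, t = set(selected), set(true_set)
    if not s and not t:
        return 1.0  # same assumed convention as above
    if not s or not t:
        return 0.0  # exactly one set is empty, so there is no overlap
    return len(s & t) / math.sqrt(len(s) * len(t))

def averaged_measure(
    selected: Iterable[int],
    candidates: Sequence[Set[int]],
    weights: Sequence[float],
    measure: Callable[[Iterable[int], Iterable[int]], float] = f_measure,
) -> float:
    """Hypothetical illustration: average the measure over candidate 'true' sets,
    weighting each candidate model by a user-supplied nonnegative weight."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    chosen = set(selected)
    return sum(w * measure(chosen, c) for c, w in zip(candidates, weights)) / total

if __name__ == "__main__":
    sel, truth = {0, 1}, {0, 1, 2, 3}
    print(f_measure(sel, truth))  # 2*2 / (2 + 4) = 0.666...
    print(g_measure(sel, truth))  # 2 / sqrt(8) = 0.707...
    print(averaged_measure(sel, [truth, {0, 1}], [0.5, 0.5]))  # (2/3 + 1) / 2
```

In the usage lines, the selection {0, 1} recovers half of the true set {0, 1, 2, 3} and includes no false positives. Its F-measure is 2/3 and its G-measure is 1/√2. With equal weight on the candidates {0, 1, 2, 3} and {0, 1}, the averaged F-value is 5/6.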