

Statistica Sinica 22 (2012), 909-932





EFFECT OF HEAVY TAILS ON ULTRA HIGH DIMENSIONAL VARIABLE RANKING METHODS


Aurore Delaigle and Peter Hall


University of Melbourne


Abstract: Contemporary problems involving sparse, high-dimensional feature selection are rapidly becoming more challenging as dimension increases substantially. This places ever greater stress on methods of analysis, since the effects of even moderately heavy-tailed feature distributions become more pronounced as the number of features diverges. Data transformations have a significant role to play, reducing noise and enabling an increase in dimension, and for this reason they are increasingly used. In this paper we examine the performance of a typical transformation of this type, and study the extent to which it preserves the main attributes that lead to reliable feature selection. We show, both numerically and theoretically, that in the presence of heavy-tailed data the dimension for which effective variable selection is possible can be increased dramatically, from a low-degree polynomial in the sample size to one that is exponentially large.
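The paper's specific transformation and theory are developed in the full text. As a purely illustrative sketch of the general idea the abstract describes (marginal correlation ranking of features, with a bounded transformation to temper heavy tails), the following Python snippet compares raw correlation ranking with a simple rank-transformed version on simulated heavy-tailed data; the simulation settings and the rank transformation are hypothetical choices for illustration, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

n, p = 100, 5000          # sample size and (large) number of candidate features
k_signal = 10             # the first k_signal features carry signal, the rest are noise

# Heavy-tailed features and noise: Student-t with 2 degrees of freedom (infinite variance).
X = rng.standard_t(df=2, size=(n, p))
beta = np.zeros(p)
beta[:k_signal] = 1.0
y = X @ beta + rng.standard_t(df=2, size=n)

def correlation_ranking(X, y):
    """Rank features by absolute sample correlation with the response."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    corr = Xc.T @ yc / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return np.argsort(-np.abs(corr))

def rank_transformed_ranking(X, y):
    """Same ranking after replacing each column (and y) by its ranks,
    a bounded transformation that tempers heavy tails."""
    Xr = np.argsort(np.argsort(X, axis=0), axis=0).astype(float)
    yr = np.argsort(np.argsort(y)).astype(float)
    return correlation_ranking(Xr, yr)

signal = set(range(k_signal))
for name, ranking in [("raw correlation", correlation_ranking(X, y)),
                      ("rank-transformed", rank_transformed_ranking(X, y))]:
    hits = len(signal & set(ranking[:k_signal]))
    print(f"{name:18s}: {hits}/{k_signal} true features in the top {k_signal}")
```

On heavy-tailed data such as this, the transformed ranking typically recovers more of the true features among the top candidates than ranking on the raw observations, which is the phenomenon the paper quantifies.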



Key words and phrases: Correlation, feature selection, heavy tail, nonparametric statistics, Studentising, variable selection.
