

Statistica Sinica 19 (2009), 343-354





RELATIVE ERRORS IN CENTRAL LIMIT THEOREMS FOR STUDENT'S $t$ STATISTIC, WITH APPLICATIONS


Qiying Wang and Peter Hall


University of Sydney and University of Melbourne
Abstract: Student's $t$ statistic is frequently used in practice to test hypotheses about means. Today, in fields such as genomics, tens of thousands of $t$-tests are implemented simultaneously, one for each component of a long data vector. The distributions from which the $t$ statistics are computed are almost invariably non-normal and skew, and the sample sizes are relatively small, typically about one thousand times smaller than the number of tests. Therefore, theoretical investigations of the accuracy of the tests would be based on large-deviation expansions. Recent research has shown that in this setting, unlike classical contexts, weak dependence among vector components is often not a problem; independence can safely be assumed when the significance level is very small, provided dependence among the test statistics is short range. However, conventional large-deviation results provide information only about the accuracy of normal and Student's $t$ approximations under the null hypothesis. Power properties, especially against sparse local alternatives, require more general expansions where the data no longer have zero mean, and in fact where the mean can depend on both sample size and the number of tests. In this paper we derive this type of expansion, and show how it can be used to draw statistical conclusions about the effectiveness of many simultaneous $t$-tests. Similar arguments can be used to derive properties of classifiers based on high-dimensional data.



Key words and phrases: Genomics, classification, family-wise error rate, large-deviation expansion, signal detection, simultaneous hypothesis testing, sparsity.
