Peter Hall and Ingrid Van Keilegom (2007). Two-sample tests in functional data analysis starting from discrete data. Vol. 17, No. 4.

Statistica Sinica 17(2007), 1511-1531

TWO-SAMPLE TESTS IN FUNCTIONAL DATA ANALYSIS STARTING FROM DISCRETE DATA

Peter Hall $^{1,2}$ and Ingrid Van Keilegom

The University of Melbourne and Université catholique de Louvain

Abstract: One of the ways in which functional data analysis differs from other areas of statistics is in the extent to which data are pre-processed prior to analysis. Functional data are invariably recorded discretely, although they are generally substantially smoothed as a prelude even to viewing by statisticians, let alone further analysis. This has the potential to interfere with the performance of two-sample statistical tests, since the use of different tuning parameters for the smoothing step, or different observation times or subsample sizes (i.e., numbers of observations per curve), can mask the differences between distributions that a test is trying to locate. In this paper, and in the context of two-sample tests, we take up this issue. Ways of pre-processing the data, so as to minimise the effects of smoothing, are suggested. We show theoretically and numerically that, by employing exactly the same tuning parameter (e.g., bandwidth) to produce each curve from its raw data, significant loss of power can be avoided. Provided a common tuning parameter is used, it is often satisfactory to choose that parameter along conventional lines, as though the target were estimation of the continuous functions themselves, rather than testing hypotheses about them. Moreover, in this case, using a second-order smoother (such as a local-linear method), the subsample sizes can be almost as small as the square root of the sample sizes before the effects of smoothing have any first-order impact on the results of a two-sample test.

Key words and phrases: Bandwidth, bootstrap, curve estimation, hypothesis testing, kernel, Cramér-von Mises test, local-linear methods, local-polynomial methods, nonparametric regression, smoothing.
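
To make the pre-processing step described in the abstract concrete, the following Python sketch smooths every discretely observed curve, in both samples, with a local-linear smoother that uses one common bandwidth. It is an illustration under stated assumptions and not the authors' implementation: the Gaussian kernel, the function names `local_linear` and `smooth_sample`, the evaluation grid, and the bandwidth value `h = 0.15` are all hypothetical choices made for the example.

```python
import numpy as np


def local_linear(t_obs, y_obs, t_grid, h):
    """Local-linear estimate on t_grid from discrete points (t_obs, y_obs).

    Assumption: a Gaussian kernel with bandwidth h (an illustrative choice).
    """
    t_obs = np.asarray(t_obs, dtype=float)
    y_obs = np.asarray(y_obs, dtype=float)
    t_grid = np.asarray(t_grid, dtype=float)
    d = t_obs[None, :] - t_grid[:, None]      # differences, shape (grid, obs)
    w = np.exp(-0.5 * (d / h) ** 2)           # kernel weights
    s0 = w.sum(axis=1)
    s1 = (w * d).sum(axis=1)
    s2 = (w * d ** 2).sum(axis=1)
    r0 = w @ y_obs
    r1 = (w * d) @ y_obs
    # Intercept of the weighted least-squares line fitted around each grid point.
    return (s2 * r0 - s1 * r1) / (s0 * s2 - s1 ** 2)


def smooth_sample(times, values, t_grid, h):
    """Smooth every curve of a sample with the SAME bandwidth h."""
    return np.array([local_linear(t, y, t_grid, h) for t, y in zip(times, values)])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grid = np.linspace(0.0, 1.0, 50)
    h = 0.15  # common bandwidth for both samples (hypothetical value)

    def simulate(n_curves, n_points):
        # Random observation times and noisy values for each curve.
        times = [np.sort(rng.uniform(0.0, 1.0, n_points)) for _ in range(n_curves)]
        values = [np.sin(2 * np.pi * t) + rng.normal(0.0, 0.2, t.size) for t in times]
        return times, values

    # The two samples differ in subsample size but share the bandwidth h.
    x_hat = smooth_sample(*simulate(30, 20), grid, h)
    y_hat = smooth_sample(*simulate(40, 35), grid, h)
    print(x_hat.shape, y_hat.shape)
```

The smoothed arrays can then be passed to whatever two-sample statistic is being used. The sketch fixes only the point made in the abstract, namely that the bandwidth is shared across all curves even when observation times and subsample sizes differ.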