Statistica Sinica 24 (2014), 1117-1141
Abstract: This paper addresses the problem of variance estimation for a
general U-statistic. U-statistics form a class of unbiased estimators for those
parameters of interest that can be written as $\mathbb{E}[\phi(X_1, \ldots, X_k)]$, where $X_1, \ldots, X_k$ are i.i.d. and ϕ
is a symmetric kernel function with k arguments. Although estimating the
variance of a U-statistic is clearly of interest, asymptotic results for a general
U-statistic are not necessarily reliable when the kernel size k is not negligible
compared with the sample size n. Such situations arise in cross-validation
and other nonparametric risk estimation problems. On the other hand, the
exact closed-form variance is complicated, especially when both k
and n are large. We have devised an unbiased variance estimator for a general
U-statistic. It can be written as a quadratic form of the kernel function ϕ and
is applicable as long as k ≤ n/2. In addition, it can be represented in a familiar
analysis-of-variance form as a contrast of between-class and within-class
variation. To make the proposed variance estimator more
practical, we have also developed a partition resampling scheme that can be used to
compute the U-statistic and its variance estimator simultaneously with high
computational efficiency. A data example in the context of model selection is
provided. To study our estimator, we construct a U-statistic cross-validation
tool, akin to BIC for model selection. With our variance estimator,
we can test which model has the smallest risk.
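The abstract's claims that an unbiased variance estimator exists, is a quadratic form in ϕ, and needs k ≤ n/2 can be illustrated with one elementary construction. This is our illustration and not necessarily the paper's exact formula. Write θ = E[ϕ(X_1, …, X_k)]. Because E[U²] = Var(U) + θ², and because E[ϕ(S₁)ϕ(S₂)] = θ² for any two disjoint k-subsets S₁, S₂ of i.i.d. data, the quantity V = U² − Q is unbiased for Var(U). Here Q denotes the average of ϕ(S₁)ϕ(S₂) over all pairs of disjoint k-subsets, and such pairs exist only when 2k ≤ n. The Python sketch below computes V by brute force. It also adds a simple randomized-partition approximation. The function names `unbiased_variance_exact` and `partition_resampling` are hypothetical. The partition routine is only a Monte Carlo sketch in the spirit of the scheme the abstract mentions, not the authors' algorithm.

```python
# Illustrative sketch with hypothetical helper names; not the paper's implementation.
import itertools
import random
from typing import Callable, Sequence, Tuple

# A kernel maps a k-tuple of observations to a real number; it should be symmetric.
Kernel = Callable[[Sequence[float]], float]


def unbiased_variance_exact(x: Sequence[float], phi: Kernel, k: int) -> Tuple[float, float]:
    """Return (U, V) with U the U-statistic and V = U**2 - Q.

    Q averages phi(S1) * phi(S2) over unordered pairs of disjoint k-subsets.
    Uses exhaustive enumeration, so it is only practical for small n.
    """
    n = len(x)
    if 2 * k > n:
        raise ValueError("disjoint k-subsets require k <= n/2")
    subsets = list(itertools.combinations(range(n), k))
    phi_val = {s: phi([x[i] for i in s]) for s in subsets}
    u = sum(phi_val.values()) / len(subsets)  # the U-statistic
    total, count = 0.0, 0
    for a, b in itertools.combinations(subsets, 2):
        if set(a).isdisjoint(b):
            total += phi_val[a] * phi_val[b]
            count += 1
    q = total / count  # unbiased for theta**2: disjoint subsets are independent
    return u, u * u - q


def partition_resampling(
    x: Sequence[float], phi: Kernel, k: int, n_partitions: int = 2000, seed: int = 0
) -> Tuple[float, float]:
    """Monte Carlo approximation of (U, V) from random partitions into k-blocks."""
    n = len(x)
    m = n // k  # number of disjoint blocks per random partition
    if m < 2:
        raise ValueError("need at least two blocks, i.e. k <= n/2")
    rng = random.Random(seed)
    idx = list(range(n))
    sum_phi = sum_prod = 0.0
    n_blocks = n_pairs = 0
    for _ in range(n_partitions):
        rng.shuffle(idx)
        # Each block is a uniformly random k-subset; blocks within a partition are disjoint.
        vals = [phi([x[i] for i in idx[j * k:(j + 1) * k]]) for j in range(m)]
        s = sum(vals)
        sum_phi += s
        # Sum of products over unordered pairs of distinct blocks.
        sum_prod += (s * s - sum(v * v for v in vals)) / 2
        n_blocks += m
        n_pairs += m * (m - 1) // 2
    u_hat = sum_phi / n_blocks
    q_hat = sum_prod / n_pairs
    return u_hat, u_hat * u_hat - q_hat


if __name__ == "__main__":
    data = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7]

    def half_sq_diff(s: Sequence[float]) -> float:
        # k = 2 kernel whose U-statistic is the sample variance
        return (s[0] - s[1]) ** 2 / 2

    print(unbiased_variance_exact(data, half_sq_diff, 2))
    print(partition_resampling(data, half_sq_diff, 2))
```

With ϕ(x) = x and k = 1, V reduces to s²/n, the familiar unbiased variance estimate for the sample mean, which gives a convenient sanity check. Within one random partition the m = ⌊n/k⌋ blocks are disjoint, so every pair of blocks contributes a product of the kind Q averages. Averaging over many partitions therefore approximates U and Q without enumerating all subsets, although û² − q̂ is then only approximately unbiased.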
Key words and phrases: Best unbiased estimator, cross-validation, likelihood risk, model selection, partition resampling, U-statistic, variance.