Statistica Sinica 27 (2017), 1155-1174
Abstract: This paper addresses the problem of variance estimation for a general U-statistic with large kernel size (degree) 𝑘. U-statistics form a class of unbiased estimators. They were first proposed in Hoeffding (1948) and have since been widely used in many statistical applications. Wang and Lindsay (2014) propose an unbiased variance estimator for a general U-statistic; it is applicable provided that the kernel size 𝑘 is at most half of the sample size 𝑛. This condition limits its applicability in common 𝐾-fold cross-validation problems. We devise a pseudo-kernel variance estimator that can be realized in the same fashion as the unbiased variance estimator, but is defined based on a pseudo-kernel function of degree two. We demonstrate how to construct a pseudo-kernel function and show that the resulting variance estimator is second-order unbiased. Moreover, we develop an efficient realization of the proposal in the context of 𝐾-fold cross-validation. In simulation and data analysis on model selection using the “one-standard-error” rule, the proposed variance estimator performs comparably to its bootstrap and jackknife counterparts while being significantly more computationally efficient.
Key words and phrases: 𝐾-fold cross-validation, Kullback-Leibler distance, pseudo-kernel, second-order unbiased, U-statistic, variance estimation.
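
For readers unfamiliar with the terminology, the following minimal Python sketch illustrates what a U-statistic of degree 𝑘 is and why the condition that 𝑘 be at most half of 𝑛 can fail in 𝐾-fold cross-validation. It is an illustration only and not the estimator proposed in this paper. The names `u_statistic`, `variance_kernel`, and `disjoint_subsets_exist` are hypothetical, and the example assumes that the kernel size in 𝐾-fold cross-validation is on the order of the training-set size 𝑛(𝐾 − 1)/𝐾.

```python
from itertools import combinations
from statistics import variance


def u_statistic(sample, kernel, k):
    """Average a symmetric kernel of degree k over all size-k subsets."""
    subsets = list(combinations(sample, k))
    return sum(kernel(*subset) for subset in subsets) / len(subsets)


def variance_kernel(x, y):
    """Degree-2 kernel whose U-statistic is the unbiased sample variance."""
    return 0.5 * (x - y) ** 2


def disjoint_subsets_exist(n, k):
    """Two disjoint size-k subsets of n points exist only if 2k <= n."""
    return 2 * k <= n


data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]
u_var = u_statistic(data, variance_kernel, 2)

# Assumption for illustration: with n = 100 and K = 10 folds, the kernel
# size is taken to be the training-set size n * (K - 1) / K = 90.
n, K = 100, 10
k_train = n * (K - 1) // K
condition_holds = disjoint_subsets_exist(n, k_train)
```

Here `u_var` agrees with `statistics.variance(data)` up to floating-point error, since the sample variance is the U-statistic of the degree-two kernel (𝑥 − 𝑦)²/2. Under the stated assumption, `condition_holds` is false for 𝑛 = 100 and 𝐾 = 10, which is the situation that motivates a variance estimator built from a kernel of degree two.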