Back To Index Previous Article Next Article Full Text

Statistica Sinica 16(2006), 541-567


Tao Shi and Bin Yu

The Ohio State University and University of California at Berkeley

Abstract: Gaussian kernel regularization is widely used in the machine learning literature and has proved successful in many empirical experiments. The periodic version of Gaussian kernel regularization has been shown to be minimax rate optimal in estimating functions in any finite order Sobolev space. However, for a data set with $n$ points, the computation complexity of the Gaussian kernel regularization method is of order $O$($n^3$). In this paper we propose to use binning to reduce the computation of Gaussian kernel regularization in both regression and classification. For periodic Gaussian kernel regression, we show that the binned estimator achieves the same minimax rates as the unbinned estimator, but the computation is reduced to $O(m^3)$ with $m$ as the number of bins. To achieve the minimax rate in the $k$th order Sobolev space, $m$ needs to be in the order of $O(kn^{1/(2k+1)})$, which makes the binned estimator computation of order $O(n)$ for $k=1$, and even less for larger $k$. Our simulations show that the binned estimator (binning 120 data points into 20 bins in our simulation) provides almost the same accuracy with only 0.4% of computation time. For classification, binning with $L2$-loss Gaussian kernel regularization and Gaussian kernel Support Vector Machines is tested in a polar cloud detection problem.

Key words and phrases: Asymptotic minimax risk, binning, Gaussian kernel, rate of convergence, regularization, Sobolev space, support vector machines.

Back To Index Previous Article Next Article Full Text