

Statistica Sinica 16(2006), 323-351





OBSERVATIONS ON BAGGING


Andreas Buja and Werner Stuetzle


University of Pennsylvania and University of Washington


Abstract: Bagging is a device intended for reducing the prediction error of learning algorithms. In its simplest form, bagging draws bootstrap samples from the training sample, applies the learning algorithm to each bootstrap sample, and then averages the resulting prediction rules. More generally, the resample size $M$ may be different from the original sample size $N$, and resampling can be done with or without replacement. We investigate bagging in a simplified situation: the prediction rule produced by a learning algorithm is replaced by a simple real-valued U-statistic of i.i.d. data. U-statistics of high order can describe complex dependencies, and yet they admit a rigorous asymptotic analysis. We show that bagging U-statistics often but not always decreases variance, whereas it always increases bias. The most striking finding, however, is an equivalence between bagging based on resampling with and without replacement: the respective resample sizes $M_{with} = \alpha_{with} N$ and $M_{w/o} = \alpha_{w/o} N$ produce very similar bagged statistics if $\alpha_{with} = \alpha_{w/o} / (1-\alpha_{w/o})$. While our derivation is limited to U-statistics, the equivalence seems to be universal. We illustrate this point in simulations where bagging is applied to CART trees.
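The bagging procedure and the with/without-replacement equivalence described in the abstract can be illustrated numerically. The following is a minimal sketch (not taken from the paper): it bags the sample variance, a U-statistic of order two, averaging it over $B$ resamples of size $M$, and compares resampling with replacement at $\alpha_{with} = 1$ against resampling without replacement at $\alpha_{w/o} = 1/2$, a pair related by $\alpha_{with} = \alpha_{w/o}/(1-\alpha_{w/o})$. The function and parameter names are illustrative choices, not notation from the paper.

```python
import numpy as np

def bagged_statistic(x, stat, M, replace=True, B=2000, seed=None):
    """Bagged estimate: average `stat` over B resamples of size M,
    drawn with or without replacement from the training sample x."""
    rng = np.random.default_rng(seed)
    vals = [stat(rng.choice(x, size=M, replace=replace)) for _ in range(B)]
    return float(np.mean(vals))

# The sample variance with divisor (M - 1) is a U-statistic of order 2.
var_u = lambda s: s.var(ddof=1)

rng = np.random.default_rng(0)
x = rng.normal(size=200)
N = len(x)

# Equivalent pair per the paper: alpha_with = alpha_wo / (1 - alpha_wo).
alpha_wo = 0.5
alpha_with = alpha_wo / (1 - alpha_wo)   # = 1.0, the classical bootstrap

b_with = bagged_statistic(x, var_u, M=int(alpha_with * N), replace=True, seed=1)
b_wo = bagged_statistic(x, var_u, M=int(alpha_wo * N), replace=False, seed=2)

# The two bagged statistics agree up to Monte Carlo error and an O(1/N)
# bias term, in line with the equivalence stated above.
print(b_with, b_wo)
```

With replacement and $M = N$, bagging the variance shrinks it by the factor $(N-1)/N$ toward the plug-in estimate, a small instance of the bias increase the abstract mentions; the without-replacement subsample variance remains unbiased, so the two bagged values differ only at order $1/N$.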



Key words and phrases: Bagging, CART, U-statistics.
