Abstract: We propose an algorithm for regression tree construction called GUIDE. It is specifically designed to eliminate variable selection bias, a problem that can undermine the reliability of inferences from a tree structure. GUIDE controls bias by employing chi-square analysis of residuals and bootstrap calibration of significance probabilities. This approach allows fast computation speed, natural extension to data sets with categorical variables, and direct detection of local two-variable interactions. Previous algorithms are not unbiased and are insensitive to local interactions during split selection. The speed of GUIDE enables two further enhancements--complex modeling at the terminal nodes, such as polynomial or best simple linear models, and bagging. In an experiment with real data sets, the prediction mean square error of the piecewise constant GUIDE model is within % of that of CART. Piecewise linear GUIDE models are more accurate; with bagging they can outperform the spline-based MARS method.
Key words and phrases: Bagging, bias correction, bootstrap, interaction detection, piecewise linear.