
Statistica Sinica 10(2000), 789-817



Colin O. Wu

The Johns Hopkins University

Abstract: Let $Y$ and ${\bf X}$ be real- and $R^d$-valued random variables. We consider the estimation of the nonparametric regression function $m({\bf x}) =E( Y\vert {\bf X}={\bf x})$ when $s\geq 1$ independent selection-biased samples of $(Y, {\bf X})$ are observed. This sampling scheme, which arises naturally in biological and epidemiological studies and many other fields, includes stratified samples, length-biased samples and other weighted distributions. A class of local polynomial estimators of $m({\bf x})$ is derived by smoothing Vardi's nonparametric maximum likelihood estimator of the underlying distribution function. Large sample properties, such as asymptotic distributions and asymptotic mean squared risks, are derived explicitly. Unlike local polynomial regression with i.i.d. direct samples, we show here that kernel choices are important and optimal kernel functions may be asymmetric and discontinuous when the weight functions of the biased samples have jumps. A cross-validation criterion is proposed for the selection of data-driven bandwidths. Through a simple comparison, we show that our estimators are superior to other intuitive estimators of $m({\bf x})$.

Key words and phrases: Cross-validation, local polynomials, nonparametric maximum likelihood estimator, optimal kernel and bandwidths, selection-biased sample.
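
Illustrative sketch (not from the paper): the simplest special case of the setting above is a single biased sample ($s=1$) with a known, strictly positive weight function. In that case the nonparametric maximum likelihood estimator of the underlying distribution places mass proportional to $1/w(Y_i, {\bf X}_i)$ on each observation, so smoothing it with a local linear fit amounts to a kernel-weighted least squares problem with the kernel weights divided by the selection weights. The function name `local_linear_biased`, the scalar covariate, the Epanechnikov kernel and the fixed bandwidth `h` are all assumptions made for illustration; the paper's general estimator for $s\geq 1$ samples, its optimal (possibly asymmetric) kernels and its cross-validated bandwidths are not reproduced here.

```python
import numpy as np

def local_linear_biased(x0, x, y, w, h):
    """Local linear estimate of m(x0) from ONE selection-biased sample.

    Hypothetical helper for illustration only.
    x, y : 1-d arrays of observed covariates and responses
    w    : 1-d array of selection weights w(y_i, x_i), assumed known and > 0
    h    : bandwidth (assumed given, not data-driven)
    """
    u = (x - x0) / h
    # Epanechnikov kernel; an assumed choice, not the paper's optimal kernel
    k = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)
    # Inverse selection weights undo the sampling bias (single-sample case)
    a = k / w
    # Design matrix for a local linear fit centred at x0
    X = np.column_stack([np.ones_like(x), x - x0])
    XtA = X.T * a
    beta = np.linalg.solve(XtA @ X, XtA @ y)
    return beta[0]  # intercept = estimate of m(x0)
```

For a length-biased sample one would pass `w = y`; for a directly sampled i.i.d. data set, `w = np.ones_like(y)` recovers the ordinary local linear estimator.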
