Statistica Sinica 7(1997), 545-575

SEMI-PARAMETRIC ESTIMATES UNDER BIASED SAMPLING

Jiayang Sun and Michael Woodroofe

University of Michigan

Abstract: In observational studies subjects may self-select, thereby creating a biased sample. Such problems arise frequently, for example, in astronomical, biomedical, animal, and oil studies, survey sampling, and econometrics. For a typical subject, let Y denote the value of interest and suppose that Y has an unknown density function f. Further, let w(y) denote the probability that the subject includes itself in the study given Y = y. Then the conditional density of Y given that it is observed is f*(y) = w(y)f(y)/k, where k is a normalizing constant. The problem of estimating w and f from a biased sample X_1, ..., X_n drawn independently from f* is considered when f is known to belong to a parametric family, say f = f_θ, where θ is a vector of unknown parameters, and w is assumed to be non-decreasing. An algorithm for computing the maximum likelihood estimator of (w, θ) is developed, and consistency is established. Simulations are used to show that our method is feasible with moderate sample sizes, and applications to animal and oil data are given.

Key words and phrases: Animal and oil data, convergence, consistency, EM algorithm, incomplete sample, maximum likelihood estimates, MM algorithm, selection bias, simulations.
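
The sampling model in the abstract can be illustrated with a minimal simulation sketch. It draws Y from f and keeps each draw with probability w(Y), so the retained values follow f*(y) = w(y)f(y)/k. The particular choices here, an Exponential(1) density for f and w(y) = min(y, 1), together with the function names, are assumptions made only for illustration and are not taken from the paper; the sketch shows how a biased sample arises, not the estimation algorithm.

```python
import random


def draw_biased_sample(n, sample_f, w, rng):
    """Return n observations retained under self-selection.

    Each candidate Y is drawn from f via sample_f and is included with
    probability w(Y), so retained values have density w(y) f(y) / k.
    """
    sample = []
    while len(sample) < n:
        y = sample_f(rng)            # value of interest for a typical subject
        if rng.random() < w(y):      # subject includes itself with prob. w(y)
            sample.append(y)
    return sample


# Hypothetical choices for illustration only (not from the paper):
# f is the Exponential(1) density and w is non-decreasing with values in [0, 1].
def sample_f(rng):
    return rng.expovariate(1.0)


def w(y):
    return min(y, 1.0)


rng = random.Random(0)               # fixed seed so the run is reproducible
xs = draw_biased_sample(5000, sample_f, w, rng)
biased_mean = sum(xs) / len(xs)
print(f"mean of biased sample: {biased_mean:.3f} (mean under f is 1)")
```

Because this w favors larger values of y, the mean of the biased sample should tend to exceed the mean under f, which is the kind of distortion the estimator of (w, θ) has to account for.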