Statistica Sinica

Jiayang Sun and Michael Woodroofe

Abstract: In observational studies subjects may self select, thereby creating a biased sample. Such problems arise frequently, for example, in astronomical, biomedical, animal, and oil studies, survey sampling and econometrics. For a typical subject, let $Y$ denote the value of interest and suppose that $Y$ has an unknown density function $f$. Further, let $w(y)$ denote the probability that the subject includes itself in the study given $Y=y$. Then the conditional density of $Y$ given that it is observed is $f^{*}(y)=w(y)f(y)/k$, where $k$ is a normalizing constant. The problem of estimating $w$ and $f$ from a biased sample $X_{1},\ldots,X_{n}$ drawn independently from $f^{*}$ is considered when $f$ is known to belong to a parametric family, say $f=f_{\theta}$, where $\theta$ is a vector of unknown parameters, and $w$ is assumed to be non-decreasing. An algorithm for computing the maximum likelihood estimator of $(w,\theta)$ is developed, and consistency is established. Simulations are used to show that our method is feasible with moderate sample size, and applications to animal and oil data are given.

Key words and phrases: Animal and oil data, convergence, consistency, EM algorithm, incomplete sample, maximum likelihood estimates, MM algorithm, selection bias, simulations.
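
The sampling model in the abstract can be illustrated with a small simulation. The sketch below is not the estimation algorithm of the paper; it only generates a biased sample by self-selection. The standard normal choice for f, the logistic choice for w, and the names `w` and `draw_biased_sample` are all assumptions made for illustration. Each subject draws Y from f and includes itself with probability w(Y), so the retained values follow the density w(y)f(y)/k, and the fraction of subjects retained is a Monte Carlo estimate of the normalizing constant k (the integral of w(y)f(y) over y).

```python
import math
import random


def w(y):
    """Hypothetical non-decreasing selection probability (logistic curve)."""
    return 1.0 / (1.0 + math.exp(-y))


def draw_biased_sample(n, rng):
    """Collect n observed values: draw Y from f, keep it with probability w(Y)."""
    observed = []
    n_drawn = 0
    while len(observed) < n:
        y = rng.gauss(0.0, 1.0)  # assumed f: standard normal density
        n_drawn += 1
        if rng.random() < w(y):  # the subject includes itself in the study
            observed.append(y)
    return observed, n_drawn


rng = random.Random(0)
sample, n_drawn = draw_biased_sample(5000, rng)

# Fraction of subjects retained: Monte Carlo estimate of the constant k.
accept_rate = len(sample) / n_drawn

# Mean of the biased sample; larger values of Y are over-represented.
sample_mean = sum(sample) / len(sample)

print(f"estimated k: {accept_rate:.3f}, biased sample mean: {sample_mean:.3f}")
```

With these particular choices, k equals 1/2 by symmetry, since w(y) + w(-y) = 1 and f is symmetric about zero, so the estimated k should be close to 0.5. The mean of the observed sample lies above the population mean of zero, which is the selection bias that the estimator of (w, θ) has to account for.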