Abstract: This article considers semiparametric estimation in logistic regression with missing covariates. We assume that covariates are measured without error in a validation subsample. Some covariates are missing in the non-validation set, while surrogate variables may be available for all study subjects. We consider the case in which a covariate is missing at random, so that the probability of selection into the validation set depends only on observed data. Breslow and Cain (1988) proposed a conditional likelihood approach based on the validation set. We combine the conditional likelihoods of the validation set and the non-validation set. The proposed estimator is easy to implement and is semiparametric, since no additional model assumption is imposed. Large sample theory is developed. For estimating the parameter of the missing covariate, simulations show that, in a variety of situations, the proposed estimator is significantly more efficient than the validation likelihood estimator of Breslow and Cain and the inverse selection probability weighted estimator. With moderate sample sizes and moderate values of the relative risk parameters, our estimator remains competitive with the nonparametric maximum likelihood estimator of Scott and Wild (1997). The proposed method is illustrated with a real data example.
Key words and phrases: Conditional likelihood, errors in variables, logistic regression, two-phase design.