
Statistica Sinica 20 (2010), 149-165





VARIABLE SELECTION FOR REGRESSION MODELS WITH MISSING DATA


Ramon I. Garcia, Joseph G. Ibrahim and Hongtu Zhu


University of North Carolina at Chapel Hill


Abstract: We consider the variable selection problem for a class of statistical models with missing data, including missing covariate and/or response data. We investigate the smoothly clipped absolute deviation (SCAD) penalty and the adaptive LASSO, and propose a unified model selection and estimation procedure for use in the presence of missing data. We develop a computationally attractive algorithm for simultaneously optimizing the penalized likelihood function and estimating the penalty parameters. In particular, we propose to use a model selection criterion, called the IC$_Q$ statistic, for selecting the penalty parameters. We show that the variable selection procedure based on IC$_Q$ automatically and consistently selects the important covariates and leads to efficient estimates with oracle properties. The methodology is very general and can be applied to numerous situations involving missing data, from covariates missing at random in arbitrary regression models to nonignorably missing longitudinal responses and/or covariates. Simulations are given to demonstrate the methodology and examine the finite sample performance of the variable selection procedures. Melanoma data from a cancer clinical trial are presented to illustrate the proposed methodology.



Key words and phrases: EM algorithm, IC$_Q$, missing data, penalized likelihood, variable selection.
