Statistica Sinica

Wei Pan and John E. Connett

Abstract: The generalized estimating equation (GEE) approach is increasingly popular for handling correlated response data, for example in longitudinal studies. An attractive property of the GEE is that one can use a working correlation structure that may be wrong, yet the resulting regression coefficient estimate is still consistent and asymptotically normal. One convenient choice is the independence model: treat the correlated responses as if they were independent. With time-varying covariates, however, there is a dilemma: using the independence model may be very inefficient (Fitzmaurice (1995)), whereas using a non-diagonal working correlation matrix may violate an important assumption in GEE, producing biased estimates (Pepe and Anderson (1994)). It would be desirable to distinguish these two situations based on the data at hand. More generally, selecting an appropriate working correlation structure, as an aspect of model selection, may improve estimation efficiency. In this paper we propose resampling-based methods, namely the bootstrap and cross-validation, for selecting the working correlation structure. The methodology is demonstrated by application to the Lung Health Study (LHS) data to investigate the effects of smoking cessation on lung function and on the symptom of chronic cough. In addition, Pepe and Anderson's result is verified using the LHS data.

Key words and phrases: Bootstrap, cross-validation, GEE, GLM, model selection, PMSE.
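
As a rough illustration of the idea summarized in the abstract, the following sketch compares two working correlation structures by a cross-validated predictive mean squared error (PMSE). It is not the authors' procedure: it assumes a linear (identity-link) marginal model, a fixed and known exchangeable correlation parameter, simulated data, and folds formed by holding out whole subjects. The function names `fit_linear_gee`, `exchangeable`, and `cv_pmse`, and all numerical settings, are hypothetical choices made for this example.

```python
import numpy as np

def fit_linear_gee(X, y, R):
    """Solve the linear GEE for a fixed working correlation R.

    X has shape (n_subjects, n_times, n_coef); y has shape (n_subjects, n_times).
    With an identity link the estimating equation has the closed-form solution
    beta = (sum_i X_i' R^-1 X_i)^-1 (sum_i X_i' R^-1 y_i).
    """
    R_inv = np.linalg.inv(R)
    A = np.einsum("itp,ts,isq->pq", X, R_inv, X)
    b = np.einsum("itp,ts,is->p", X, R_inv, y)
    return np.linalg.solve(A, b)

def exchangeable(n_times, rho):
    """Exchangeable working correlation: 1 on the diagonal, rho elsewhere."""
    return (1.0 - rho) * np.eye(n_times) + rho * np.ones((n_times, n_times))

def cv_pmse(X, y, R, n_folds=5, seed=0):
    """Cross-validated PMSE, holding out whole subjects in each fold."""
    rng = np.random.default_rng(seed)
    n_subjects = X.shape[0]
    folds = np.array_split(rng.permutation(n_subjects), n_folds)
    squared_error = 0.0
    for held_out in folds:
        train = np.setdiff1d(np.arange(n_subjects), held_out)
        beta = fit_linear_gee(X[train], y[train], R)
        residual = y[held_out] - X[held_out] @ beta
        squared_error += np.sum(residual ** 2)
    return squared_error / y.size

# Simulated longitudinal data (assumed settings, for illustration only).
rng = np.random.default_rng(42)
n_subjects, n_times, rho = 200, 4, 0.6
true_beta = np.array([1.0, 2.0])

covariate = rng.normal(size=(n_subjects, n_times))      # time-varying covariate
X = np.stack([np.ones_like(covariate), covariate], axis=-1)

# Exchangeable errors: shared subject effect plus independent noise.
subject_effect = rng.normal(size=(n_subjects, 1))
noise = rng.normal(size=(n_subjects, n_times))
errors = np.sqrt(rho) * subject_effect + np.sqrt(1.0 - rho) * noise
y = X @ true_beta + errors

R_ind = np.eye(n_times)
R_exch = exchangeable(n_times, rho)

beta_ind = fit_linear_gee(X, y, R_ind)
beta_exch = fit_linear_gee(X, y, R_exch)
pmse_ind = cv_pmse(X, y, R_ind)
pmse_exch = cv_pmse(X, y, R_exch)

print("independence :", beta_ind, "PMSE", pmse_ind)
print("exchangeable :", beta_exch, "PMSE", pmse_exch)
```

In this simulation the covariate is generated independently of the errors, so both working structures should give estimates close to the assumed true coefficients; the sketch does not reproduce the bias scenario of Pepe and Anderson (1994). In practice the correlation parameter would be estimated rather than fixed, and the working structure with the smaller estimated PMSE would be the candidate to prefer.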