Abstract: Correlated data, such as multivariate or clustered data, arise commonly in practice. Unlike the analysis of independent data, valid inference based on such data often requires proper accommodation of complex association structures among response components within clusters. Semiparametric models based on generalized estimating equation (GEE) methods, and their extensions, have become increasingly popular. However, these inferential schemes are greatly challenged by complex data features such as missing observations, which are ubiquitous in applications. Moreover, existing methods mainly concern marginal mean parameters, with association parameters treated as nuisance. This treatment is inadequate for clustered data in which estimation of association parameters can be a central theme of the study. To address these problems, we develop a flexible semiparametric method that can handle correlated data with or without missing values. Our discussion focuses on binary data, which arise commonly. The proposed method enjoys a number of attractive properties: the missing data process is left unmodeled, and model assumptions for the response process are kept to a minimum. It is robust in the sense that only the mean and association structures for the response process are modeled. The proposed method is flexible because both parametric and nonparametric structures are incorporated in modeling the mean responses.
Key words and phrases: Association, binary outcomes, correlated data, generalized partially linear models, missing data, pairwise likelihood, semiparametric method, single-index models.