Abstract: Multistage sampling of family data is a common design in genetic epidemiology, but appropriate methods for analyzing data collected under this design are still lacking. We propose a statistical approach based on the composite likelihood framework. The composite likelihood is a weighted product of individual likelihoods corresponding to the sampling strata, where the weights are the inverse sampling probabilities of the families in each stratum. Our approach is developed for time-to-event data and handles missing genetic covariates through an Expectation-Maximization algorithm. A robust variance estimator accounts for the dependence of individuals within families. Simulation studies show that the approach performs well in terms of consistency and efficiency of the genetic relative risk estimate, both in the presence of missing genotypes and under different multistage sampling designs. Finally, an application to a familial study of early-onset breast cancer illustrates the value of our approach: it confirms the important effect of the BRCA1 and BRCA2 genes in these families, and it also shows that incorrect inference about this effect can result if the sampling design is not properly taken into account.
Key words and phrases: Composite likelihood, EM algorithm, family data, missing genotype, multistage sampling.
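
As a reading aid, the weighted composite likelihood described in the abstract can be sketched as follows. The notation is an assumption introduced here for illustration, not taken from the paper: $s$ indexes sampling strata, $f$ families sampled in stratum $s$, $i$ individuals in family $f$, $\pi_s$ is the sampling probability of families in stratum $s$, $L_{sfi}(\theta)$ is an individual likelihood contribution, and $\theta$ is the parameter vector (including the genetic relative risk). Under one plausible reading, the composite log-likelihood is

\[
\ell_c(\theta) \;=\; \sum_{s} \frac{1}{\pi_s} \sum_{f \in s} \sum_{i \in f} \log L_{sfi}(\theta),
\]

and a robust variance estimator of the usual sandwich form, which treats families rather than individuals as the independent units, would be

\[
\widehat{\mathrm{Var}}(\hat\theta) \;=\; H^{-1} J H^{-1},
\qquad
H = -\frac{\partial^2 \ell_c(\hat\theta)}{\partial\theta\,\partial\theta^{\top}},
\qquad
J = \sum_{s} \frac{1}{\pi_s^{2}} \sum_{f \in s} U_{sf}(\hat\theta)\, U_{sf}(\hat\theta)^{\top},
\]

where $U_{sf}(\theta) = \sum_{i \in f} \partial \log L_{sfi}(\theta) / \partial\theta$ is the score summed over the members of family $f$. The exact form of the individual likelihoods, the handling of missing genotypes in the EM step, and the precise variance estimator used by the authors may differ from this sketch.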