Statistica Sinica 28 (2018), 1107-1132
Abstract: The analysis of hierarchical data that take the form of clusters of random size has received considerable attention. The focus here is on samples that are very large in terms of the number of clusters and/or members per cluster, on the one hand, and on very small samples (e.g., when studying rare diseases), on the other. Whereas maximum likelihood inference is straightforward in medium to large samples, for samples of the sizes considered here it may be computationally prohibitive. We propose sample-splitting (Molenberghs, Verbeke and Iddi (2011)) as a way to replace iterative optimization of a likelihood that admits no analytical solution with closed-form calculations. We use pseudo-likelihood (Molenberghs et al. (2014)), which consists of computing weighted averages of the solutions obtained for each occurring cluster size. The statistical properties of this approach therefore need to be investigated, especially because the minimal sufficient statistics involved are incomplete. We study the operational characteristics using simulations, and further use simulations to compare the proposed method with existing techniques developed to circumvent difficulties with unequal cluster sizes, such as multiple imputation. The proposed non-iterative methods substantially reduce computation time; at the same time, the method is the most precise among the competitors considered. The findings are illustrated using data from a developmental toxicity study, where clusters are formed by fetuses within litters.
Key words and phrases: Likelihood inference, pseudo-likelihood, unequal cluster size.
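The sample-splitting idea described in the abstract can be illustrated with a minimal sketch: clusters are grouped by size, a closed-form estimate is computed within each size group, and the per-size solutions are combined by a weighted average. The toy common-mean model, the choice of the per-size sample mean as the closed-form solution, and the observation-count weights below are illustrative assumptions, not the paper's actual pseudo-likelihood construction.

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)

# Hypothetical simulated data: 500 clusters of random size (2 to 7 members),
# all sharing a common mean. This toy setup is an assumption for illustration.
true_mean = 2.0
clusters = [rng.normal(true_mean, 1.0, size=rng.integers(2, 8))
            for _ in range(500)]

# Step 1: split the sample by cluster size.
by_size = defaultdict(list)
for c in clusters:
    by_size[len(c)].append(c)

# Step 2: within each size group the estimator has a closed form
# (here simply the overall mean of that group's observations), so no
# iterative optimization is needed.
estimates, weights = [], []
for n, group in by_size.items():
    data = np.concatenate(group)
    estimates.append(data.mean())   # closed-form solution for cluster size n
    weights.append(data.size)       # illustrative weight: number of observations

# Step 3: combine the per-size solutions by a weighted average.
combined = np.average(estimates, weights=weights)
print(combined)
```

Because each per-size solution is closed-form, the whole procedure is non-iterative, which is the source of the computational savings discussed in the abstract.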