
Statistica Sinica 30 (2020), 1741-1771

ESTIMATION AND INFERENCE FOR VERY LARGE LINEAR MIXED EFFECTS MODELS

Katelyn Gao and Art B. Owen

Intel Inc. and Stanford University

Abstract: Linear mixed models with large imbalanced crossed random effects structures pose severe computational problems for maximum likelihood estimation and for Bayesian analysis. The costs can grow as fast as N^{3/2} when there are N observations. Such problems arise in any setting where the underlying factors satisfy a many-to-many rather than a nested relationship. Many-to-many relationships are common in electronic commerce applications, where N can be quite large. Methods that do not account for the correlation structure can greatly underestimate the uncertainty. We therefore propose a method of moments approach that takes account of the correlation structure and that can be computed at a cost of O(N). The method of moments is easy to parallelize because it is based on sums, and it does not require parametric distributional assumptions, tuning parameters, or convergence diagnostics. For the regression coefficients, we give conditions for consistency and asymptotic normality, as well as a consistent variance estimate. We also give conditions for consistent estimation of the variance components, as well as consistent estimates of a mildly conservative upper bound on the variance of the variance component estimates. All of these computations require a total processing time of O(N). We illustrate the algorithm using data from Stitch Fix, where the crossed random effects correspond to clients and items. Here, a naive analysis can overestimate the effective sample size by a factor of hundreds and thus yield unreliable conclusions about the parameters.
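
To make the phrase "based on sums" concrete, the following is a minimal sketch of moment equations for the variance components in an intercept-only two-factor crossed model, Y_ij = mu + a_i + b_j + e_ij, with at most one observation per (row, column) cell. It is an illustration under stated assumptions, not the authors' implementation: the function name `moment_variance_components`, the integer-coded `rows` and `cols` arrays, and the restriction to an intercept-only model are all assumptions made here, and the abstract's method also covers regression coefficients and their variances, which this sketch omits. Three sums of squares (within rows, within columns, and overall) are matched to their expectations, which are linear in the three variance components.

```python
import numpy as np

def moment_variance_components(rows, cols, y):
    """Method-of-moments sketch for Y_ij = mu + a_i + b_j + e_ij.

    rows, cols: nonnegative integer level codes, one per observation
                (assumed: at most one observation per (row, col) cell).
    Returns estimates of (var_a, var_b, var_e). Every quantity is a
    grouped sum, so the cost is linear in the number of observations.
    """
    rows = np.asarray(rows)
    cols = np.asarray(cols)
    y = np.asarray(y, dtype=float)
    n = float(y.size)

    # Centering leaves all three statistics unchanged and reduces
    # cancellation error in the sum-of-squares formulas below.
    yc = y - y.mean()

    def within_ss(idx):
        cnt = np.bincount(idx).astype(float)
        s1 = np.bincount(idx, weights=yc)
        s2 = np.bincount(idx, weights=yc * yc)
        seen = cnt > 0  # ignore level codes that never occur
        ss = np.sum(s2[seen] - s1[seen] ** 2 / cnt[seen])
        return ss, cnt[seen]

    u_a, n_row = within_ss(rows)   # within-row sum of squares
    u_b, n_col = within_ss(cols)   # within-column sum of squares
    u_e = n * np.sum(yc * yc)      # N times the total sum of squares

    r, c = n_row.size, n_col.size
    # Expectations of (u_a, u_b, u_e) as linear functions of
    # (var_a, var_b, var_e); row effects cancel within a row, and
    # column effects cancel within a column.
    m = np.array([
        [0.0, n - r, n - r],
        [n - c, 0.0, n - c],
        [n**2 - np.sum(n_row**2), n**2 - np.sum(n_col**2), n**2 - n],
    ])
    var_a, var_b, var_e = np.linalg.solve(m, np.array([u_a, u_b, u_e]))
    return var_a, var_b, var_e
```

Because each statistic is a sum over observations, the data can be split into chunks, the grouped counts and sums accumulated per chunk, and the results added before solving the 3 x 3 system. The estimates are not constrained to be nonnegative, so a small or negative value may appear when a component is close to zero.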

Key words and phrases: Crossed random effects, linear mixed models, scalable inference.
