Roderick J. Little (2020). On Algorithmic and Modeling Approaches to Imputation in Large Data Sets. Vol. 30, No. 4, pp. 1685–1696.

Abstract: The machine learning and statistical modeling cultures provide contrasting approaches to statistical analysis. In an article in this journal, Loh, Eltinge, Cho and Li compare these approaches in the setting of imputation of large data sets, recommending machine-learning methods. All the compared methods make assumptions, and I note that these assumptions receive more critical assessment for the model-based approaches than for the tree-based machine-learning methods. I discuss in particular the assumptions about the missing-data mechanism implied by the differing approaches. I question the extent to which general conclusions can be drawn from their simulation study, given the relatively strong performance of the method that discards the incomplete cases, and the limited exploration of the relevant design space.

Key words and phrases: Imputation, missing data, machine learning, nonresponse weighting, tree and forest methods.