Statistica Sinica

Michelle Liou

Abstract: In educational testing, the comparability of scores on two tests is commonly established using the equipercentile method, which equates scores that have the same percentile rank in their respective score distributions. Because of security or disclosure considerations, data for a comparability study are often collected using an incomplete-data design: the two tests are given to two non-random groups at slightly different time points, and a set of common items is included in both administrations to allow statistical adjustment for possible sample-selection bias. In the literature, researchers have made the missing-at-random assumption when estimating population score distributions from the common-item scores. This assumption can be violated in various ways, especially when the groups differ in age or when the tests are administered a few months apart. In this study, a general model is proposed for estimating score distributions from incomplete data; the model considers background information (e.g., gender, ethnicity) together with common-item scores as possible predictors of sample-selection bias, and allows nonresponse to depend on the missing scores. The model parameters are estimated using the maximum-likelihood method and a Bayesian procedure. The standard errors of comparable scores are also derived under the proposed model. The use of the model is illustrated in two applications.

Key words and phrases: Bayesian methods, categorical data, data imputation, equipercentile equating, EM algorithm, log-linear smoothing.
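
As an illustration of the equipercentile idea mentioned in the abstract, the following is a minimal sketch for the simplest case, in which both score distributions are fully observed. It is not the incomplete-data model proposed in the paper: it performs no adjustment for sample-selection bias, no smoothing, and no estimation of standard errors. The function names, the mid-percentile-rank convention, and the use of linear interpolation are assumptions made for illustration only.

```python
import numpy as np


def percentile_ranks(freq):
    """Mid-percentile rank (0-100) of each integer score 0..K.

    Assumed convention: PR(x) = 100 * [P(X < x) + 0.5 * P(X = x)].
    """
    p = np.asarray(freq, dtype=float)
    p = p / p.sum()                 # relative frequencies
    below = np.cumsum(p) - p        # P(X < x) for each score x
    return 100.0 * (below + 0.5 * p)


def equipercentile_equate(freq_x, freq_y):
    """Map each score on test X to the test Y score with the same percentile rank.

    Assumes every Y score has a positive frequency, so that the Y percentile
    ranks are strictly increasing and can be inverted by interpolation.
    Percentile ranks outside the observed Y range are clamped to the lowest
    or highest Y score.
    """
    pr_x = percentile_ranks(freq_x)
    pr_y = percentile_ranks(freq_y)
    scores_y = np.arange(len(freq_y), dtype=float)
    return np.interp(pr_x, pr_y, scores_y)


# Hypothetical score frequencies for two five-point tests (scores 0..4).
freq_x = [5, 4, 3, 2, 1]   # test X: most examinees score low
freq_y = [1, 2, 3, 4, 5]   # test Y: most examinees score high
equated = equipercentile_equate(freq_x, freq_y)
print(equated)
```

In this made-up example, low scores are more frequent on test X than on test Y, so each X score is mapped to a Y score that is at least as large. In an incomplete-data design, the two frequency vectors would first have to be estimated for a common population, which is the problem the paper addresses.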