Statistica Sinica 28 (2018), 2309-2335

SEMI-PARAMETRIC PREDICTION INTERVALS IN SMALL AREAS WHEN AUXILIARY DATA ARE MEASURED WITH ERROR

Gauri Datta ^{1}, Aurore Delaigle ^{2}, Peter Hall ^{2} and Li Wang ^{3}

Foreword: Our friend and colleague Peter Hall died in Melbourne, Australia on January 9, 2016, before this manuscript was completed. Peter worked hard on this paper before he fell ill, deriving all the theoretical results of the manuscript, whence our decision to submit this manuscript for the special issue in his honour. His theory is particularly striking since it reveals that the properties of the prediction interval depend on whether or not the contaminated covariate takes the value zero. Peter was very interested in this phenomenon but could not find an intuitive explanation for it. We checked his proofs thoroughly but could not find an intuitive explanation either, except that similar behaviour is sometimes encountered in other problems.

Abstract: In recent years, demand for reliable small area statistics has increased considerably, but the samples obtained in small areas are often too small to produce accurate predictors of quantities of interest. To overcome this difficulty, a common approach is to use auxiliary data from other areas or other sources, and produce estimators that combine them with direct data. A popular model for combining direct and indirect data sources is the Fay-Herriot model, which assumes that the auxiliary variables are observed accurately. However, these variables are often subject to measurement errors, and not taking this into account can lead to estimators that are even worse than those based exclusively on the direct data. We consider structural measurement error models and a semi-parametric approach based on the Fay-Herriot model to produce reliable prediction intervals for small area characteristics of interest. Our theoretical study reveals the surprising fact that the properties of the prediction interval are not the same for all values of the noisy covariate. Indeed, the convergence rates are slower when the contaminated covariate takes the value zero than in other cases. Our procedure is illustrated with an application and simulation studies.

Key words and phrases: Deconvolution, density estimation, Fay-Herriot model, Fourier transform, Laplace distribution.