Statistica Sinica

Stijn Vansteelandt, Els Goetghebeur, Michael G. Kenward and Geert Molenberghs

Abstract: It has long been recognised that most standard point estimators lean heavily on untestable assumptions when missing data are encountered. Statisticians have therefore advocated the use of sensitivity analysis, but have paid relatively little attention to strategies for summarizing the results of such analyses in ways that have a clear interpretation, verifiable properties and feasible implementation. As a step in this direction, several authors have proposed shifting the focus of inference from point estimators to estimated intervals or regions of ignorance. These regions combine the standard point estimates obtained under all possible or plausible missing data models that yield identified parameters of interest. They thus reflect the information achievable from the given data generation structure with its missing data component. The standard framework of inference needs to be extended to allow for a transparent study of the statistical properties of such regions.

In this paper we propose a definition of consistency for a region and introduce the concepts of pointwise, weak and strong coverage for larger regions that acknowledge sampling imprecision in addition to the structural lack of information. These larger regions, called uncertainty regions, quantify an overall level of information by adding imprecision due to sampling error to the estimated region of ignorance. The distinction between ignorance and sampling error is often useful, for instance when sample size considerations are made. The type of coverage required depends on the analysis goal. We provide algorithms for constructing several types of uncertainty regions, and derive general relationships between them. Based on the estimated uncertainty regions, we show how classical hypothesis tests can be performed without untestable assumptions on the missingness mechanism.
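As an informal illustration of the distinction between ignorance and sampling error, consider the mean of a binary outcome with some values missing. The sketch below is not one of the algorithms of this paper. It assumes the simplest setting, in which no restriction is placed on the missingness mechanism, so that the estimated ignorance region runs from the estimate with every missing outcome set to 0 to the estimate with every missing outcome set to 1. The function names, the normal approximation and the choice to widen each endpoint by 1.96 standard errors (a conservative Bonferroni-type device) are all assumptions made for this example.

```python
import math


def ignorance_region(y_observed, n_missing):
    """Estimated ignorance region for the mean of a binary outcome.

    Assumes nothing about why outcomes are missing, so each missing
    value may be 0 or 1 (hypothetical helper, for illustration only).
    """
    n = len(y_observed) + n_missing
    successes = sum(y_observed)
    lower = successes / n                 # all missing outcomes set to 0
    upper = (successes + n_missing) / n   # all missing outcomes set to 1
    return lower, upper


def widened_region(y_observed, n_missing, z=1.96):
    """Ignorance region widened by sampling error at each endpoint.

    Each endpoint is a sample proportion over all n subjects, so a
    binomial standard error is used. Widening both ends by z = 1.96
    is a conservative assumption of this sketch, not a calibrated
    critical value.
    """
    n = len(y_observed) + n_missing
    lower, upper = ignorance_region(y_observed, n_missing)
    se_lower = math.sqrt(lower * (1 - lower) / n)
    se_upper = math.sqrt(upper * (1 - upper) / n)
    return max(0.0, lower - z * se_lower), min(1.0, upper + z * se_upper)


# 100 subjects: 30 observed successes, 50 observed failures, 20 missing.
y_obs = [1] * 30 + [0] * 50
print(ignorance_region(y_obs, 20))
print(widened_region(y_obs, 20))
```

With these illustrative counts the estimated ignorance region is approximately [0.30, 0.50]. Its width equals the proportion of missing outcomes and does not shrink as the sample grows with that proportion held fixed. The additional margin on each side reflects sampling error only and does shrink with larger samples.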

Key words and phrases: Bounds, identifiability, incomplete data, inference, pattern-mixture model, selection model.