

Statistica Sinica 20 (2010), 405-421





RELEASING MULTIPLY-IMPUTED SYNTHETIC DATA GENERATED IN TWO STAGES TO PROTECT CONFIDENTIALITY


Jerome P. Reiter and Jörg Drechsler


Duke University and Institute for Employment Research


Abstract: To protect the confidentiality of survey respondents' identities and sensitive attributes, statistical agencies can release data in which confidential values are replaced with multiple imputations. These are called synthetic data. We propose a two-stage approach to generating synthetic data that enables agencies to release different numbers of imputations for different variables. Generation in two stages can reduce computational burdens, decrease disclosure risk, and increase inferential accuracy relative to generation in one stage. We present methods for obtaining inferences from such data. We describe the application of two-stage synthesis to creating a public use file for a German business database.



Key words and phrases: Confidentiality, disclosure, multiple imputation, synthetic data.
