Statistica Sinica 35 (2025), 539-561

ONE STEP TO EFFICIENT SYNTHETIC DATA

Jordan Awan* and Zhanrui Cai

Purdue University and The University of Hong Kong

Abstract: A common approach to synthetic data is to sample from a fitted model. We show that under general assumptions, this approach results in a sample with inefficient estimators, and the joint distribution of the sample is inconsistent with the true distribution. Motivated by this, we propose a general method of producing synthetic data that is widely applicable for parametric models, has asymptotically efficient summary statistics, and is easily implemented and highly computationally efficient. Our approach allows for the construction of both partially synthetic datasets, which preserve certain summary statistics, as well as fully synthetic data, which satisfy differential privacy. In the case of continuous random variables, we prove that our method preserves the efficient estimator with asymptotically negligible error and show through simulations that this property holds for discrete distributions as well. We also provide theoretical and empirical evidence that the distribution from our procedure converges to the true distribution. Besides our focus on synthetic data, our procedure can also be used to perform hypothesis tests in the presence of intractable likelihood functions.

Key words and phrases: Differential privacy, indirect inference, parametric bootstrap, simulation-based inference, statistical disclosure control.
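
Illustration: the inefficiency claim in the abstract can be checked numerically in a toy setting. The following is a minimal sketch, not the authors' procedure: it assumes a normal location model N(theta, 1) with the sample mean as the efficient estimator, and the function name `simulate_variances` and all parameter values are hypothetical choices made for this illustration. It compares the scaled variance of the estimator computed on the original data with that of the same estimator computed on synthetic data drawn from the fitted model.

```python
import numpy as np


def simulate_variances(theta=0.0, n=100, reps=20000, seed=0):
    """Compare n * Var(estimator) on original vs. naive synthetic data.

    Toy assumption (not from the paper): data are N(theta, 1) and the
    estimator is the sample mean.
    """
    rng = np.random.default_rng(seed)

    # Original data: `reps` independent datasets of size n from N(theta, 1).
    x = rng.normal(theta, 1.0, size=(reps, n))
    theta_hat = x.mean(axis=1)

    # Naive synthetic data: sample each dataset from its fitted model
    # N(theta_hat, 1), then re-estimate theta from the synthetic sample.
    y = rng.normal(theta_hat[:, None], 1.0, size=(reps, n))
    theta_syn = y.mean(axis=1)

    # Scale by n so the efficient benchmark is about 1 in this model.
    return n * theta_hat.var(), n * theta_syn.var()


var_orig, var_syn = simulate_variances()
print(f"n * Var(original estimator):  {var_orig:.3f}")
print(f"n * Var(synthetic estimator): {var_syn:.3f}")
```

In this toy model the synthetic-data estimator carries two independent sources of noise (estimation error in the fitted parameter plus fresh sampling error), so its scaled variance should come out near 2 rather than near 1, which is the kind of efficiency loss the abstract describes.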