Statistica Sinica 35 (2025), 1349-1367
Abstract: Statistical analysis in modern scientific research often has the opportunity to use external summary information from similar studies to gain efficiency. However, the population generating the data for the current study, referred to as the internal population, typically differs from the external population that produced the summary information, although the two share some common characteristics that make efficiency improvement possible. This population heterogeneity is a challenging issue, especially when only summary statistics, and not individual-level external data, are available. In this paper, we apply an empirical likelihood approach to estimating the internal population distribution, using external summary information as constraints to gain efficiency under population heterogeneity. We show that our approach produces an estimator of the internal population distribution that is asymptotically more efficient than the customary empirical likelihood estimator that uses no external information, provided that the external information is based on a dataset larger than the dataset from the internal population. Simulation results are given to supplement the asymptotic theory, and a real data example is presented.
Key words and phrases: Constraints, data integration, population heterogeneity, quantile estimation, shared parameters, summary statistics.