Statistica Sinica 34 (2024), 1675-1697
Abstract: With advancements in data collection and storage technology, data analysis in modern scientific research and practice has shifted from analyzing single data sets to coupling several data sets. Here, we consider nonparametric kernel regression for an internal data set, using constraints to incorporate auxiliary information from an external data set in the form of summary statistics. Under several conditions, we show that the proposed constrained kernel regression estimator is asymptotically normal, and that it outperforms the standard kernel regression without external information in terms of the asymptotic mean integrated square error. Furthermore, we consider the situation in which the internal and external data come from different populations. Simulation results confirm our theory and quantify the improvements from using external data. Lastly, we demonstrate the proposed method using a real-data example.
Key words and phrases: Asymptotic mean integrated square error, constraints, data integration, external summary statistics, two-step kernel regression.