
Statistica Sinica 33 (2023), 303-329

MODEL CHECKING IN LARGE-SCALE DATA SET VIA
STRUCTURE-ADAPTIVE-SAMPLING

Yixin Han1, Ping Ma2, Haojie Ren3 and Zhaojun Wang1

1Nankai University, 2University of Georgia and 3Shanghai Jiao Tong University

Abstract: Lack-of-fit testing is essential in many statistical and machine learning applications. Despite the availability of large-scale data sets, the challenges of model checking under limited resource budgets have not yet been well addressed. In this paper, we propose a design-adaptive testing procedure for checking a general model when only a limited number of observations are available. We derive an optimal sampling strategy, called Structure-Adaptive-Sampling, to select a small subset from a large pool of data. With this subset, the proposed test possesses the asymptotically best power. Numerical results on both synthetic and real-world data confirm the effectiveness of the proposed method.

Key words and phrases: Dimension reduction, kernel smoothing, large-scale data set, nonparametric lack-of-fit tests, optimal sampling, semiparametric modelling.
