
Statistica Sinica 34 (2024), 911-932

A PERTURBATION SUBSAMPLING
FOR LARGE SCALE DATA

Yujing Yao and Zhezhen Jin*

Columbia University

Abstract: When analyzing large-scale data, subsampling methods and divide-and-conquer procedures are appealing, because they ease the computational burden, while preserving the validity of inferences. Here, sampling may occur with or without replacement. In this paper, we propose a perturbation subsampling approach based on independent and identically distributed stochastic weights for analyzing large-scale data. We justify the method based on optimizing convex objective functions by establishing the asymptotic consistency and normality of the resulting estimators. This method simultaneously provides consistent point and variance estimators. We demonstrate the finite-sample performance of the proposed method using simulation studies and two real-data analyses.

Key words and phrases: Convex objective function, distributed computing, optimization, perturbation, subsampling.
