
Statistica Sinica 24 (2014), 1655-1684

A SPLIT-AND-CONQUER APPROACH FOR ANALYSIS
OF EXTRAORDINARILY LARGE DATA
Xueying Chen and Min-ge Xie
Rutgers University

Abstract: What should we do when a dataset is too large to fit into a single computer, or too expensive to analyze with a computationally intensive method? We propose a split-and-conquer approach and illustrate it using several computationally intensive penalized regression methods, along with theoretical support. We show that the split-and-conquer approach can substantially reduce computing time and computer memory requirements. The proposed methodology is illustrated numerically using both simulated and real data examples.

Key words and phrases: Big data, combining results from independent analyses, distributed computing, generalized linear models, large sample theory, penalized regression.
