Xueying Chen and Min-ge Xie (2014). A split-and-conquer approach for analysis of extraordinarily large data. Statistica Sinica, Vol. 24, No. 4, 1655-1684.

Abstract: What should we do when a dataset is too large to fit on a single computer, or when a computationally intensive analysis of it is too expensive? We propose a split-and-conquer approach and illustrate it with several computationally intensive penalized regression methods, along with theoretical support. We show that the split-and-conquer approach can substantially reduce computing time and computer memory requirements. The proposed methodology is illustrated numerically using both simulated and real data examples.

Key words and phrases: Big data, combining results from independent analyses, distributed computing, generalized linear models, large sample theory, penalized regression.
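
The split-and-conquer idea summarized in the abstract (split the data into blocks, analyze each block separately, then combine the block-level results) can be sketched as follows. This is a minimal illustration under stated assumptions, not the estimator developed in the paper: ridge regression is swapped in as the penalized method because it has a closed form, and the matrix-weighted average used to combine the block estimates is a choice made for this sketch. The function names `ridge_fit` and `split_and_conquer_ridge` and the simulated data are hypothetical.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge estimate for one block.

    Also returns A = X'X + lam*I, which this sketch uses as the
    block's combining weight (an assumption, not the paper's rule).
    """
    A = X.T @ X + lam * np.eye(X.shape[1])
    beta = np.linalg.solve(A, X.T @ y)
    return beta, A

def split_and_conquer_ridge(X, y, n_blocks, lam):
    """Fit ridge on each block, then combine with matrix weights."""
    p = X.shape[1]
    weight_sum = np.zeros((p, p))
    weighted_beta_sum = np.zeros(p)
    # Only one block needs to be processed at a time.
    for idx in np.array_split(np.arange(X.shape[0]), n_blocks):
        beta_k, A_k = ridge_fit(X[idx], y[idx], lam)
        weight_sum += A_k
        weighted_beta_sum += A_k @ beta_k
    # Weighted average: (sum_k A_k)^{-1} sum_k A_k beta_k
    return np.linalg.solve(weight_sum, weighted_beta_sum)

# Simulated data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
n, p = 6000, 5
X = rng.normal(size=(n, p))
beta_true = np.array([2.0, 0.0, -1.0, 0.0, 0.5])
y = X @ beta_true + rng.normal(size=n)

beta_sc = split_and_conquer_ridge(X, y, n_blocks=6, lam=1.0)
```

With these particular weights the combined estimate coincides algebraically with a single ridge fit on the full data using penalty `n_blocks * lam`, which makes the sketch easy to check. Sparsity-inducing penalties have no such closed form, so combining their block-level results requires more care than is shown here.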