Statistica Sinica 24 (2014), 1655-1684
Abstract: What should we do when a dataset is too large to fit into a single computer, or when a computationally intensive analysis of it is too expensive? We propose a split-and-conquer approach and illustrate it using several computationally intensive penalized regression methods, along with theoretical support. We show that the split-and-conquer approach can substantially reduce computing time and computer memory requirements. The proposed methodology is illustrated numerically using both simulations and data examples.
Key words and phrases: Big data, combining results from independent analyses, distributed computing, generalized linear models, large sample theory, penalized regression.