
Statistica Sinica 31 (2021), 1935-1959

PARTITIONED APPROACH FOR HIGH-DIMENSIONAL
CONFIDENCE INTERVALS WITH LARGE SPLIT SIZES

Zemin Zheng, Jiarui Zhang, Yang Li and Yaohua Wu

University of Science and Technology of China

Abstract: With the availability of massive data sets, accurate inference at low computational cost is key to improving scalability. When the sample size and the dimensionality are both large, naively applying de-biasing to derive confidence intervals can be computationally inefficient or even infeasible, because the de-biasing procedure increases the computational cost by an order of magnitude relative to the initial penalized estimation. Therefore, we propose a split-and-conquer approach to improve the scalability of the de-biasing procedure, and show that the length of the resulting confidence interval is asymptotically the same as that of the interval obtained using all of the data. Moreover, we demonstrate a significant improvement in the largest allowable split size by separating the initial estimation and the relaxed projection steps, indicating that these two steps require different sample sizes to attain statistical guarantees. We propose a refined inference procedure to address the inflation issue that arises in finite-sample performance when the split size becomes large. Lastly, numerical studies demonstrate the computational advantage of our new methodology and support its theoretical guarantees.
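The general split-and-conquer idea can be illustrated with a minimal sketch. It is not the authors' procedure: it uses a generic de-biased lasso for one coordinate, with a node-wise lasso residual as the projection direction, and it does not separate the initial estimation and projection steps across different sample sizes. It also assumes Gaussian noise with known standard deviation sigma, a random split into K equal parts, and the tuning rule lambda = sigma * sqrt(2 log p / n_k). All function names are hypothetical. Each subset yields a de-biased estimate and its variance. The estimates are averaged, and the variance of the average is the sum of the subset variances divided by K squared.

```python
import numpy as np

def soft_threshold(x, t):
    return np.sign(x) * max(abs(x) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200, tol=1e-8):
    """Coordinate descent for (1/(2n))||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.astype(float).copy()  # running residual y - X b
    for _ in range(n_iter):
        max_change = 0.0
        for k in range(p):
            if col_sq[k] == 0:
                continue
            # correlation of column k with the partial residual (b[k] added back)
            rho = X[:, k] @ r / n + col_sq[k] * b[k]
            new = soft_threshold(rho, lam) / col_sq[k]
            r += X[:, k] * (b[k] - new)
            max_change = max(max_change, abs(new - b[k]))
            b[k] = new
        if max_change < tol:
            break
    return b

def debiased_coord(X, y, j, sigma):
    """De-biased lasso estimate of coefficient j and its variance (hypothetical helper)."""
    n, p = X.shape
    lam = sigma * np.sqrt(2 * np.log(p) / n)  # assumed tuning rule
    b = lasso_cd(X, y, lam)
    others = np.delete(np.arange(p), j)
    gamma = lasso_cd(X[:, others], X[:, j], np.sqrt(2 * np.log(p) / n))
    z = X[:, j] - X[:, others] @ gamma  # projection direction: node-wise lasso residual
    denom = z @ X[:, j]
    est = b[j] + z @ (y - X @ b) / denom  # one-step bias correction
    var = sigma ** 2 * (z @ z) / denom ** 2
    return est, var

def split_conquer_ci(X, y, j, K, sigma, z_crit=1.959964, seed=0):
    """Average K subset estimates; the variance of the mean is sum(var) / K^2."""
    idx = np.random.default_rng(seed).permutation(X.shape[0])
    ests, variances = [], []
    for part in np.array_split(idx, K):
        e, v = debiased_coord(X[part], y[part], j, sigma)
        ests.append(e)
        variances.append(v)
    center = np.mean(ests)
    se = np.sqrt(np.sum(variances)) / K
    return center - z_crit * se, center + z_crit * se

# Simulated example: first three coefficients equal 1, noise sd 1.
rng = np.random.default_rng(1)
n, p = 2000, 50
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = 1.0
y = X @ beta + rng.standard_normal(n)

lo4, hi4 = split_conquer_ci(X, y, j=0, K=4, sigma=1.0)  # four splits
lo1, hi1 = split_conquer_ci(X, y, j=0, K=1, sigma=1.0)  # all data at once
```

In this toy setting, the interval built from four splits should have a width close to that of the full-data interval. This mirrors, informally, the asymptotic length result stated above. Each lasso fit, however, only touches n/K observations.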

Key words and phrases: Big data, confidence intervals, de-biased estimator, divide and conquer, large split sizes, scalability.
