Statistica Sinica 31 (2021), 2153-2177
Lili Wang, Chao Zheng, Wen Zhou and Wen-Xin Zhou

Abstract: The robustification parameter, which balances bias and robustness, plays a critical role in the construction of sub-Gaussian estimators for heavy-tailed and/or skewed data. Although the parameter can be tuned using cross-validation, in large-scale statistical problems such as high-dimensional covariance matrix estimation and large-scale multiple testing, the number of robustification parameters increases with the dimensionality, causing cross-validation to become computationally prohibitive. We propose a new data-driven principle for choosing the robustification parameter for Huber-type sub-Gaussian estimators in three fundamental problems: mean estimation, linear regression, and sparse regression in high dimensions. Our proposal is guided by a nonasymptotic deviation analysis, and is conceptually different from cross-validation, which relies on the mean squared error to assess the fit. Extensive numerical experiments and a real-data analysis further illustrate the efficacy of the proposed methods.

Key words and phrases: Data adaptive, heavy tails, Huber loss, M-estimator, tuning parameters.
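
To make the role of the robustification parameter concrete, the following is a minimal sketch of a Huber-type mean estimator with a hand-picked parameter `tau`. It is not the data-driven principle proposed in the paper: `tau` is simply supplied by the caller, and the names `huber_loss` and `huber_mean`, together with the iteratively reweighted averaging scheme, are illustrative assumptions. The sketch only shows how `tau` moves the estimate between the sample mean (large `tau`, less robustness) and a median-like value (small `tau`, more bias under skewness).

```python
import numpy as np


def huber_loss(x, tau):
    """Huber loss: quadratic for |x| <= tau, linear beyond that."""
    a = np.abs(x)
    return np.where(a <= tau, 0.5 * a**2, tau * a - 0.5 * tau**2)


def huber_mean(y, tau, n_iter=200, tol=1e-10):
    """Minimise sum_i huber_loss(y_i - mu, tau) over mu (requires tau > 0).

    Hypothetical helper: uses iteratively reweighted averaging,
    started from the sample median.
    """
    y = np.asarray(y, dtype=float)
    mu = float(np.median(y))
    for _ in range(n_iter):
        a = np.abs(y - mu)
        # Weight is 1 when |residual| <= tau and tau / |residual| otherwise.
        w = tau / np.maximum(a, tau)
        mu_new = float(np.sum(w * y) / np.sum(w))
        if abs(mu_new - mu) < tol:
            return mu_new
        mu = mu_new
    return mu


# Toy sample with one gross outlier (made-up numbers for illustration).
y = np.array([0.0, 1.0, 2.0, 3.0, 100.0])
mu_large_tau = huber_mean(y, tau=1e6)  # no residual is truncated
mu_small_tau = huber_mean(y, tau=1.0)  # the outlier is heavily down-weighted
```

With a very large `tau` no residual is truncated, so the estimate coincides with the sample mean; with a small `tau` the outlier has bounded influence and the estimate stays near the bulk of the data.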