Abstract
This paper investigates the asymptotic properties of algorithms that
can be viewed as robust analogues of classical empirical risk minimization.
These strategies replace the usual empirical average with a robust
proxy of the mean, such as a variant of the median-of-means estimator recalled below.
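To fix ideas, one common form of the median-of-means estimator of the mean $\mu = \mathbb{E} X_1$ (the block structure below is a generic version; the paper analyzes a variant of this construction) partitions the sample $X_1,\ldots,X_N$ into $k$ disjoint blocks $G_1,\ldots,G_k$ of equal cardinality and sets
$$
\widehat{\mu}_{\mathrm{MOM}} = \mathrm{median}\left( \bar X_{G_1}, \ldots, \bar X_{G_k} \right),
\qquad \bar X_{G_j} = \frac{1}{|G_j|} \sum_{i \in G_j} X_i .
$$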
It is by now well known that the excess risk of the resulting estimators often converges
to zero at optimal rates under much weaker assumptions than those required
by their classical counterparts.
However, less is known about the asymptotic
properties of the estimators themselves: for instance, whether robust analogues
of maximum likelihood estimators are asymptotically efficient. We take a
step towards answering these questions and show that, for a wide class of parametric problems, minimizers of an appropriately defined robust proxy of the risk
converge to the minimizers of the true risk at the same rate, and often have the
same asymptotic variance, as the estimators obtained by minimizing the usual
empirical risk. Finally, we discuss computational aspects of the problem and
illustrate the performance of the methods under consideration in numerical experiments.
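To make the strategy concrete, here is a minimal sketch in Python (the block-splitting scheme, function names, and the quadratic loss are illustrative choices, not the exact variant studied in the paper): a location parameter is estimated by minimizing the median-of-means proxy of the risk $\theta \mapsto \mathbb{E}(X - \theta)^2$.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def median_of_means(values, k):
    """Split `values` into k blocks of (nearly) equal size and
    return the median of the within-block averages."""
    blocks = np.array_split(np.asarray(values, dtype=float), k)
    return np.median([block.mean() for block in blocks])

def robust_risk(theta, data, k):
    """Median-of-means proxy of the risk E[(X - theta)^2]."""
    return median_of_means((data - theta) ** 2, k)

rng = np.random.default_rng(0)
data = rng.standard_t(df=2.1, size=5_000)  # heavy-tailed sample with mean 0

# Robust analogue of empirical risk minimization: minimize the proxy.
fit = minimize_scalar(lambda t: robust_risk(t, data, k=25),
                      bounds=(-5.0, 5.0), method="bounded")
print("sample mean      :", data.mean())
print("robust minimizer :", fit.x)
```

Since the proxy is only piecewise smooth in $\theta$, a derivative-free scalar minimizer is used here; the computational aspects of the general problem are discussed in the body of the paper.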
Information
| Manuscript ID | SS-2024-0268 |
|---|---|
| Complete List of Authors | Stanislav Minsker |
| Corresponding Author | Stanislav Minsker |
| Email | stas.minsker@gmail.com |
References
- Alistarh, D., Z. Allen-Zhu, and J. Li (2018). Byzantine stochastic gradient descent. In Advances in Neural Information Processing Systems, pp. 4613–4623.
- Alon, N., Y. Matias, and M. Szegedy (1996). The space complexity of approximating the frequency moments. In Proceedings of the Twenty-Eighth Annual ACM Symposium on Theory of Computing, pp. 20–29. ACM.
- Audibert, J.-Y. and O. Catoni (2011). Robust linear least squares regression. The Annals of Statistics 39(5), 2766–2794.
- Brownlees, C., E. Joly, and G. Lugosi (2015). Empirical risk minimization for heavy-tailed losses. The Annals of Statistics 43(6), 2507–2536.
- Catoni, O. (2012). Challenging the empirical mean and empirical variance: a deviation study. In Annales de l'Institut Henri Poincaré, Probabilités et Statistiques, Volume 48, pp. 1148–1185. Institut Henri Poincaré.
- Chen, Y., L. Su, and J. Xu (2017). Distributed statistical machine learning in adversarial settings: Byzantine gradient descent. Proceedings of the ACM on Measurement and Analysis of Computing Systems 1(2), 1–25.
- Cherapanamjeri, Y., S. B. Hopkins, T. Kathuria, P. Raghavendra, and N. Tripuraneni (2019). Algorithms for heavy-tailed statistics: Regression, covariance estimation, and beyond. arXiv preprint arXiv:1912.11071.
- Devroye, L., M. Lerasle, G. Lugosi, and R. I. Oliveira (2016). Sub-Gaussian mean estimators. The Annals of Statistics 44(6), 2695–2725.
- Feller, W. (1968). On the Berry-Esseen theorem. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete 10(3), 261–268.
- Holland, M. J. and K. Ikeda (2017). Robust regression using biased objectives. Machine Learning 106(9–10), 1643–1679.
- Huber, P. J. (1964). Robust estimation of a location parameter. The Annals of Mathematical Statistics 35(1), 73–101.
- Ibragimov, R. and S. Sharakhmetov (2001). The best constant in the Rosenthal inequality for nonnegative random variables. Statistics & Probability Letters 55(4), 367–376.
- Lecué, G. and M. Lerasle (2020). Robust machine learning by median-of-means: theory and practice. The Annals of Statistics 48(2), 906–931.
- Lecué, G., M. Lerasle, and T. Mathieu (2020). Robust classification via MOM minimization. Machine Learning 109, 1635–1665.
- Ledoux, M. and M. Talagrand (1991). Probability in Banach Spaces: isoperimetry and processes. Berlin: Springer-Verlag.
- Lerasle, M. and R. I. Oliveira (2011). Robust empirical mean estimators. arXiv preprint arXiv:1112.3914.
- Li, K., H. Bao, and L. Zhang (2022). Robust covariance estimation for distributed principal component analysis. Metrika 85(6), 707–732.
- Lugosi, G. and S. Mendelson (2019a). Mean estimation and regression under heavy-tailed distributions: A survey. Foundations of Computational Mathematics 19(5), 1145–1190.
- Lugosi, G. and S. Mendelson (2019b). Risk minimization by median-of-means tournaments. Journal of the European Mathematical Society 22(3), 925–965.
- Mathieu, T. and S. Minsker (2021). Excess risk bounds in robust empirical risk minimization. Information and Inference: A Journal of the IMA 10(4), 1423–1490.
- Minsker, S. (2019). Distributed statistical estimation and rates of convergence in normal approximation. Electronic Journal of Statistics 13(2), 5213–5252.
- Minsker, S. (2025). Uniform bounds for robust mean estimators. To appear in Stochastic Processes and Their Applications.
- Minsker, S. and S. Yao (2025). Generalized median of means principle for Bayesian inference. Machine Learning 114(4).
- Nemirovski, A. and D. Yudin (1983). Problem complexity and method efficiency in optimization. John Wiley & Sons Inc.
- O’Donnell, R. (2014). Analysis of Boolean functions. Cambridge University Press.
- Pedregosa, F., G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, et al. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research 12(Oct), 2825–2830.
- Prasad, A., A. S. Suggala, S. Balakrishnan, and P. Ravikumar (2020). Robust estimation via robust gradient estimation. Journal of the Royal Statistical Society Series B 82(3), 601–627.
- Talagrand, M. (2005). The generic chaining. Springer.
- van der Vaart, A. W. (2000). Asymptotic statistics, Volume 3. Cambridge University Press.
- van der Vaart, A. W. and J. A. Wellner (1996). Weak convergence and empirical processes. Springer Series in Statistics. New York: Springer-Verlag.
Acknowledgments
This research was partially supported by the National Science Foundation
under grants DMS CAREER-2045068 and CCF-1908905. The author would
like to thank Timothée Mathieu for sharing his code used in the simulation
studies (risk-bounds-in-robust-empirical-risk-minimization/).
Supplementary Materials
The online supplementary material includes the proof of Theorem 1, the
proofs of technical results, and a description of the numerical simulations.