Abstract
A new, very general, robust procedure for combining estimators in metric spaces (GROS) is introduced. The method is reminiscent of the well-known median of means, as described in Devroye, Lerasle, Lugosi and Oliveira (2016). First, the sample is divided into K groups. Next, an estimator is computed for each group. Finally, these K estimators are combined using a robust procedure. We prove that the resulting estimator is sub-Gaussian and derive its breakdown point in the sense of Donoho. The robust procedure involves a minimization problem on a general metric space, but we show that the same sub-Gaussianity (up to a constant) is obtained if the minimization is taken over the sample, which makes GROS feasible in practice. The performance of GROS is evaluated through five simulation studies: the first focuses on classification using k-means, the second on the multi-armed bandit problem, the third on regression, and the fourth on set estimation under a noisy model; in the fifth, GROS is applied to obtain a robust persistence diagram. Lastly, robust estimation techniques are applied to determine the home range of Canis dingo in Australia.
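The three steps described in the abstract (split into K groups, estimate on each group, combine robustly over the sample) can be sketched as follows. This is a minimal illustration only: the abstract does not state the combination rule, so the rule used here is an assumption (choose the group estimate whose smallest ball containing more than half of the K estimates has the smallest radius). The function names `split_into_groups`, `combine_over_sample`, `robust_combined_estimate` and `abs_dist` are hypothetical and do not come from the paper.

```python
import random
from statistics import fmean


def split_into_groups(sample, k, seed=0):
    """Shuffle the sample and deal it into k groups of near-equal size."""
    rng = random.Random(seed)
    shuffled = list(sample)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def combine_over_sample(estimates, dist):
    """Return one of the given estimates, chosen by an ASSUMED rule.

    For each estimate, compute the radius of the smallest ball centred at it
    that contains more than half of all estimates (itself included), then
    return the estimate with the smallest such radius. The search is only
    over the estimates themselves, never over the whole metric space.
    """
    k = len(estimates)
    majority = k // 2  # sorted index of the (k // 2 + 1)-th closest estimate

    def radius(i):
        distances = sorted(dist(estimates[i], other) for other in estimates)
        return distances[majority]

    return estimates[min(range(k), key=radius)]


def robust_combined_estimate(sample, k, estimator, dist, seed=0):
    """Split, estimate per group, then combine with the rule above."""
    groups = split_into_groups(sample, k, seed)
    return combine_over_sample([estimator(g) for g in groups], dist)


def abs_dist(a, b):
    """Metric on the real line; any other metric can be passed instead."""
    return abs(a - b)


# Toy usage: 30 standard normal draws contaminated by 3 gross outliers.
rng = random.Random(1)
sample = [rng.gauss(0.0, 1.0) for _ in range(30)] + [1e6] * 3
robust = robust_combined_estimate(sample, k=9, estimator=fmean, dist=abs_dist)
naive = fmean(sample)
```

Because the rule only uses pairwise distances between the K group estimates, the same sketch applies to any space where `dist` is a metric. In the toy run at most three of the nine groups can contain an outlier, so a majority of group means stays close to zero, while the plain mean `naive` is dominated by the outliers.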
Information
| Preprint No. | SS-2024-0414 |
|---|---|
| Manuscript ID | SS-2024-0414 |
| Complete Authors | Alejandro Cholaquidis, Emilien Joly, Leonardo Moreno |
| Corresponding Authors | Leonardo Moreno |
| Emails | leonardo.moreno@fcea.edu.uy |
References
- Aaron, C., Cholaquidis, A. and Fraiman, R. (2022). Estimation of surface area. Electron. J. Statist. 16(2), 3751–3788.
- Agrawal, R. (1995). Sample mean based index policies by O(log n) regret for the multi-armed bandit problem. Advances in Applied Probability 27(4), 1054–1078.
- Azzalini, A. (2013). The Skew-Normal and Related Families. Institute of Mathematical Statistics Monographs. Cambridge University Press.
- Baíllo, A. and Chacón, J. E. (2021). Statistical outline of animal home ranges: an application of set estimation. Handbook of Statistics 44, 3–37.
- Biau, G., Fischer, A., Guedj, B. and Malley, J. D. (2016). COBRA: A combined regression strategy. Journal of Multivariate Analysis 146, 18–28.
- Boente, G., Martínez, A. and Salibián-Barrera, M. (2017). Robust estimators for additive models using backfitting. Journal of Nonparametric Statistics 29(4), 744–767.
- Boursier, E. and Perchet, V. (2022). A survey on multi-player bandits. arXiv preprint arXiv:2211.16275.
- Breiman, L. (1996). Stacked regressions. Machine Learning 24, 49–64.
- Breiman, L. (2001). Random forests. Machine Learning 45, 5–32.
- Bubeck, S., Cesa-Bianchi, N. and Lugosi, G. (2013). Bandits with heavy tail. IEEE Transactions on Information Theory 59(11), 7711–7717.
- Burtini, G., Loeppky, J. and Lawrence, R. (2015). A survey of online experiment design with the stochastic multi-armed bandit. arXiv preprint arXiv:1510.00757.
- Burt, W. H. (1943). Territoriality and home range concepts as applied to mammals. Journal of Mammalogy 24(3), 346–352.
- Cholaquidis, A., Fraiman, R., Ghattas, B. and Kalemkerian, J. (2021). A combined strategy for multivariate density estimation. Journal of Nonparametric Statistics 33(1), 39–59.
- Cholaquidis, A., Fraiman, R., Kalemkerian, J. and Llop, P. (2016). A nonlinear aggregation type classifier. Journal of Multivariate Analysis 146, 269–281.
- Cholaquidis, A., Fraiman, R., Mordecki, E. and Papalardo, C. (2021). Level set and drift estimation from a reflected Brownian motion with drift. Statistica Sinica 31, 29–51.
- Cholaquidis, A., Hernández, M. and Fraiman, R. (2023). Home range estimation under a restricted sampling scheme. Journal of Nonparametric Statistics, to appear.
- Cuesta-Albertos, J. A., Gordaliza, A. and Matrán, C. (1997). Trimmed k-means: an attempt to robustify quantizers. Ann. Statist. 25(2), 553–576.
- Cuevas, A. and Rodríguez-Casal, A. (2004). On boundary estimation. Advances in Applied Probability 36(2), 340–354.
- Devroye, L., Lerasle, M., Lugosi, G. and Oliveira, R. I. (2016). Sub-Gaussian mean estimators. Ann. Statist. 44(6), 2695–2725.
- Donoho, D. L. (1982). Breakdown properties of multivariate location estimators. Technical report, Harvard University, Boston.
- Devroye, L., Györfi, L. and Lugosi, G. (1996). A Probabilistic Theory of Pattern Recognition. Springer.
- Edelsbrunner, H. and Harer, J. L. (2022). Computational Topology: An Introduction. American Mathematical Society.
- Fernández, C. and Steel, M. F. J. (1998). On Bayesian modeling of fat tails and skewness. Journal of the American Statistical Association 93(441), 359–371.
- Freund, Y. and Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences 55(1), 119–139.
- Györfi, L., Kohler, M., Krzyżak, A. and Walk, H. (2002). A Distribution-free Theory of Nonparametric Regression. Springer.
- Hartigan, J. A. (1978). Asymptotic distributions for clustering criteria. Ann. Statist. 6, 117–131.
- Huber, P. J. (1964). Robust estimation of a location parameter. Ann. Math. Statist. 35, 73–101.
- James, W. and Stein, C. (1961). Estimation with quadratic loss. Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability 1, 361–379.
- Joly, E., Lugosi, G. and Oliveira, R. I. (2017). On the estimation of the mean of a random vector. Electron. J. Statist. 11(1), 440–451.
- Kaufman, L. (1990). Partitioning Around Medoids. In: Finding Groups in Data, 344:68–125.
- Kaufman, L. and Rousseeuw, P. J. (2009). Finding Groups in Data: An Introduction to Cluster Analysis. Wiley.
- Lattimore, T. and Szepesvári, C. (2020). Bandit Algorithms. Cambridge University Press.
- Lecué, G. and Lerasle, M. (2020). Robust machine learning by median-of-means: Theory and practice. Ann. Statist. 48(2), 906–931.
- Lugosi, G. and Mendelson, S. (2019). Mean estimation and regression under heavy-tailed distributions: A survey. Foundations of Computational Mathematics 19(5), 1145–1190.
- Lugosi, G. and Mendelson, S. (2019). Sub-Gaussian estimators of the mean of a random vector. Ann. Statist. 47(2), 783–794.
- Maronna, R. A., Martin, R. D., Yohai, V. J. and Salibián-Barrera, M. (2019). Robust Statistics: Theory and Methods (with R). Wiley.
- MacQueen, J. B. (1967). Some methods for classification and analysis of multivariate observations. In: Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, Volume 1, University of California Press, 281–297.
- Nadaraya, E. A. (1964). On estimating regression. Theory of Probability & Its Applications 9(1), 141–142.
- Nemirovsky, A. S. and Yudin, D. B. (1983). Problem Complexity and Method Efficiency in Optimization. Wiley-Interscience.
- Oh, H.-S., Nychka, D. W. and Lee, T. C. M. (2007). The role of pseudo data for robust smoothing with application to wavelet regression. Biometrika 94(4), 893–904.
- Pollard, D. (1981). Strong consistency of k-means clustering. Ann. Probability 9(1), 135–140.
- Pollard, D. (1982). A central limit theorem for k-means clustering. Ann. Probability 10(4), 919–926.
- Rodriguez, D. and Valdora, M. (2019). The breakdown point of the median of means tournament. Statistics & Probability Letters 153, 108–112.
- Rodríguez-Casal, A. (2007). Set estimation under convexity type assumptions. Ann. IHP Probab. Stat. 43(6), 763–774.
- Salibián-Barrera, M. (2023). Robust nonparametric regression: Review and practical considerations. Econometrics and Statistics.
- Smith, B. P., Cairns, K. M., Adams, J. W., Newsome, T. M., Fillios, M., Deaux, E. C. et al. (2019). Taxonomic status of the Australian dingo: the case for Canis dingo Meyer, 1793. Zootaxa 4564(1), 173–197.
- Vishwanath, S., Fukumizu, K., Kuriki, S. and Sriperumbudur, B. K. (2020). Robust persistence diagrams using reproducing kernels. Advances in Neural Information Processing Systems 33, 21900–21911.
- Vishwanath, S., Sriperumbudur, B. K., Fukumizu, K. and Kuriki, S. (2022). Robust topological inference in the presence of outliers. arXiv preprint arXiv:2206.01795.
- Watson, G. S. (1964). Smooth regression analysis. Sankhyā: The Indian Journal of Statistics, Series A 26(4), 359–372.
- Wolpert, D. H. (1992). Stacked generalization. Neural Networks 5(2), 241–259.
- Wysong, M. L., Hradsky, B. A., Iacona, G. D., Valentine, L. E., Morris, K. and Ritchie, E. G. (2020). Space use and habitat selection of an invasive mesopredator and sympatric, native apex predator. Movement Ecology 8, 1–115.