Robust Max Statistics for High-dimensional Inference

Mingshuo Liu and Miles Lopes

doi:10.5705/ss.202025.0036

Abstract

Although much progress has been made in the theory and application of bootstrap approximations for max statistics in high dimensions, the literature has largely been restricted to cases involving light-tailed data. To address this issue, we propose an approach to inference based on robust max statistics, and we show that their distributions can be accurately approximated via bootstrapping when the data are both high-dimensional and heavy-tailed. In particular, the data are assumed to satisfy an extended version of the well-established L4-L2 moment equivalence condition, as well as a weak variance decay condition. In this setting, we show that near-parametric rates of bootstrap approximation can be achieved in the Kolmogorov metric, independently of the data dimension. Moreover, this theoretical result is complemented by encouraging empirical results involving both Euclidean and functional data.

Key words and phrases: high-dimensional statistics; robustness; bootstrap; simultaneous inference; median-of-means

Information

Preprint No.	SS-2025-0036
Manuscript ID	SS-2025-0036
Complete Authors	Mingshuo Liu, Miles Lopes
Corresponding Authors	Miles Lopes
Emails	melopes@ucdavis.edu

References

Abdalla, P. and N. Zhivotovskiy (2022). Covariance estimation: Optimal dimension-free guarantees for adversarial corruption and heavy tails. arXiv:2205.08494.
Alpha Vantage (2024). Alpha Vantage API. https://www.alphavantage.co.
Bai, Z. and J. W. Silverstein (2010). Spectral Analysis of Large Dimensional Random Matrices. Springer.
Chang, J., Q.-M. Shao, and W.-X. Zhou (2016). Cramér-type moderate deviations for studentized two-sample U-statistics with applications.
Chen, X. and K. Kato (2020). Jackknife multiplier bootstrap: finite sample approximations to the U-process supremum with applications. Probability Theory and Related Fields 176, 1097–1163.
Chen, Y.-C., C. R. Genovese, and L. Wasserman (2015). Asymptotic theory for density ridges. The Annals of Statistics 43(5), 1896–1928.
Chernozhukov, V., D. Chetverikov, and K. Kato (2013). Gaussian approximations and multiplier bootstrap for maxima of sums of high-dimensional random vectors. The Annals of Statistics 41(6), 2786–2819.
Chernozhukov, V., D. Chetverikov, and K. Kato (2014). Anti-concentration and honest, adaptive confidence bands. The Annals of Statistics 42(5), 1787–1818.
Chernozhukov, V., D. Chetverikov, and K. Kato (2017). Central limit theorems and bootstrap in high dimensions. The Annals of Probability 45(4), 2309–2352.
Chernozhukov, V., D. Chetverikov, and Y. Koike (2023). Nearly optimal central limit theorem and bootstrap approximations in high dimensions. The Annals of Applied Probability 33(3), 2374–2425.
Chetverikov, D., A. Santos, and A. M. Shaikh (2018). The econometrics of shape restrictions. Annual Review of Economics 10, 31–63.
Comon, P. and C. Jutten (2010). Handbook of Blind Source Separation: Independent Component Analysis and Applications. Academic Press.
de la Peña, V. H., T. L. Lai, and Q.-M. Shao (2009). Self-Normalized Processes: Limit Theory and Statistical Applications. Springer.
Delaigle, A., P. Hall, and J. Jin (2011). Robustness and accuracy of methods for high dimensional data analysis based on Student’s t-statistic. Journal of the Royal Statistical Society Series B: Statistical Methodology 73(3), 283–301.
Deng, H. and C.-H. Zhang (2020). Beyond Gaussian approximation: Bootstrap for maxima of sums of independent random vectors. The Annals of Statistics 48(6), 3643–3671.
Dette, H., K. Kokot, and A. Aue (2020). Functional data analysis in the Banach space of continuous functions. The Annals of Statistics 48(2), 1168–1192.
Fan, J., P. Hall, and Q. Yao (2007). To how many simultaneous hypothesis tests can normal, Student’s t or bootstrap calibration be applied? Journal of the American Statistical Association 102(480), 1282–1288.
Fan, J., Z. Lou, and M. Yu (2023). Robust high-dimensional tuning free multiple testing. The Annals of Statistics 51(5), 2093–2115.
Fang, X., Y. Koike, S.-H. Liu, and Y.-K. Zhao (2023). High-dimensional central limit theorems by Stein’s method in the degenerate case. arXiv:2305.17365.
Foss, S., D. Korshunov, and S. Zachary (2011). An Introduction to Heavy-tailed and Subexponential Distributions. Springer.
Giessing, A. (2023). Gaussian and bootstrap approximations for suprema of empirical processes. arXiv:2309.01307.
Giessing, A. and J. Fan (2020). Bootstrapping ℓp-statistics in high dimensions. arXiv:2006.13099.
Han, F., S. Xu, and W.-X. Zhou (2018). On Gaussian comparison inequality and its application to spectral analysis of large random matrices. Bernoulli 24(3), 1787–1833.
Hodges, J. and E. Lehmann (1963). Estimates of location based on rank tests. The Annals of Mathematical Statistics 34(2), 598–611.
Johnstone, I. M. (2019+). Gaussian Estimation: Sequence and Wavelet Models. preprint.
Ke, Y., S. Minsker, Z. Ren, Q. Sun, and W.-X. Zhou (2019). User-friendly covariance estimation for heavy-tailed distributions. Statistical Science 34(3), 454–471.
Kock, A. B. and D. Preinerstorfer (2024). A remark on moment-dependent phase transitions in high-dimensional Gaussian approximations. Statistics & Probability Letters 211, 110149.
Kock, A. B. and D. Preinerstorfer (2025). High-dimensional Gaussian and bootstrap approximations for robust means. arXiv:2504.08435.
Koike, Y. (2024). High-dimensional bootstrap and asymptotic expansion. arXiv:2404.05006.
Kotz, S., N. Balakrishnan, and N. L. Johnson (2019). Continuous Multivariate Distributions, Volume 1: Models and Applications, Volume 334. John Wiley & Sons.
Kuchibhotla, A. K., L. D. Brown, A. Buja, J. Cai, E. I. George, and L. H. Zhao (2020). Valid post-selection inference in model-free linear regression. The Annals of Statistics 48(5), 2953–2981.
Kuchibhotla, A. K., S. Mukherjee, and D. Banerjee (2021). High-dimensional CLT: Improvements, non-uniform extensions and large deviations. Bernoulli 27(1), 192–217.
Kuchibhotla, A. K. and A. Rinaldo (2020). High-dimensional CLT for sums of non-degenerate random vectors: n^{-1/2}-rate. arXiv:2009.13673.
Liu, M. and M. E. Lopes (2024). Robust max statistics for high-dimensional inference. arXiv:2409.16683.
Liu, W. and Q.-M. Shao (2014). Phase transition and regularized bootstrap in large-scale t-tests with false discovery rate control. The Annals of Statistics 42(5), 2003–2025.
Lopes, M. E. (2022). Central limit theorem and bootstrap approximation in high dimensions: Near 1/√n rates via implicit smoothing. The Annals of Statistics 50(5), 2492–2513.
Lopes, M. E., N. B. Erichson, and M. W. Mahoney (2023). Bootstrapping the operator norm in high dimensions: Error estimation for covariance matrices and sketching. Bernoulli 29(1), 428–450.
Lopes, M. E., Z. Lin, and H.-G. Müller (2020). Bootstrapping max statistics in high dimensions: Near-parametric rates under weak variance decay and application to functional and multinomial data. The Annals of Statistics 48(2), 1214–1229.
Lou, Z. and W. B. Wu (2017). Simultaneous inference for high dimensional mean vectors. arXiv:1704.04806.
Lugosi, G. and S. Mendelson (2019). Mean estimation and regression under heavy-tailed distributions: A survey. Foundations of Computational Mathematics 19(5), 1145–1190.
Mendelson, S. and N. Zhivotovskiy (2020). Robust covariance estimation under L4-L2 norm equivalence. The Annals of Statistics 48(3), 1648–1664.
Muirhead, R. J. (2009). Aspects of Multivariate Statistical Theory. John Wiley & Sons.
Nair, J., A. Wierman, and B. Zwart (2022). The Fundamentals of Heavy Tails: Properties, Emergence, and Estimation. Cambridge.
Nemirovsky, A. S. and D. B. Yudin (1983). Problem Complexity and Method Efficiency in Optimization. Wiley.
Oksendal, B. (2013). Stochastic Differential Equations: An Introduction with Applications. Springer.
Pólya, G. (1949). Remarks on characteristic functions. In Proceedings of the First Berkeley Symposium on Mathematical Statistics and Probability, pp. 115–123.
Resende, L. (2024). Robust high-dimensional Gaussian and bootstrap approximations for trimmed sample means. arXiv:2410.22085.
Roy, A., K. Balasubramanian, and M. A. Erdogdu (2021). On empirical risk minimization with dependent and heavy-tailed data. Advances in Neural Information Processing Systems 34, 8913–8926.
Ruppert, D. and D. S. Matteson (2011). Statistics and Data Analysis for Financial Engineering with R Examples. Springer.
Singh, R. and S. Vijaykumar (2023). Kernel ridge regression inference. arXiv:2302.06578.
Sun, Y., X. He, and J. Hu (2022). An omnibus test for detection of subgroup treatment effects via data partitioning. Annals of Applied Statistics 16(4), 2266–2278.
Vidyamurthy, G. (2004). Pairs Trading: Quantitative Methods and Analysis. John Wiley & Sons.
Yu, M. and X. Chen (2021). Finite sample change point inference and identification for high-dimensional mean vectors. Journal of the Royal Statistical Society Series B: Statistical Methodology 83(2), 247–270.
Zhang, D. and W. B. Wu (2017). Gaussian approximation for high dimensional time series. The Annals of Statistics 45(5), 1895–1919.

Acknowledgments

We are grateful to Mengxin Yu for generously providing the code for the HL method.

Supplementary Materials

The supplementary materials contain the proofs of Theorem 1 and Proposition 1.

Supplementary materials are available for download.
