Abstract

Motivated by the widely used geometric median-of-means estimator in machine learning, this paper studies statistical inference for an ultrahigh-dimensional location parameter based on the sample spatial median under a general multivariate model, including the construction of simultaneous confidence intervals, global tests, and multiple testing with false discovery rate control. To achieve these goals, we derive a novel Bahadur representation of the sample spatial median with a maximum-norm bound on the remainder term, and establish a Gaussian approximation for the sample spatial median over the class of hyperrectangles. In addition, a multiplier bootstrap algorithm is proposed to approximate the distribution of the sample spatial median. The approximations are valid when the dimension diverges at an exponential rate in the sample size, which facilitates the application of the spatial median in the ultrahigh-dimensional regime. The proposed approaches are further illustrated by simulations and the analysis of a genomic dataset from a microarray study.
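
For readers unfamiliar with the estimator, the sample spatial median of observations X_1, ..., X_n is the point that minimizes the sum of Euclidean distances to the observations. The following is a minimal sketch, not the authors' implementation, that computes it with the classical Weiszfeld iteration. The function name `spatial_median`, the starting value, and the parameters `tol` and `max_iter` are illustrative assumptions.

```python
import numpy as np


def spatial_median(X, tol=1e-8, max_iter=500):
    """Approximate the sample spatial median of the rows of X.

    Uses a Weiszfeld-type fixed-point iteration: each step is a weighted
    average of the observations, with weights inversely proportional to
    their current distance from the iterate.
    """
    X = np.asarray(X, dtype=float)
    theta = X.mean(axis=0)  # starting value (an arbitrary choice)
    for _ in range(max_iter):
        dist = np.linalg.norm(X - theta, axis=1)
        dist = np.maximum(dist, 1e-12)  # guard against division by zero
        w = 1.0 / dist
        new_theta = (w[:, None] * X).sum(axis=0) / w.sum()
        if np.linalg.norm(new_theta - theta) <= tol:
            return new_theta
        theta = new_theta
    return theta


if __name__ == "__main__":
    # Four points on a unit square plus one gross outlier.
    data = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [100, 100]])
    print("sample mean:   ", data.mean(axis=0))
    print("spatial median:", spatial_median(data))
```

In this toy example the outlier drags the sample mean far from the bulk of the data, while the spatial median stays near the unit square, which is the robustness property that makes the estimator attractive for heavy-tailed data.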

Information

Preprint No.: SS-2023-0242
Manuscript ID: SS-2023-0242
Complete Authors: Guanghui Cheng, Liuhua Peng, Changliang Zou
Corresponding Author: Changliang Zou
Email: nk.chlzou@gmail.com

Guangzhou Institute of International Finance, Guangzhou University, Guangzhou 510006, China

Acknowledgments

Cheng was supported by the Tertiary Education Scientific Research Project of Guangzhou Municipal Education Bureau (2024312244). Peng was supported by the Australian Research Council (ARC) under grant number LP240100101. Zou was supported by the National Key R&D Program of China (Grant Nos. 2022YFA1003800, 2022YFA1003703) and the National Natural Science Foundation of China (Grant No. 12231011).

Supplementary Materials

The supplementary materials consist of the proofs of the main results in the paper, preliminary lemmas, and additional simulation results.


Supplementary materials are available for download.