Abstract

When extreme events are scarce, extreme value analysis is often invoked as the statistical tool for measuring tail risk by extrapolating to the tail of a distribution. Large datasets benefit tail risk analysis by providing more observations for extreme value analysis. However, large datasets may be stored in a distributed manner, which prevents analyzing them directly. In this paper, we introduce a comprehensive set of tools for examining the asymptotic behavior of tail empirical and quantile processes in the setting where data are distributed across multiple sources, for instance, when data are stored on multiple machines. Utilizing these tools, one can establish the oracle property for most distributed estimators in extreme value statistics in a straightforward way. We provide various examples to demonstrate the practicality and value of our proposed toolkit.
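To make the distributed setting and the oracle benchmark concrete, the following is a minimal sketch, not the authors' implementation. It assumes one simple distributed estimator: each machine computes a Hill estimator (Hill, 1975) from its own top k observations, and only these scalars are averaged. The "oracle" is taken to be the Hill estimator computed on the pooled data with km exceedances, which would be available only if all data sat on one machine. The function names, the exact Pareto simulation, and the values of m, n, and k are illustrative assumptions.

```python
import math
import random


def hill_estimator(sample, k):
    """Hill estimator of the extreme value index from the top k order statistics."""
    ordered = sorted(sample, reverse=True)
    threshold = ordered[k]  # the (k+1)-th largest observation
    return sum(math.log(x / threshold) for x in ordered[:k]) / k


def distributed_hill(machines, k):
    """Average of machine-wise Hill estimators, each using k local exceedances."""
    return sum(hill_estimator(data, k) for data in machines) / len(machines)


def simulate_pareto(n, gamma, rng):
    """Draw n observations with P(X > x) = x**(-1/gamma) for x >= 1."""
    # 1 - rng.random() lies in (0, 1], so the power is always well defined.
    return [(1.0 - rng.random()) ** (-gamma) for _ in range(n)]


# Illustrative settings (assumptions): m machines, n observations each.
rng = random.Random(0)
gamma, m, n, k = 0.5, 20, 1000, 50
machines = [simulate_pareto(n, gamma, rng) for _ in range(m)]

# Distributed estimate: only m scalars are communicated.
distributed = distributed_hill(machines, k)

# Oracle estimate: requires pooling all m * n observations in one place.
pooled = [x for data in machines for x in data]
oracle = hill_estimator(pooled, k * m)

print(f"true gamma: {gamma}, distributed: {distributed:.3f}, oracle: {oracle:.3f}")
```

In this simulated example, both estimates should land close to the true index. The oracle property referred to in the abstract concerns whether a distributed estimator achieves the same asymptotic behavior as its pooled-data counterpart.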

Information

Preprint No.: SS-2024-0222
Manuscript ID: SS-2024-0222
Complete Authors: Liujun Chen, Deyuan Li, Chen Zhou
Corresponding Authors: Chen Zhou
Emails: zhou@ese.eur.nl

International Institute of Finance, School of Management, University of Science and Technology of China

Acknowledgments

The authors are grateful to the Editor, Associate Editor, and anonymous referees, whose suggestions led to great improvement of this work. Liujun Chen’s research was partially supported by the National Key R&D Program of China, No. 2024YFA1012200, and the National Natural Science Foundation of China grants 12301387 and 12471279. Deyuan Li’s research was partially supported by the National Natural Science Foundation of China grants 11971115 and 12471279.

Supplementary Materials

The Supplementary Material contains all the technical proofs and simulation studies.