Abstract
For measuring tail risk with scarce extreme events, extreme value analysis is often invoked as the statistical tool to extrapolate to the tail of a distribution. Large datasets benefit tail risk analysis by providing more
observations for conducting extreme value analysis. However, large datasets may
be stored in a distributed manner, which prevents analyzing them directly. In
this paper, we introduce a comprehensive set of tools for examining the asymptotic behavior of tail empirical and quantile processes in the setting where data
are distributed across multiple sources, for instance, when data are stored on multiple machines. Utilizing these tools, one can establish the oracle property for
most distributed estimators in extreme value statistics in a straightforward way.
We provide various examples to demonstrate the practicality and value of our
proposed toolkit.
Information
| Preprint No. | SS-2024-0222 |
|---|---|
| Manuscript ID | SS-2024-0222 |
| Complete Authors | Liujun Chen, Deyuan Li, Chen Zhou |
| Corresponding Authors | Chen Zhou |
| Emails | zhou@ese.eur.nl |
International Institute of Finance, School of Management, University of Science and Technology of China
Acknowledgments
The authors are grateful to the Editor, Associate Editor, and anonymous
referees, whose suggestions led to a great improvement of this work.
Liujun Chen’s research was partially supported by the National Key R&D Program of China, No. 2024YFA1012200, and the National Natural Science
Foundation of China grants 12301387 and 12471279. Deyuan Li’s research
was partially supported by the National Natural Science Foundation of
China grants 11971115 and 12471279.
Supplementary Materials
The Supplementary Material contains all the technical proofs and simulation studies.