High-dimensional Extreme Quantile Regression

Yiwei Tang, Huixia Judy Wang and Deyuan Li

doi:10.5705/ss.202025.0044

Abstract

The estimation of conditional quantiles at extreme tails is of great interest in numerous applications. Various methods that integrate regression analysis with an extrapolation strategy derived from extreme value theory have been proposed to estimate extreme conditional quantiles in scenarios with a fixed number of covariates. However, these methods become less effective in high-dimensional settings, where the number of covariates grows with the sample size. In this article, we develop new estimation methods tailored for extreme conditional quantiles with high-dimensional covariates. We establish the asymptotic properties of the proposed estimators and demonstrate their superior performance through simulation studies, particularly in scenarios of growing dimension and high dimension where existing methods may fail. Furthermore, the analysis of auto insurance data validates the efficacy of our methods in estimating extreme conditional insurance claims and selecting important variables.

Key words and phrases: Extrapolation; Extreme value; High-dimensional data; Regression analysis

Information

Preprint No.	SS-2025-0044
Manuscript ID	SS-2025-0044
Complete Authors	Yiwei Tang, Huixia Judy Wang, Deyuan Li
Corresponding Authors	Deyuan Li
Emails	deyuanli@fudan.edu.cn

References

Belloni, A. and Chernozhukov, V. (2011), ‘ℓ1-penalized quantile regression in high-dimensional sparse models’, Annals of Statistics 39(1), 82–130.
Belloni, A., Chernozhukov, V., Chetverikov, D. and Fernández-Val, I. (2019), ‘Conditional quantile processes based on series or many regressors’, Journal of Econometrics 213(1), 4–29.
Bickel, P. J., Ritov, Y. and Tsybakov, A. B. (2009), ‘Simultaneous analysis of Lasso and Dantzig selector’, Annals of Statistics 37(4), 1705–1732.
Bradic, J. and Kolar, M. (2017), ‘Uniform inference for high-dimensional quantile regression: linear functionals and regression rank scores’, arXiv:1702.06209.
Chatterjee, A. and Lahiri, S. (2010), ‘Asymptotic properties of the residual bootstrap for lasso estimators’, Proceedings of the American Mathematical Society 138(12), 4497–4509.
Chatterjee, A. and Lahiri, S. N. (2011), ‘Bootstrapping lasso estimators’, Journal of the American Statistical Association 106(494), 608–625.
Chernozhukov, V. (2005), ‘Extremal quantile regression’, Annals of Statistics 33(2), 806–839.
Chetverikov, D., Liao, Z. and Chernozhukov, V. (2021), ‘On cross-validated lasso in high dimensions’, Annals of Statistics 49(3), 1300–1317.
Clemente, C., Guerreiro, G. R. and Bravo, J. M. (2023), ‘Modelling motor insurance claim frequency and severity using gradient boosting’, Risks 11(9), 1–20.
Daouia, A., Gardes, L. and Girard, S. (2013), ‘On kernel smoothing for extremal quantile regression’, Bernoulli 19(5B), 2557–2589.
Daouia, A., Stupfler, G. and Usseglio-Carleve, A. (2023), ‘Inference for extremal regression with dependent heavy-tailed data’, Annals of Statistics 51(5), 2040–2066.
de Haan, L. and Ferreira, A. (2006), Extreme Value Theory: An Introduction, Springer Science & Business Media.
de Wet, T., Goegebeur, Y., Guillou, A. and Osmann, M. (2016), ‘Kernel regression with Weibull-type tails’, Annals of the Institute of Statistical Mathematics 68, 1135–1162.
Drees, H. (1995), ‘Refined Pickands estimators of the extreme value index’, Annals of Statistics 23(6), 2059–2080.
Fan, J., Fan, Y. and Barut, E. (2014), ‘Adaptive robust variable selection’, Annals of Statistics 42(1), 324–351.
Gardes, L. and Girard, S. (2016), ‘On the estimation of the functional Weibull tail-coefficient’, Journal of Multivariate Analysis 146, 29–45.
Gardes, L. and Stupfler, G. (2019), ‘An integrated functional Weissman estimator for conditional extreme quantiles’, REVSTAT-Statistical Journal 17(1), 109–144.
Gnecco, N., Terefe, E. M. and Engelke, S. (2024), ‘Extremal random forests’, Journal of the American Statistical Association 119(548), 3059–3072.
He, F., Wang, H. J. and Tong, T. (2020), ‘Extremal linear quantile regression with Weibull-type tails’, Statistica Sinica 30(3), 1357–1377.
He, F., Wang, H. J. and Zhou, Y. (2022), ‘Extremal quantile autoregression for heavy-tailed time series’, Computational Statistics & Data Analysis 176, 107563.
He, X., Pan, X., Tan, K. M. and Zhou, W.-X. (2023), ‘Smoothed quantile regression with large-scale inference’, Journal of Econometrics 232(2), 367–388.
He, X. and Shao, Q.-M. (2000), ‘On parameters of increasing dimensions’, Journal of Multivariate Analysis 73(1), 120–135.
Hill, B. M. (1975), ‘A simple general approach to inference about the tail of a distribution’, Annals of Statistics 3(5), 1163–1174.
Homrighausen, D. and McDonald, D. J. (2017), ‘Risk consistency of cross-validation with lasso-type procedures’, Statistica Sinica 27(3), 1017–1036.
Javanmard, A. and Montanari, A. (2014), ‘Confidence intervals and hypothesis testing for high-dimensional regression’, The Journal of Machine Learning Research 15(1), 2869–2909.
Koenker, R. (2005), Quantile Regression, Cambridge University Press.
Koenker, R. and Bassett, G., Jr. (1978), ‘Regression quantiles’, Econometrica 46(1), 33–50.
Koenker, R., Chernozhukov, V., He, X. and Peng, L. (2017), Handbook of Quantile Regression, CRC Press.
Li, D. and Wang, H. J. (2019), ‘Extreme quantile estimation for autoregressive models’, Journal of Business & Economic Statistics 37(4), 661–670.
Negahban, S. N., Ravikumar, P., Wainwright, M. J. and Yu, B. (2012), ‘A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers’, Statistical Science 27(4), 538–557.
Neves, M. M., Gomes, M. I., Figueiredo, F. and Prata Gomes, D. (2015), ‘Modeling extreme events: sample fraction adaptive choice in parameter estimation’, Journal of Statistical Theory and Practice 9(1), 184–199.
Pan, X. and Zhou, W.-X. (2021), ‘Multiplier bootstrap for quantile regression: non-asymptotic theory under random design’, Information and Inference 10(3), 813–861.
Sasaki, Y., Tao, J. and Wang, Y. (2024), ‘High-dimensional tail index regression: with an application to text analyses of viral posts in social media’, arXiv:2403.01318.
Tan, K. M., Wang, L. and Zhou, W.-X. (2022), ‘High-dimensional quantile regression: Convolution smoothing and concave regularization’, Journal of the Royal Statistical Society Series B: Statistical Methodology 84(1), 205–233.
van de Geer, S., Bühlmann, P., Ritov, Y. and Dezeure, R. (2014), ‘On asymptotically optimal confidence regions and tests for high-dimensional models’, Annals of Statistics 42(3), 1166–1202.
Velthoen, J., Dombry, C., Cai, J.-J. and Engelke, S. (2023), ‘Gradient boosting for extreme quantile regression’, Extremes 26(4), 639–667.
Wang, H. J. and Li, D. (2013), ‘Estimation of extreme conditional quantiles through power transformation’, Journal of the American Statistical Association 108(503), 1062–1074.
Wang, H. J., Li, D. and He, X. (2012), ‘Estimation of high conditional quantiles for heavy-tailed distributions’, Journal of the American Statistical Association 107(500), 1453–1464.
Wang, H. and Tsai, C.-L. (2009), ‘Tail index regression’, Journal of the American Statistical Association 104(487), 1233–1240.
Wang, L., Wu, Y. and Li, R. (2012), ‘Quantile regression for analyzing heterogeneity in ultrahigh dimension’, Journal of the American Statistical Association 107(497), 214–222.
Welsh, A. (1989), ‘On M-processes and M-estimation’, Annals of Statistics 17(1), 337–361.
Wu, Y. and Wang, L. (2020), ‘A survey of tuning parameter selection for high-dimensional regression’, Annual Review of Statistics and Its Application 7(1), 209–226.
Xu, W., Hou, Y. and Li, D. (2022), ‘Prediction of extremal expectile based on regression models with heteroscedastic extremes’, Journal of Business & Economic Statistics 40(2), 522–536.
Xu, W., Wang, H. J. and Li, D. (2022), ‘Extreme quantile estimation based on the tail single-index model’, Statistica Sinica 32(2), 893–914.
Yan, Y., Wang, X. and Zhang, R. (2023), ‘Confidence intervals and hypothesis testing for high-dimensional quantile regression: Convolution smoothing and debiasing’, Journal of Machine Learning Research 24(245), 1–49.
Youngman, B. D. (2019), ‘Generalized additive models for exceedances of high thresholds with an application to return level estimation for US wind gusts’, Journal of the American Statistical Association 114(528), 1865–1879.
Zhang, C.-H. and Zhang, S. S. (2014), ‘Confidence intervals for low dimensional parameters in high dimensional linear models’, Journal of the Royal Statistical Society Series B: Statistical Methodology 76(1), 217–242.
Zhao, T., Kolar, M. and Liu, H. (2014), ‘A general framework for robust testing and confidence regions in high-dimensional quantile regression’, arXiv:1412.8724.
Zheng, Q., Gallagher, C. and Kulasekera, K. (2013), ‘Adaptive penalized quantile regression for high dimensional data’, Journal of Statistical Planning and Inference 143(6), 1029–1038.

Acknowledgments

Deyuan Li’s research was partially supported by the National Natural Science Foundation of China grants 11971115 and 12471279.

Supplementary Materials

The supplementary materials comprise a PDF titled Supplementary Material for High-dimensional Extreme Quantile Regression, which contains technical conditions, proofs, simulation results, and extra details on the auto insurance claims data, and a CSV file with the auto insurance claims dataset used in Section 4.

Supplementary materials are available for download.
