Abstract
Quantile regression is a powerful tool for detecting exposure-outcome
associations given covariates across different parts of the outcome’s distribution,
but has two major limitations when the aim is to infer an exposure effect. Firstly,
the exposure coefficient estimator may not converge to a meaningful quantity
when the model is misspecified, and secondly, variable selection methods may
induce bias and excess uncertainty, rendering inferences biased and overly optimistic. In this paper, we address these issues via partially linear quantile regression
models which parametrize the conditional association of interest, but do not
restrict the association with other covariates in the model. We propose consistent
estimators for the unknown model parameter by mapping it onto a nonparametric main effect estimand that captures the (conditional) association of interest
even when the quantile model is misspecified. This estimand is estimated using
the efficient influence function under the nonparametric model, allowing for the
incorporation of data-adaptive procedures such as variable selection and machine
learning. Our approach provides a flexible and reliable method for detecting associations, robust to model misspecification and excess uncertainty induced by
variable selection methods. The proposal is illustrated using simulation studies
and data on annual health care costs associated with excess body weight.
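For concreteness, a partially linear quantile regression model of the kind described above can be sketched as follows. The notation is an assumption introduced here for illustration and does not appear in the abstract: $Y$ denotes the outcome, $A$ the exposure, $L$ the remaining covariates, $\tau \in (0,1)$ the quantile level, and $\omega_\tau$ an unspecified function.

```latex
% Illustrative notation only (assumed, not taken from the source):
% Q_\tau(Y | A, L) is the conditional \tau-quantile of Y given A and L.
\[
  Q_\tau(Y \mid A, L) \;=\; \beta_\tau A + \omega_\tau(L),
  \qquad \tau \in (0,1).
\]
```

In this sketch, $\beta_\tau$ plays the role of the parametrized conditional association of interest, while $\omega_\tau(L)$ is left unrestricted, so that no functional form is imposed on the association with the other covariates.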
Information
| Field | Value |
|---|---|
| Preprint No. | SS-2025-0034 |
| Manuscript ID | SS-2025-0034 |
| Complete Authors | Georgi Baklicharov, Christophe Ley, Vanessa Gorasso, Brecht Devleesschauwer, Stijn Vansteelandt |
| Corresponding Authors | Georgi Baklicharov |
| Emails | georgi.baklicharov@ugent.be |
Acknowledgments
We thank Statbel for BHIS sample selection and fieldwork coordination,
and the InterMutualistic Agency (IMA) for supporting data linkage. We
are also grateful to all BHIS participants. BHIS is financed by the Federal
and Inter-Federated Belgian Public Health authorities. The linkage between
BHIS data and the Belgian Compulsory Health Insurance data is financed
by the National Institute for Health and Disability Insurance. This work
is supported by Advanced ERC grant ACME (101141305) and the Fonds
Professor Frans Wuytack (Ghent University).
Supplementary Materials
The Supplementary Materials available online include details on the construction
of adjusted estimators based on parametric quantile regression with variable
selection, proofs, additional simulation results, a sensitivity analysis for the
data analysis, and a generalization of the estimand with link function.