A Conditionally Studentized Test for High-dimensional Parametric Regression via Sample Splitting

Feng Liang, Chuhan Wang, Jiaqi Huang and Lixing Zhu

doi:10.5705/ss.202025.0183

Abstract

This paper introduces a Conditionally Studentized Test (COST) for model checking in general parametric regression models. It addresses the challenges posed by high-dimensional predictors without relying on dimension reduction or sparsity assumptions. COST is constructed from two disjoint sample partitions linked by a weight matrix, and it is studentized conditionally on one of the two subsamples. It can achieve asymptotic normality under the null hypothesis regardless of the form of the initial test statistic (global or local smoothing-based) and irrespective of the relationship among the predictor dimension, the sample size, and the number of parameters (fixed, or diverging under certain rate constraints). Under certain conditions on the regression functions, asymptotic normality can hold even when the predictor dimension exceeds the sample size, potentially enabling the analysis of high-dimensional problems. Furthermore, COST has a fast rate of detection for local alternatives. The paper also discusses how to partition the sample and reports numerical studies of COST's finite-sample performance, including scenarios in which the predictor dimension equals the sample size.
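The abstract describes the construction only at a high level, so the Python sketch below is a hedged illustration of the general idea, not the paper's COST statistic. It assumes a Gaussian kernel as the weight matrix linking the two subsamples, a bandwidth of sqrt(p), an equal split, and a trivial null model E[Y | X] = 0 with no parameters, so that the residuals are simply Y and parameter-estimation effects (which the paper's theory must handle) are ignored. The names `gaussian_kernel`, `cross_studentized_stat`, `simulate`, and `demo` are hypothetical.

```python
import math
import random
import statistics


def gaussian_kernel(x, z, h):
    """Hypothetical weight: exp(-||x - z||^2 / (2 h^2))."""
    sq = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq / (2.0 * h * h))


def cross_studentized_stat(x1, e1, x2, e2, h):
    """Sketch of a sample-split statistic studentized conditionally on subsample 2.

    x1, e1: predictors and residuals of the first subsample.
    x2, e2: predictors and residuals of the second subsample.
    W[i][j] = K(x1[i], x2[j]) plays the role of the linking weight matrix.
    """
    n2 = len(x2)
    terms = []
    for xi, ei in zip(x1, e1):
        # a_i = (1/n2) * sum_j W[i][j] * e2[j] depends only on subsample 2 and xi.
        a_i = sum(gaussian_kernel(xi, xj, h) * ej for xj, ej in zip(x2, e2)) / n2
        terms.append(ei * a_i)
    # Given subsample 2, the terms e1[i] * a_i are i.i.d. and have mean zero when
    # E[e | X] = 0, so studentizing by their own sample standard deviation gives
    # an approximately N(0, 1) statistic by the central limit theorem.
    n1 = len(terms)
    return math.sqrt(n1) * statistics.fmean(terms) / statistics.stdev(terms)


def simulate(n, p, signal, rng):
    """Hypothetical data: X ~ N(0, I_p) and Y = signal * X_1 + 0.5 * noise."""
    xs = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    ys = [signal * x[0] + 0.5 * rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys


def demo(signal, n=400, p=5, seed=1):
    rng = random.Random(seed)
    xs, ys = simulate(n, p, signal, rng)
    # Assumed null model E[Y | X] = 0 has no parameters, so the residuals are Y.
    half = n // 2
    return cross_studentized_stat(
        xs[:half], ys[:half], xs[half:], ys[half:], h=math.sqrt(p)
    )


t_null = demo(signal=0.0)  # null holds: roughly standard normal
t_alt = demo(signal=3.0)   # E[Y | X] = 3 * X_1 violates the null: large and positive
print(round(t_null, 2), round(t_alt, 2))
```

The design choice this illustrates is that each a_i depends only on the second subsample and the point x1[i], so conditioning on the second subsample turns a double sum over both subsamples into a sum of independent terms. That is the intuition for why a simple studentization can yield a normal limit without restricting the predictor dimension; the paper's actual conditions and proofs are given in its Supplementary Materials.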

Key words and phrases: Asymptotic model-free test, conditional studentization, high dimensions, model checking, sample-splitting

Information

Preprint No.	SS-2025-0183
Manuscript ID	SS-2025-0183
Complete Authors	Feng Liang, Chuhan Wang, Jiaqi Huang, Lixing Zhu
Corresponding Authors	Jiaqi Huang
Emails	jhuang@mail.bnu.edu.cn


Acknowledgments

All authors contributed equally to this research. The research was supported by grants NSFC12131006 and NSFC12471276 from the National Natural Science Foundation of China and by grant CI2023C063YLL from the Scientific and Technological Innovation Project of the China Academy of Chinese Medical Sciences.

Supplementary Materials

The technical proofs are provided in the Supplementary Materials.

