Nonparametric Inference on Treatment-biomarker Interaction Based on Probability Index

Zehui Wang, Yanglei Song, Wenyu Jiang and Dongsheng Tu

doi:10.5705/ss.202025.0076

Abstract

In precision medicine, an important task is identifying subgroups of patients who may benefit more from a treatment based on a clinical variable or biomarker. In this paper, we propose a non-parametric treatment-biomarker interaction measure using a probabilistic index to assess differences in treatment effects between subgroups formed by dichotomizing a biomarker at a cutpoint. When the cutpoint is prespecified, the null hypothesis of no interaction is tested using Wilcoxon-type statistics. When the cutpoint is not prespecified, the same null hypothesis is tested across a range of cutpoint values by taking the supremum of Wilcoxon-type statistics, with the p-value calculated by a bootstrap procedure. It is shown that the sizes of the proposed tests converge to the nominal level in both cases. If the null hypothesis is rejected in the case of an unspecified cutpoint, a profile estimator for the cutpoint that maximizes the difference in treatment effects is proposed. We show that the estimator has cubic-rate convergence and asymptotically follows a scaled Chernoff’s distribution. Furthermore, we introduce an m-out-of-n bootstrap procedure to estimate the unknown scaling factor in the asymptotic distribution. Extensive simulation studies support our theory, and the proposed procedures are applied to a dataset from a clinical trial on advanced colorectal cancer.
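As a rough illustration of the kind of quantity the abstract describes, the following minimal sketch computes a Mann-Whitney-type estimate of the probability index P(Y_treated > Y_control) within each biomarker subgroup and takes the difference at a fixed cutpoint. This is an assumption-laden sketch, not the authors' estimator: the function names, the tie-handling convention (ties count one half), the subgroup definition (biomarker > cutpoint versus <= cutpoint), and the use of an uncensored outcome are all hypothetical choices made for illustration.

```python
import numpy as np


def probability_index(treated, control):
    """Mann-Whitney estimate of P(Y_treated > Y_control), ties counted as 1/2.

    Hypothetical helper for illustration; the tie convention is an assumption.
    """
    t = np.asarray(treated, dtype=float)[:, None]  # column of treated outcomes
    c = np.asarray(control, dtype=float)[None, :]  # row of control outcomes
    # Average over all treated-control pairs via broadcasting.
    return float(np.mean((t > c) + 0.5 * (t == c)))


def interaction_at_cutpoint(y, treat, biomarker, cutpoint):
    """Difference in probability indices between the two biomarker subgroups.

    Assumed definition: (index in biomarker > cutpoint) minus
    (index in biomarker <= cutpoint). Each subgroup must contain at least
    one treated and one control observation.
    """
    y = np.asarray(y, dtype=float)
    treat = np.asarray(treat).astype(bool)
    high = np.asarray(biomarker, dtype=float) > cutpoint
    pi_high = probability_index(y[treat & high], y[~treat & high])
    pi_low = probability_index(y[treat & ~high], y[~treat & ~high])
    return pi_high - pi_low
```

A value near zero would be consistent with no treatment-biomarker interaction at that cutpoint under this simplified definition; the supremum-type test and bootstrap calibration described in the abstract are not reproduced here.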

Key words and phrases: Probability index, Predictive classification, Wilcoxon-type statistics, Bootstrap, U-processes, Chernoff’s distribution

Information

Preprint No.	SS-2025-0076
Manuscript ID	SS-2025-0076
Complete Authors	Zehui Wang, Yanglei Song, Wenyu Jiang, Dongsheng Tu
Corresponding Authors	Dongsheng Tu
Emails	dtu@ctg.queensu.ca

References

Baklizi, A. and O. Eidous (2006). Nonparametric estimation of P(X
Ballman, K. V. (2015). Biomarker: Predictive or Prognostic? Journal of Clinical Oncology 33, 3968–3971.
Bickel, P. J., F. Götze, and W. R. van Zwet (1997). Resampling fewer than n observations: Gains, losses, and remedies for losses. Statistica Sinica 7, 1–31.
Bickel, P. J. and A. Sakov (2008, 01). On the choice of m in the m out of n bootstrap and confidence bounds for extrema. Statistica Sinica 18, 967–985.
Birnbaum, Z. W. (1956). On a Use of the Mann-Whitney Statistic. Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Contributions to the Theory of Statistics 3.1, 13–18.
Cattaneo, M. D., M. Jansson, and K. Nagasawa (2020). Bootstrap-based inference for cube root asymptotics. Econometrica 88, 2203–2219.
Chernoff, H. (1964, 12). Estimation of the mode. Annals of the Institute of Statistical Mathematics 16, 31–41.
Chernozhukov, V., D. Chetverikov, M. Demirer, E. Duflo, C. Hansen, W. Newey, and J. Robins (2018, 01). Double/debiased machine learning for treatment and structural parameters. The Econometrics Journal 21, C1–C68.
De Schryver, M. and J. De Neve (2019, 08). A tutorial on probabilistic index models: Regression models for the effect size P(Y1 < Y2). Psychological Methods 24, 403–418.
Durrett, R. (2019). Probability: theory and examples, Volume 49. Cambridge university press.
Faraggi, D. and R. Simon (1996). A simulation study of cross-validation for selecting an optimal cutpoint in univariate survival analysis. Statistics in Medicine 15, 2203–2213.
Fokianos, K. and J. F. Troendle (2007). Inference for the relative treatment effect with the density ratio model. Statistical Modelling 7, 155–173.
Gavanji, P., B. E. Chen, and W. Jiang (2017). Residual bootstrap test for interactions in biomarker threshold models with survival data. Statistics in Biosciences 10, 202–216.
Grenander, U. (1956). On the theory of mortality measurement. Scandinavian Actuarial Journal 1956, 125–153.
Groeneboom, P. and J. A. Wellner (2001). Computing chernoff’s distribution. Journal of Computational and Graphical Statistics 10, 388–400.
He, Y., H. Lin, and D. Tu (2018). A single-index threshold cox proportional hazard model for identifying a treatment-sensitive subset based on multiple biomarkers. Statistics in Medicine 37, 3267–3279.
Hilsenbeck, S. G. and G. M. Clark (1996). Practical p-value adjustment for optimally selected cutpoints. Statistics in Medicine 15, 103–112.
Hilsenbeck, S. G., G. M. Clark, and W. L. McGuire (1992). Why do so many prognostic factors fail to pan out? Breast Cancer Research and Treatment 22, 197–206.
Jiang, S., B. E. Chen, and D. Tu (2016). Inference on treatment-covariate interaction based on a nonparametric measure of treatment effects and censored survival data. Statistics in Medicine 35, 2715–2725.
Jiang, W., B. Freidlin, and R. Simon (2007). Biomarker-Adaptive Threshold Design: A Procedure for Evaluating Treatment With Possible Biomarker-Defined Subset Effect. JNCI: Journal of the National Cancer Institute 99, 1036–1043.
Jonker, D. J., C. S. Karapetis, et al. (2013). Epiregulin gene expression as a biomarker of benefit from cetuximab in the treatment of advanced colorectal cancer. British Journal of Cancer 110, 648–655.
Karapetis, C. S. et al. (2008). K-ras mutations and benefit from cetuximab in advanced colorectal cancer. The New England journal of medicine 359, 1757–65.
Kim, J. and D. Pollard (1990). Cube root asymptotics. The Annals of Statistics 18, 191–219.
Kotz, S., Y. Lumelskii, and M. Pensky (2003). The Stress–Strength Model And Its Generalizations: Theory and Applications. WORLD SCIENTIFIC.
Koziol, J. A. and Z. Jia (2009). The concordance index c and the mann-whitney parameter pr(x>y) with randomly censored data. Biometrical Journal 51, 467–474.
Li, N., Y. Song, C. D. Lin, and D. Tu (2023a). Bootstrap adjusted predictive classification for identification of subgroups with differential treatment effects under generalized linear models. Electronic Journal of Statistics 17(1), 548 – 606.
Li, N., Y. Song, D. Lin, and D. Tu (2023b). Bootstrap adjustment to minimum p-value method for predictive classification. Statistica Sinica 33, 2065–2086.
Luo, X. and J. M. Boyett (1997). Estimations of a threshold parameter in cox regression. Communications in Statistics - Theory and Methods 26, 2329–2346.
Mann, H. B. and D. R. Whitney (1947). On a test of whether one of two random variables is stochastically larger than the other. Annals of Mathematical Statistics 18, 50–60.
Manski, C. F. (1975, 08). Maximum score estimation of the stochastic utility model of choice. Journal of Econometrics 3, 205–228.
Mazumdar, M. and J. R. Glassman (2000). Categorizing a prognostic variable: review of methods, code for easy implementation and applications to decision-making about cancer treatments. Statistics in Medicine 19, 113–132.
Mccullagh, P. and J. A. Nelder (1989). Generalized Linear Models. CRC Press, Boca Raton.
Mi, X., P. Tighe, F. Zou, and B. Zou (2021). A deep learning semiparametric regression for adjusting complex confounding structures. The Annals of Applied Statistics 15, 1086–1100.
Moser, B. K. and M. H. McCann (2008). Reformulating the hazard ratio to enhance communication with clinical investigators. Clinical Trials: Journal of the Society for Clinical Trials 5, 248–252.
Patel, K. M. and D. G. Hoel (1973). A nonparametric test for interaction in factorial experiments. Journal of the American Statistical Association 68, 615–620.
Peña, V. H. and E. Giné (1999). Decoupling: From Dependence to Independence. Springer.
Politis, D. N. and J. P. Romano (1994). Large sample confidence regions based on subsamples under minimal assumptions. The Annals of Statistics 22, 2031–2050.
Pons, O. (2003). Estimation in a cox regression model with a change-point according to a threshold in a covariate. The Annals of Statistics 31, 442–463.
Péron, J., P. Roy, B. Ozenne, L. Roche, and M. Buyse (2016). The net chance of a longer survival as a patient-oriented measure of treatment benefit in randomized clinical trials. JAMA Oncology 2, 901.
Seijo, E. and B. Sen (2011). Change-point in stochastic design regression and the bootstrap. The Annals of Statistics 39, 1580–1607.
Sen, P. K. (1967). A Note on Asymptotically Distribution-Free Confidence Bounds for P{X < Y}, Based on Two Independent Samples. Sankhyā: The Indian Journal of Statistics, Series A 29, 95–102.
Shao, J. and D. Tu (1995). The Jackknife and Bootstrap. Springer Nature.
Thas, O., J. D. Neve, L. Clement, and J.-P. Ottoy (2012, 07). Probabilistic index models. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 74, 623–671.
Van der Vaart, A. W. and J. A. Wellner (1996). Weak convergence and empirical processes: With Applications to Statistics. Springer New York.
Wilcoxon, F. (1945). Individual Comparisons by Ranking Methods. Biometrics Bulletin 1, 80–83.
Xu, G., B. Sen, and Z. Ying (2014). Bootstrapping a change-point cox model for survival data. Electronic Journal of Statistics 8, 1345–1379.

Supplementary Materials

The proofs and additional simulation results are presented in the Supplementary Materials.

