Abstract

In clinical studies, assessing statistical associations between covariates and survival outcomes is crucial. To date, no formally defined model-free correlation coefficient has been available for measuring the strength of association in right-censored data. Traditional methods, such as the Cox proportional hazards model, often struggle with non-monotonic or nonlinear relationships. This paper introduces a censored rank-based correlation coefficient (CRC). It consistently estimates a new dependence measure that takes values in [0, 1] and equals 0 if and only if the variables are independent, and 1 if and only if one is a measurable function of the other. The CRC is entirely model-free and does not depend on the distributions of the variables. It can be computed in O(n log n) time and can effectively detect nonlinear and non-monotonic effects, even under heavy censoring. The p-values for testing independence can be obtained using a power-consistent permutation method. The CRC is strongly consistent and asymptotically normal. In simulations and in real data from the Alzheimer’s Disease Neuroimaging Initiative, it outperforms the Cox model and other methods in detecting nonlinear associations, and it identifies proteins that existing methods fail to detect.
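As a rough illustration of the rank-based idea and the permutation p-value mentioned in the abstract, the following minimal sketch implements Chatterjee's (2021) rank correlation for fully observed (uncensored) data without ties, together with a generic permutation test. It is not the CRC: the paper's adjustment for right-censoring is not reproduced here, and the function names `chatterjee_xi` and `permutation_pvalue` are hypothetical names chosen for this example.

```python
import random


def chatterjee_xi(x, y):
    """Chatterjee's rank correlation for uncensored data without ties.

    Illustrative only: this is the uncensored coefficient, not the CRC.
    """
    n = len(x)
    # Order observations by x; the two sorts dominate the cost, O(n log n).
    order = sorted(range(n), key=lambda i: x[i])
    # Rank of each y among all y values (1 = smallest).
    rank = [0] * n
    for r, i in enumerate(sorted(range(n), key=lambda i: y[i]), start=1):
        rank[i] = r
    # Sum of absolute rank jumps between neighbours in the x-ordering.
    jumps = sum(abs(rank[order[k + 1]] - rank[order[k]]) for k in range(n - 1))
    return 1.0 - 3.0 * jumps / (n * n - 1)


def permutation_pvalue(x, y, n_perm=200, seed=0):
    """Permutation p-value for independence, shuffling y against x."""
    rng = random.Random(seed)
    observed = chatterjee_xi(x, y)
    y_perm = list(y)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if chatterjee_xi(x, y_perm) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return (exceed + 1) / (n_perm + 1)


# Non-monotonic example: y is a deterministic function of x.
rng = random.Random(1)
x = [rng.uniform(-1.0, 1.0) for _ in range(200)]
y = [v * v for v in x]
xi_quadratic = chatterjee_xi(x, y)
p_quadratic = permutation_pvalue(x, y)
```

In this example the quadratic relationship has essentially zero linear correlation, yet the rank-based coefficient is close to 1 and the permutation p-value is small. Each permutation costs one O(n log n) evaluation, so the test as written costs O(n_perm · n log n).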

Key words and phrases: Chatterjee’s rank correlation; Nonparametric correlation; Right-censoring; Permutation test

Information

Preprint No.: SS-2025-0207
Manuscript ID: SS-2025-0207
Complete Authors: Linlin Dai, Tengfei Li, Kani Chen
Corresponding Author: Kani Chen
Email: makchen@ust.hk

Acknowledgments

Data used in this paper were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in the analysis or writing of this paper. A complete listing of ADNI investigators can be found at http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf. Linlin Dai and Tengfei Li contributed equally. Linlin Dai’s research was supported by the General Project of the Ministry of Education Foundation on Humanities and Social Sciences (No. 24YJC910001) and the Sichuan Provincial Natural Science Foundation Project (No. 2025ZNSFSC0816). Kani Chen’s research was supported by the grants T32-615-24-R and

[Figure 2 plot omitted. Two panels show the log hazard ratio (vertical axis) against normalized B7H1 and normalized VEGF (horizontal axes, 0.0 to 1.0).]

Figure 2: Nonlinear hazard function estimates for associations between survival time and protein levels of B7H1 and VEGF in the Alzheimer’s Disease Neuroimaging Initiative (ADNI) cohort. Protein levels were normalized to the interval [0, 1] for visualization.

Supplementary Materials

The supplementary materials present technical proofs and additional simulation and real data results. They are available for download.