Abstract
Motivated by an international breast cancer study, we investigate regression analysis of clustered
interval-censored failure time data in the presence of random change points. Although a large
literature exists on regression analysis of clustered or interval-censored data, there does not seem
to be an established approach for the situation considered here. Change points arise in many
settings; for example, a disease risk may shift abruptly once certain biological markers cross
critical thresholds. To address the problem, we propose a sieve maximum likelihood estimation
procedure that accommodates all three features: clustered structure, interval censoring, and random
change points. An EM algorithm is developed to implement the proposed method, and the asymptotic
properties of the resulting estimators are established. An extensive simulation study indicates that
the proposed method works well in practical situations. The proposed methodology is applied to the
aforementioned breast cancer study.
Key words and phrases: Clustered interval-censored data; EM algorithm; Random change point; Sieve estimation
Information
| Field | Value |
|---|---|
| Preprint No. | SS-2025-0305 |
| Manuscript ID | SS-2025-0305 |
| Complete Authors | Yichen Lou, Mingyue Du, Jianguo Sun |
| Corresponding Authors | Jianguo Sun |
| Emails | suncolumbia@163.com |
References
- Chen, M.-H., Tong, X., and Sun, J. (2009). A frailty model approach for regression analysis of multivariate current status data. Statistics in Medicine, 28(27):3424–3436.
- Chen, X., Fan, Y., and Tsyrennikov, V. (2006). Efficient estimation of semiparametric multivariate copula models. Journal of the American Statistical Association, 101(475):1228–1240.
- Chen, X., Ping, Y., and Sun, J. (2024). Efficient estimation of Cox model with random change point. Statistics in Medicine, 43(6):1213–1226.
- Dall, G. V. and Britt, K. L. (2017). Estrogen effects on the mammary gland in early and late life and breast cancer risk. Frontiers in Oncology, 7:110.
- Du, M., Lou, Y., and Sun, J. (2025). Estimation and variable selection for interval-censored failure time data with random change point and application to breast cancer study. Journal of the American Statistical Association, 0(0):1–12.
- Du, M. and Zhao, X. (2024). A conditional approach for regression analysis of case k interval-censored failure time data with informative censoring. Computational Statistics & Data Analysis, 198:107991.
- Hanagal, D. D. (2011). Modeling Survival Data using Frailty Models. Springer.
- He, X., Lin, H., and Tu, D. (2018). A single-index threshold Cox proportional hazard model for identifying a treatment-sensitive subset based on multiple biomarkers. Statistics in Medicine, 37:3267–3279.
- Heer, E., Harper, A., Escandor, N., Sung, H., McCormack, V., and Fidler-Benaoudia, M. M. (2020). Global burden and trends in premenopausal and postmenopausal breast cancer: a population-based study. The Lancet Global Health, 8(8):e1027–e1037.
- Hougaard, P. (2000). Analysis of Multivariate Survival Data. Springer.
- Huang, R., Sun, L., and Xiang, L. (2024). Conditional quasi-likelihood inference for mean residual life regression with clustered failure time data. Scandinavian Journal of Statistics, 51(4):1685–1706.
- Huang, R., Xiang, L., and Ha, I. D. (2019). Frailty proportional mean residual life regression for clustered survival data: A hierarchical quasi-likelihood method. Statistics in Medicine, 38(24):4854–4870.
- Lam, K., Xu, Y., and Cheung, T.-L. (2010). A multiple imputation approach for clustered interval-censored survival data. Statistics in Medicine, 29(6):680–693.
- Lee, C. Y. and Lam, K. (2020). Survival analysis with change-points in covariate effects. Statistical Methods in Medical Research, 29(11):3235–3248.
- Lee, C. Y. and Wong, K. Y. (2023). Survival analysis with a random change-point. Statistical Methods in Medical Research, 32(11):2083–2095.
- Lorentz, G. G. (1986). Bernstein Polynomials. New York: Chelsea Publishing Co.
- Lou, Y., Du, M., and Song, X. (2025). Regression analysis of interval-censored failure time data with change points and a cured subgroup. Biometrics, 81(3):ujaf100.
- Lou, Y., Sun, J., and Wang, P. (2024). Semiparametric cure regression models with informative case k interval-censored failure time data. Statistica Sinica.
- Ma, L., Hu, T., and Sun, J. (2015). Sieve maximum likelihood regression analysis of dependent current status data. Biometrika, 102(3):731–738.
- Pons, O. (2003). Estimation in a Cox regression model with a change-point according to a threshold in a covariate. The Annals of Statistics, 31(2):442–463.
- Redig, A. J. and McAllister, S. S. (2013). Breast cancer as a systemic disease: a view of metastasis.
- Shen, X. (1997). On methods of sieves and penalization. The Annals of Statistics, 25(6):2555–2591.
- Shen, X. and Wong, W. H. (1994). Convergence rate of sieve estimates. The Annals of Statistics, 22(2):580–615.
- Siegel, R. L., Giaquinto, A. N., and Jemal, A. (2024). Cancer statistics, 2024. CA: A Cancer Journal for Clinicians.
- Stavraky, K. and Emmons, S. (1974). Breast cancer in premenopausal and postmenopausal women. Journal of the National Cancer Institute, 53(3):647–654.
- Sun, J. (2006). The Statistical Analysis of Interval-censored Failure Time Data. Springer.
- Sun, J. and Chen, D.-G. (2022). Emerging Topics in Modeling Interval-Censored Survival Data. Springer.
- van der Vaart, A. and Wellner, J. (1996). Weak Convergence and Empirical Processes: with Applications to Statistics. Springer Science & Business Media.
- Yang, D., Du, M., and Sun, J. (2021). Semiparametric regression analysis of clustered interval-censored failure time data with a cured subgroup. Statistics in Medicine, 40(30):6918–6930.
- Zhou, Q., Hu, T., and Sun, J. (2017). A sieve semiparametric maximum likelihood approach for regression analysis of bivariate interval-censored failure time data. Journal of the American Statistical Association, 112(518):664–672.
Appendix: Proofs of Asymptotic Properties
In this appendix, we will sketch the proofs of the asymptotic properties of θ̂ given in Section 3 using the empirical process theory, and for this, we will first define some more notation and give some regularity conditions. For any θ̂(1) = (ψ̂(1), Λ̂(1)) and θ̂(2) = (ψ̂(2), Λ̂(2)), define the distance d(θ̂(1), θ̂(2)) =
Acknowledgments
The authors wish to thank the Co-Editor, Dr. John Stufken, an Associate Editor, and a reviewer for
their many insightful and valuable comments and suggestions that greatly improved the article. The
research was partially supported by the Scientific Research Project of the Education Department
of Jilin Province (Grant No. JJKH20261483KJ to Du).