
Statistica Sinica 34 (2024), 1483-1503

SCALABLE ESTIMATION FOR HIGH VELOCITY
SURVIVAL DATA ABLE TO ACCOMMODATE
ADDITION OF COVARIATES

Ying Sheng¹, Yifei Sun², Charles E. McCulloch³ and Chiung-Yu Huang*³

¹Chinese Academy of Sciences, ²Columbia University and ³University of California at San Francisco

Abstract: With the rapidly increasing availability of large-scale streaming data, there is growing interest in methods that process data in batches without requiring storage of the full data set. In this paper, we propose a hybrid likelihood approach for scalable estimation of the Cox model using individual-level data in the current data batch and summary statistics calculated from historical data. We show that the proposed scalable estimator is asymptotically as efficient as the maximum likelihood estimator calculated using the full data set, while requiring little data storage and little data loading and computation time. A difficulty in analyzing batches of survival data, not accommodated by existing methods, is that new covariates may become available midway through data collection. To accommodate the addition of covariates, we develop a hybrid empirical likelihood approach that incorporates the historical covariate effects evaluated using a reduced Cox model. The extended scalable estimator is asymptotically more efficient than the maximum likelihood estimator obtained using only the data batches that include the additional covariates. The proposed approaches are evaluated using numerical simulations and illustrated with an analysis of Surveillance, Epidemiology, and End Results (SEER) breast cancer data.

Key words and phrases: Batch processing, hybrid empirical likelihood, scalable estimation.
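
To make the batch-processing idea in the abstract concrete, the following is a minimal illustrative sketch, not the hybrid likelihood estimator proposed in the paper. It assumes a simpler, generic online-updating scheme: historical data are summarized by the previous estimate and an accumulated information matrix, and each new batch is fitted by maximizing its Breslow log partial likelihood plus a quadratic term built from that summary. The function names (`cox_score_info`, `update`), the simulated data, and the assumption of no tied event times are all hypothetical choices made for illustration.

```python
import numpy as np

def cox_score_info(beta, time, event, X):
    """Score and observed information of the Cox log partial likelihood
    for one batch (assumes no tied event times)."""
    order = np.argsort(-time)                 # descending time: risk set is a prefix
    X, event = X[order], event[order].astype(bool)
    w = np.exp(X @ beta)
    S0 = np.cumsum(w)                                         # sum of exp(x'beta) over risk set
    S1 = np.cumsum(w[:, None] * X, axis=0)                    # weighted covariate sums
    S2 = np.cumsum(w[:, None, None] * X[:, :, None] * X[:, None, :], axis=0)
    xbar = S1 / S0[:, None]                                   # risk-set weighted means
    score = (X[event] - xbar[event]).sum(axis=0)
    outer = xbar[event][:, :, None] * xbar[event][:, None, :]
    info = (S2[event] / S0[event, None, None] - outer).sum(axis=0)
    return score, info

def update(beta_hist, info_hist, time, event, X, n_iter=25, tol=1e-8):
    """Newton iterations on: batch log partial likelihood
    - 0.5 * (beta - beta_hist)' info_hist (beta - beta_hist).
    Only (beta_hist, info_hist) is kept from the historical batches."""
    beta = beta_hist.copy()
    for _ in range(n_iter):
        score, info = cox_score_info(beta, time, event, X)
        grad = score - info_hist @ (beta - beta_hist)
        step = np.linalg.solve(info + info_hist, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    _, info = cox_score_info(beta, time, event, X)
    return beta, info_hist + info

# Simulated stream of 5 batches (hypothetical data, for illustration only).
rng = np.random.default_rng(0)
true_beta = np.array([0.5, -0.5])
beta, info = np.zeros(2), np.zeros((2, 2))
for _ in range(5):
    X = rng.normal(size=(500, 2))
    t = rng.exponential(1.0 / np.exp(X @ true_beta))   # event times, hazard exp(x'beta)
    c = rng.exponential(2.0, size=500)                 # independent censoring times
    time, event = np.minimum(t, c), (t <= c).astype(int)
    beta, info = update(beta, info, time, event, X)

print("streaming estimate:", beta)
```

In this sketch only a coefficient vector and one matrix are carried between batches, which illustrates the low storage footprint the abstract refers to. It does not handle covariates that are added partway through the stream, which is the setting the paper's hybrid empirical likelihood extension addresses.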
