Statistica Sinica 26 (2016), 493-508 doi:http://dx.doi.org/10.5705/ss.2014.194
Abstract: For survival data with high-dimensional covariates, results generated in the analysis of a single dataset are often unsatisfactory because of the small sample size. Integrative analysis pools raw data from multiple independent studies with comparable designs, effectively increases sample size, and performs better than meta-analysis and single-dataset analysis. In this study, we conduct integrative analysis of survival data under the accelerated failure time (AFT) model. The sparsity structures of multiple datasets are described using homogeneity and heterogeneity models. For variable selection under the homogeneity model, we adopt group penalization approaches; for variable selection under the heterogeneity model, we use composite penalization and sparse group penalization approaches. As a major advance over existing studies, the asymptotic selection and estimation properties are rigorously established. A simulation study is conducted to compare the different penalization methods with one another and against alternatives. We also analyze four lung cancer prognosis datasets with gene expression measurements.
Key words and phrases: Consistency properties, homogeneity and heterogeneity models, integrative analysis, penalized selection.