Statistica Sinica 33 (2023), 1985-2016

PRINCIPAL COMPONENTS ANALYSIS FOR RIGHT CENSORED DATA

Benjamin W. Langworthy, Jianwen Cai, Robert W. Corty, Michael R. Kosorok and Jason P. Fine

University of North Carolina at Chapel Hill

Abstract: Principal components analysis (PCA) is a common dimension-reduction tool that transforms a set of variables into a linearly uncorrelated set of variables. Standard PCA estimators involve either the eigendecomposition of the estimated covariance matrix or a singular value decomposition of the centered data. However, for right-censored failure time data, estimating the principal components in this way is not straightforward because not all failure times are observed. Standard estimators for the covariance or correlation matrix should not be used in this case, because they require strong assumptions on the form of the joint distribution and on the marginal distributions beyond the final observation time. We present a novel, nonparametric estimator for the covariance of multivariate right-censored failure time data based on the counting processes and corresponding martingales defined by the failure times. We prove that these estimators are consistent and converge to a Gaussian process when properly standardized. We further show that these covariance estimates can be used to estimate a PCA for the martingales and counting processes for the different failure times. The corresponding estimates of the principal directions are consistent and asymptotically normal. We apply this method to data from a clinical trial of patients with pancreatic cancer, and recover a medically valid low-dimensional representation of adverse events.

Key words and phrases: Competing risks, multivariate survival analysis, principal components analysis.