
Statistica Sinica 34 (2024), 1045-1066

INTEGRATING INCOMPLETE DATA
FOR MEDIATION ANALYSIS

Andriy Derkach*¹, Joshua N. Sampson² and Ruth M. Pfeiffer²

¹Memorial Sloan-Kettering Cancer Center and ²National Cancer Institute

Abstract: Mediation analysis examines the relationships between an exposure, a mediator, and an outcome. Although many approaches are available for performing such analyses, they all require access to a single complete data set that contains the three key variables: outcome, exposure, and mediator. Here, we propose semiparametric methods for mediation analysis to estimate the standard causal parameters (direct and indirect effects) by combining information from several incomplete data sets, each containing only two of the three key variables. Importantly, our methods also handle scenarios in which only summary statistics based on those data sets are available. The resulting estimates of the causal parameters are asymptotically unbiased and normally distributed. We evaluate the performance of our methods in finite samples using simulations, and quantify the loss in efficiency from the lack of a complete data set with all three variables. We then apply the proposed method to determine whether the number of terminal duct lobular units in the breast mediates the relationship between a polygenic risk score and breast cancer risk.
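To make the idea of combining pairwise data sets concrete, the following is a minimal sketch under strong simplifying assumptions, not the semiparametric method proposed in the paper. It assumes a linear model with no confounding and no exposure-mediator interaction, in which the direct effect is the coefficient of the exposure X in the regression of the outcome Y on (X, M), and the indirect effect is the product of the X-to-M and M-to-Y coefficients. Under those assumptions, both effects depend only on pairwise variances and covariances, so three data sets containing (X, M), (X, Y), and (M, Y) respectively are enough. The function names `simulate` and `effects_from_pairs` and all parameter values are hypothetical and chosen for illustration only.

```python
import numpy as np

def simulate(n, rng, a=0.5, b=0.8, c=0.3):
    """Draw (x, m, y) from an assumed linear mediation model.

    m = a*x + noise, y = c*x + b*m + noise, so the true direct
    effect is c and the true indirect effect is a*b.
    """
    x = rng.normal(size=n)
    m = a * x + rng.normal(size=n)
    y = c * x + b * m + rng.normal(size=n)
    return x, m, y

def effects_from_pairs(xm, xy, my):
    """Moment-based estimates from three two-variable data sets.

    Each argument is a tuple of two 1-D arrays observed on the same
    subjects: xm = (x, m), xy = (x, y), my = (m, y).
    """
    x1, m1 = xm
    x2, y2 = xy
    m3, y3 = my
    # Pool each variance over the data sets in which the variable appears.
    var_x = np.var(np.concatenate([x1, x2]), ddof=1)
    var_m = np.var(np.concatenate([m1, m3]), ddof=1)
    # Each covariance is available from exactly one data set.
    cov_xm = np.cov(x1, m1)[0, 1]
    cov_xy = np.cov(x2, y2)[0, 1]
    cov_my = np.cov(m3, y3)[0, 1]
    # Slope of M on X.
    a_hat = cov_xm / var_x
    # Normal equations for the regression of Y on (X, M).
    s = np.array([[var_x, cov_xm], [cov_xm, var_m]])
    c_hat, b_hat = np.linalg.solve(s, np.array([cov_xy, cov_my]))
    return {"direct": c_hat, "indirect": a_hat * b_hat}
```

A usage example would draw three independent samples with `simulate`, keep only two of the three variables from each, and pass the pairs to `effects_from_pairs`. This sketch gives point estimates only; it does not provide the standard errors, the efficiency comparisons, or the handling of binary outcomes that a full analysis would need.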

Key words and phrases: Data integration, direct and indirect effects, semiparametric likelihood, summary level information.
