
Statistica Sinica 31 (2021), 1285-1307

DOUBLY ROBUST REGRESSION ANALYSIS
FOR DATA FUSION

Katherine Evans, BaoLuo Sun, James Robins and Eric J. Tchetgen Tchetgen

Verily Life Sciences, National University of Singapore,
Harvard T.H. Chan School of Public Health and The Wharton School of the University of Pennsylvania

Abstract: This study investigates parametric inference for the regression of an outcome variable Y on covariates (V, L). Here, the data are fused from two separate sources: one contains information only on (V, Y), while the other contains information only on the covariates (V, L). This setting may be viewed as an extreme form of missing data in which the probability of observing the complete data (V, L, Y) for any given subject is zero. We develop a large class of semiparametric estimators of the regression coefficients in the fused data, including doubly robust estimators. The proposed method is doubly robust in that it is consistent and asymptotically normal if, in addition to the model of interest, we correctly specify either a model for the data source process under an ignorability assumption or a model for the distribution of the unobserved covariates. We evaluate the performance of our estimators in an extensive simulation study. We then apply the proposed methods to investigate the relationship between net asset value and total expenditure among U.S. households in 1998, while controlling for potential confounders, including income and other demographic variables.
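To make the fused-data structure concrete, the following is a minimal Python sketch, not the authors' code, that simulates data in which no subject ever has (V, L, Y) jointly observed. Everything in it is a hypothetical choice for illustration: the names `Record`, `simulate_fused`, and `n_complete`, the linear outcome model and its coefficients, and the logistic source-selection model that depends on V alone (one simple way to encode an ignorability assumption for the data source process). It only illustrates the data layout; it does not implement the proposed estimators.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Record:
    source: int            # hypothetical label: 1 = outcome source, 0 = covariate source
    v: float               # covariate V, observed in both sources
    l: Optional[float]     # covariate L, observed only in the covariate source
    y: Optional[float]     # outcome Y, observed only in the outcome source


def simulate_fused(n: int, seed: int = 0) -> List[Record]:
    """Simulate n subjects, then keep only (V, Y) or (V, L) for each one."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        v = rng.gauss(0.0, 1.0)
        l = 0.5 * v + rng.gauss(0.0, 1.0)                 # assumed model for L given V
        y = 1.0 + 2.0 * v - 1.0 * l + rng.gauss(0.0, 1.0)  # assumed regression of Y on (V, L)
        # Assumed ignorable source process: P(source = 1) depends only on V.
        p_outcome_source = 1.0 / (1.0 + math.exp(-v))
        if rng.random() < p_outcome_source:
            records.append(Record(source=1, v=v, l=None, y=y))  # L is never recorded here
        else:
            records.append(Record(source=0, v=v, l=l, y=None))  # Y is never recorded here
    return records


def n_complete(records: List[Record]) -> int:
    """Count subjects with (V, L, Y) all observed."""
    return sum(1 for r in records if r.l is not None and r.y is not None)


if __name__ == "__main__":
    data = simulate_fused(1000, seed=1)
    print(n_complete(data))  # always 0: every subject is missing L or Y
```

Because each simulated subject lands in exactly one source, a complete-case regression of Y on (V, L) has no data to work with, which is why estimators for this setting must combine the two sources through auxiliary models such as those described above.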

Key words and phrases: Data fusion, doubly robust.
