Abstract
Instrumental variable approaches have gained popularity for estimating causal effects in the presence of unmeasured confounders. However, valid instrumental variables are often hard to find in the primary dataset because the underlying assumptions are stringent and untestable. This paper presents a novel method to identify and estimate causal effects by utilizing instrumental variables from an auxiliary dataset and incorporating a structural equation model, even in scenarios with nonlinear treatment effects. Our approach uses two datasets: a primary dataset with joint observations of the treatment and outcome, and an auxiliary dataset with observations of the instrument and treatment. Unlike most existing methods, our strategy does not depend on simultaneous measurements of the instrument and outcome. The central idea for identifying causal effects is to construct, from the auxiliary dataset, a valid substitute that accounts for unmeasured confounders. This is achieved by developing a control function and projecting it onto the function space spanned by the treatment variable. We then propose a three-step estimator for the causal effects and derive its asymptotic properties. Simulation studies demonstrate favorable performance of the proposed estimator. We also conduct a real data analysis to evaluate the causal relationship between vitamin D status and body mass index.
Key words and phrases: Control function, Data fusion, Instrumental variable, Unmeasured confounder
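The two-dataset idea summarized in the abstract can be illustrated with a small simulation. The sketch below is only one plausible reading of the abstract under a simple linear outcome model, not the authors' estimator: the data-generating process, the polynomial basis, and all names (`simulate`, `poly_basis`, `ols`, `beta_hat`, `beta_naive`) are assumptions made for illustration. It fits the first stage and the projected control function on the auxiliary sample, then uses that projection as an extra regressor in the primary sample, where the instrument is never observed.

```python
import numpy as np

rng = np.random.default_rng(0)
BETA, RHO = 1.0, 1.0  # assumed true causal slope and confounding strength


def simulate(n):
    """Toy model (an assumption): X = 2Z + V, Y = BETA*X + U, U = RHO*V + noise."""
    z = rng.choice([-1.0, 1.0], size=n)   # binary instrument
    v = rng.normal(size=n)                # first-stage error, drives confounding
    u = RHO * v + rng.normal(size=n)      # outcome error correlated with v
    x = 2.0 * z + v
    y = BETA * x + u
    return z, x, y


def poly_basis(x, degree=5):
    """Polynomial basis 1, x, ..., x^degree used to span functions of the treatment."""
    return np.vander(x, degree + 1, increasing=True)


def ols(design, target):
    """Least-squares coefficients."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef


# Primary sample: only (X, Y) is used. Auxiliary sample: only (Z, X) is used.
_, x_pri, y_pri = simulate(20000)
z_aux, x_aux, _ = simulate(20000)

# Step 1 (auxiliary): first-stage regression of X on Z; residuals estimate V.
d1 = np.column_stack([np.ones_like(z_aux), z_aux])
v_hat = x_aux - d1 @ ols(d1, x_aux)

# Step 2 (auxiliary): project the residuals onto functions of X,
# approximating h(x) = E[V | X = x] within the polynomial basis.
gamma = ols(poly_basis(x_aux), v_hat)
h_pri = poly_basis(x_pri) @ gamma         # evaluate the projection on primary X

# Step 3 (primary): regress Y on X and the projected control function.
d3 = np.column_stack([np.ones_like(x_pri), x_pri, h_pri])
beta_hat = ols(d3, y_pri)[1]

# Naive regression of Y on X alone, for comparison (confounded).
beta_naive = ols(d3[:, :2], y_pri)[1]

print(f"control-function estimate: {beta_hat:.3f}")
print(f"naive OLS estimate:        {beta_naive:.3f}")
```

In this toy setup the instrument is binary, so the projection of the first-stage error onto the treatment is nonlinear in the treatment. That nonlinearity is what keeps the extra regressor from being collinear with the linear treatment term; with jointly Gaussian instrument and error the projection would be linear and this particular sketch would fail to separate the two.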
Information
| Preprint No. | SS-2024-0006 |
|---|---|
| Manuscript ID | SS-2024-0006 |
| Complete Authors | Kang Shuai, Shanshan Luo, Wei Li, Yangbo He |
| Corresponding Authors | Shanshan Luo |
| Emails | shanshanluo@btbu.edu.cn |
Acknowledgments
We sincerely thank the editor, associate editor, and reviewers for their insightful and helpful comments, which have significantly improved our paper. Kang Shuai and Yangbo He are supported by the National Key R&D Program of China (2022ZD0160300). Wei Li is supported by the Beijing Natural Science Foundation (1232008), the National Natural Science Foundation of China (12101607, 12071015), the National Key R&D Program of China (2022YFA1008100), and the MOE Project of Key Research Institute of Humanities and Social Sciences (22JJD910001). Shanshan Luo is supported by the National Natural Science Foundation of China (12401378), the Beijing Key Laboratory of Applied Statistics and Digital Regulation, and the BTBU Digital Business Platform Project by BMEC.
Supplementary Materials
The supplementary material available online includes additional technical
proofs and simulation results.