Abstract
Proximal causal inference provides a framework for estimating the average
treatment effect (ATE) in the presence of unmeasured confounding by
leveraging outcome and treatment proxies. Identification in this framework relies
on the existence of a so-called bridge function. Standard approaches typically
postulate a parametric specification for the bridge function, which is estimated
in a first step and then plugged into an ATE estimator. However, this sequential
procedure suffers from two potential sources of efficiency loss: (i) the difficulty
of efficiently estimating a bridge function defined by an integral equation, and
(ii) the failure to account for the correlation between the estimation steps. To
overcome these limitations, we propose a novel approach that approximates the
integral equation with increasing moment restrictions and jointly estimates the
bridge function and the ATE. We show that, under suitable conditions, our estimator is efficient. Additionally, we provide a data-driven procedure for selecting
the tuning parameter (i.e., the number of moment restrictions). Simulation studies reveal that the proposed method performs well in finite samples, and an
application to the right heart catheterization dataset from the SUPPORT study
demonstrates its practical value.
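The joint estimation idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a linear bridge function h(W, A; γ) = γ0 + γ1 A + γ2 W + γ3 AW, no covariates X, a small polynomial basis for u_K(Z, A), and a stacked moment vector consisting of {Y − h(W, A; γ)} u_K(Z, A) together with h(W, 1; γ) − h(W, 0; γ) − τ. The simulated data-generating process and the names `simulate` and `joint_gmm_linear_bridge` are hypothetical.

```python
import numpy as np

def simulate(n, seed=0):
    """Toy proximal design (an assumption for illustration); the true ATE is 2.0."""
    rng = np.random.default_rng(seed)
    U = rng.normal(size=n)                                    # unmeasured confounder
    A = rng.binomial(1, 1.0 / (1.0 + np.exp(-0.5 * U))).astype(float)
    Z = U + 0.5 * rng.normal(size=n)                          # treatment proxy
    W = U + 0.5 * rng.normal(size=n)                          # outcome proxy
    Y = 1.0 + 2.0 * A + U + 0.5 * A * U + rng.normal(size=n)
    return Y, A, W, Z

def joint_gmm_linear_bridge(Y, A, W, Z):
    """Two-step GMM that estimates (gamma, tau) jointly from stacked moments."""
    n = len(Y)
    phi = np.column_stack([np.ones(n), A, W, A * W])          # h = phi @ gamma
    u = np.column_stack([np.ones(n), A, Z, A * Z, Z**2, A * Z**2])  # basis, K = 6
    d = np.column_stack([np.zeros(n), np.ones(n), np.zeros(n), W])  # h(W,1)-h(W,0) = d @ gamma
    K, p = u.shape[1], phi.shape[1]

    # The averaged moments are linear in theta = (gamma, tau): g_bar = b_bar - M_bar @ theta.
    b_bar = np.append(u.T @ Y / n, 0.0)
    M_bar = np.zeros((K + 1, p + 1))
    M_bar[:K, :p] = u.T @ phi / n
    M_bar[K, :p] = -d.mean(axis=0)
    M_bar[K, p] = 1.0

    def solve(weight):
        # Closed-form minimizer of g_bar' weight g_bar for linear moments.
        lhs = M_bar.T @ weight @ M_bar
        rhs = M_bar.T @ weight @ b_bar
        return np.linalg.solve(lhs, rhs)

    # Step 1: identity weight gives a preliminary estimate.
    theta1 = solve(np.eye(K + 1))
    gamma1, tau1 = theta1[:p], theta1[p]

    # Step 2: weight by the inverse second-moment matrix of the stacked moments,
    # which accounts for correlation between the bridge and ATE moments.
    resid = Y - phi @ gamma1
    g = np.column_stack([u * resid[:, None], d @ gamma1 - tau1])
    S = g.T @ g / n
    theta2 = solve(np.linalg.inv(S))
    return theta2[:p], theta2[p]
```

For example, `gamma_hat, tau_hat = joint_gmm_linear_bridge(*simulate(5000))` returns the bridge coefficients and the ATE estimate from a single optimization, rather than plugging a first-step bridge estimate into a separate ATE formula.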
Information
| Preprint No. | SS-2025-0104 |
|---|---|
| Manuscript ID | SS-2025-0104 |
| Complete Authors | Chunrong Ai, Jiawei Shan |
| Corresponding Authors | Jiawei Shan |
| Emails | jiawei.shan@wisc.edu |
References
- Abadie, A. (2003, April). Semiparametric instrumental variable estimation of treatment response models. Journal of Econometrics 113(2), 231–263.
- Ai, C. and X. Chen (2003, November). Efficient estimation of models with conditional moment restrictions containing unknown functions. Econometrica 71(6), 1795–1843.
- Ai, C. and X. Chen (2012, October). The semiparametric efficiency bound for models of sequential moment restrictions containing unknown functions. Journal of Econometrics 170(2), 442–457.
- Andrews, D. W. K. (2017, August). Examples of L2-complete and boundedly-complete distributions. Journal of Econometrics 199(2), 213–220.
- Bang, H. and J. M. Robins (2005, December). Doubly robust estimation in missing data and causal inference models. Biometrics 61(4), 962–973.
- Bhattacharya, R., R. Nabi, and I. Shpitser (2022, January). Semiparametric inference for causal effects in graphical models with hidden variables. Journal of Machine Learning Research 23(1), 295:13325–295:13400.
- Brown, B. W. and W. K. Newey (1998). Efficient semiparametric estimation of expectations. Econometrica 66(2), 453–464.
- Chen, X. (2007, January). Large sample sieve estimation of semi-nonparametric models. In J. J. Heckman and E. E. Leamer (Eds.), Handbook of Econometrics, Volume 6, pp. 5549–5632. Elsevier.
- Chen, X., V. Chernozhukov, S. Lee, and W. K. Newey (2014). Local identification of nonparametric and semiparametric models. Econometrica 82(2), 785–809.
- Connors, Jr, A. F., T. Speroff, N. V. Dawson, C. Thomas, F. E. Harrell, Jr, D. Wagner, N. Desbiens, L. Goldman, A. W. Wu, R. M. Califf, W. J. Fulkerson, Jr, H. Vidaillet, S. Broste, P. Bellamy, J. Lynn, and W. A. Knaus (1996, September). The effectiveness of right heart catheterization in the initial care of critically ill patients. JAMA 276(11), 889–897.
- Cui, Y., H. Pu, X. Shi, W. Miao, and E. Tchetgen Tchetgen (2023). Semiparametric proximal causal inference. Journal of the American Statistical Association 0(0), 1–12.
- D’Haultfoeuille, X. (2011, June). On the completeness condition in nonparametric instrumental problems. Econometric Theory 27(3), 460–471.
- Donald, S. G., G. W. Imbens, and W. K. Newey (2009, September). Choosing instrumental variables in conditional moment restriction models. Journal of Econometrics 152(1), 28–36.
- Dukes, O., I. Shpitser, and E. J. Tchetgen Tchetgen (2023, March). Proximal mediation analysis. Biometrika, asad015.
- Egami, N. and E. J. Tchetgen Tchetgen (2024, April). Identification and estimation of causal peer effects using double negative controls for unmeasured network confounding. Journal of the Royal Statistical Society Series B: Statistical Methodology 86(2), 487–511.
- Ghassami, A., A. Yang, I. Shpitser, and E. Tchetgen Tchetgen (2024, July). Causal inference with hidden mediators. Biometrika, asae037.
- Greenland, S. and J. M. Robins (1986). Identifiability, exchangeability, and epidemiological confounding. International Journal of Epidemiology 15(3), 413–419.
- Guo, A., D. Benkeser, and R. Nabi (2023, December). Targeted machine learning for average causal effect estimation using the front-door functional. arXiv preprint arXiv:2312.10234.
- Guo, A. and R. Nabi (2024, September). Average causal effect estimation in DAGs with hidden variables: Extensions of back-door and front-door criteria. arXiv preprint arXiv:2409.03962.
- Hansen, L. P. (1982). Large sample properties of generalized method of moments estimators. Econometrica 50(4), 1029–1054.
- Hirano, K. and G. W. Imbens (2001, December). Estimation of causal effects using propensity score weighting: An application to data on right heart catheterization. Health Services and Outcomes Research Methodology 2(3), 259–278.
- Hu, Y. and J.-L. Shiu (2018, June). Nonparametric identification using instrumental variables: Sufficient conditions for completeness. Econometric Theory 34(3), 659–693.
- Kallus, N., X. Mao, and M. Uehara (2022, October). Causal inference under unmeasured confounding with negative controls: A minimax learning approach. arXiv preprint arXiv:2103.14029.
- Kline, B. and E. Tamer (2023, September). Recent developments in partial identification. Annual Review of Economics 15, 125–150.
- Kompa, B., D. R. Bellamy, T. Kolokotrones, J. Robins, and A. Beam (2022, October). Deep learning methods for proximal inference via maximum moment restriction. In Advances in Neural Information Processing Systems.
- Kress, R. (1989). Linear Integral Equations, Volume 82 of Applied Mathematical Sciences. New York: Springer New York.
- Mastouri, A., Y. Zhu, L. Gultchin, A. Korba, R. Silva, M. Kusner, A. Gretton, and K. Muandet (2021, July). Proximal causal learning with kernels: Two-stage estimation and moment restriction. In Proceedings of the 38th International Conference on Machine Learning, pp. 7512–7523. PMLR.
- Miao, W., Z. Geng, and E. Tchetgen Tchetgen (2018, December). Identifying causal effects with proxy variables of an unmeasured confounder. Biometrika 105(4), 987–993.
- Miao, W., X. Shi, Y. Li, and E. J. Tchetgen Tchetgen (2024, October). A confounding bridge approach for double negative control inference on causal effects. Statistical Theory and Related Fields 8(4), 262–273.
- Newey, W. K. (1997, July). Convergence rates and asymptotic normality for series estimators. Journal of Econometrics 79(1), 147–168.
- Newey, W. K. and D. McFadden (1994, January). Large sample estimation and hypothesis testing. In Handbook of Econometrics, Volume 4, pp. 2111–2245. Elsevier.
- Newey, W. K. and J. L. Powell (2003, September). Instrumental variable estimation of nonparametric models. Econometrica 71(5), 1565–1578.
- Newey, W. K. and R. J. Smith (2004). Higher Order Properties of GMM and Generalized Empirical Likelihood Estimators. Econometrica 72(1), 219–255.
- Pearl, J. (1995). On the testability of causal models with latent and instrumental variables. Uncertainty in Artificial Intelligence. Proceedings of the Eleventh Conference (1995), 435–43.
- Qi, Z., R. Miao, and X. Zhang (2024, April). Proximal learning for individualized treatment regimes under unmeasured confounding. Journal of the American Statistical Association 119(546), 915–928.
- Qiu, H., X. Shi, W. Miao, E. Dobriban, and E. Tchetgen Tchetgen (2024, June). Doubly robust proximal synthetic controls. Biometrics 80(2), ujae055.
- Richardson, T. S., R. J. Evans, J. M. Robins, and I. Shpitser (2023, February). Nested Markov properties for acyclic directed mixed graphs. The Annals of Statistics 51(1), 334–361.
- Scharfstein, D. O., A. Rotnitzky, and J. M. Robins (1999, December). Adjusting for nonignorable drop-out using semiparametric nonresponse models. Journal of the American Statistical Association 94(448), 1096–1120.
- Shi, X., W. Miao, J. C. Nelson, and E. J. Tchetgen Tchetgen (2020, April). Multiply robust causal inference with double-negative control adjustment for categorical unmeasured confounding. Journal of the Royal Statistical Society Series B: Statistical Methodology 82(2), 521–540.
- Tan, Z. (2006). Regression and weighting methods for causal inference using instrumental variables. Journal of the American Statistical Association 101(476), 1607–1618.
- Tchetgen Tchetgen, E. J., A. Ying, Y. Cui, X. Shi, and W. Miao (2024). An introduction to proximal causal inference. Statistical Science in press.
- Vermeulen, K. and S. Vansteelandt (2015, September). Bias-reduced doubly robust estimation. Journal of the American Statistical Association 110(511), 1024–1036.
- White, H. (1982). Maximum likelihood estimation of misspecified models. Econometrica 50(1), 1–25.
- Ying, A. (2024, May). Proximal survival analysis to handle dependent right censoring. Journal of the Royal Statistical Society Series B: Statistical Methodology, qkae037.
- Ying, A., W. Miao, X. Shi, and E. J. Tchetgen Tchetgen (2023, July). Proximal causal inference for complex longitudinal studies. Journal of the Royal Statistical Society Series B: Statistical Methodology 85(3), 684–704.
Acknowledgments
Chunrong Ai gratefully acknowledges funding from Project 72133005, supported by the NSFC.
A. Regularity conditions
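Smoothness classes of functions: Let $\underline{p}$ be the largest integer satisfying $\underline{p} < p$, and let $\alpha = (\alpha_1, \ldots, \alpha_d)$. A function $\phi(v)$ with domain $\mathcal{V} \subset \mathbb{R}^d$ is called a $p$-smooth function if it is $\underline{p}$ times continuously differentiable on $\mathcal{V}$ and

$$
\max_{\sum_{i=1}^{d} \alpha_i = \underline{p}} \left| \frac{\partial^{\underline{p}} \phi(v)}{\partial v_1^{\alpha_1} \cdots \partial v_d^{\alpha_d}} - \frac{\partial^{\underline{p}} \phi(v')}{\partial v_1^{\alpha_1} \cdots \partial v_d^{\alpha_d}} \right| \le C \, \|v - v'\|^{p - \underline{p}}
$$

for all $v, v' \in \mathcal{V}$ and some constant $C > 0$.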
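Two special cases may help fix ideas. They follow directly from the definition above, writing $\underline{p}$ for the largest integer strictly below $p$; the specific values of $p$ are chosen only for illustration.

```latex
% Case p = 1: \underline{p} = 0, so no derivatives are taken and
% p-smoothness reduces to Lipschitz continuity of \phi itself.
\[
  |\phi(v) - \phi(v')| \le C \, \|v - v'\| .
\]

% Case p = 3/2: \underline{p} = 1, so \phi must be continuously differentiable
% and each first-order partial derivative must be Holder continuous with exponent 1/2.
\[
  \max_{1 \le i \le d}
  \left| \frac{\partial \phi(v)}{\partial v_i} - \frac{\partial \phi(v')}{\partial v_i} \right|
  \le C \, \|v - v'\|^{1/2} .
\]
```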
Regularity conditions: The following regularity conditions (A1)–(A4) are common in the GMM literature (Newey and McFadden, 1994, p. 2132). Condition (A5) imposes mild smoothness restrictions to ensure a good sieve approximation.

(A1) $\Gamma$ is a compact subset of $\mathbb{R}^p$; $\gamma_0$ lies in the interior of $\Gamma$ and is the unique solution to (3.4); $\mathcal{T}$ is a compact subset of $\mathbb{R}$, and $\tau_0$ lies in the interior of $\mathcal{T}$;

(A2) $h(W, A, X; \gamma)$ is twice continuously differentiable in $\gamma \in \Gamma$;

(A3) $E[\sup_{\gamma \in \Gamma} \{Y - h(W, A, X; \gamma)\}^2] < \infty$ and $E[\sup_{\gamma \in \Gamma} \|\nabla_\gamma h(W, A, X; \gamma)\|^2] < \infty$;

(A4) $E[\sup_{\gamma \in \Gamma} \{h(W, 1, X; \gamma) - h(W, 0, X; \gamma)\}^2] < \infty$ and $E[\sup_{\gamma \in \Gamma} \|\nabla_\gamma \{h(W, 1, X; \gamma) - h(W, 0, X; \gamma)\}\|^2] < \infty$;

(A5) $E[\nabla_\gamma h(W, A, X; \gamma_0) \mid z, a, x]$, $E[\{Y - h(W, A, X)\}^2 \mid z, a, x]$ and $E[\{Y - h(W, A, X)\}\{h(W, 1, X) - h(W, 0, X) - \tau_0\} \mid z, a, x]$ are $p$-smooth functions for some $p > 0$.
B. Notation
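The following notation, adapted slightly from Donald et al. (2009), is used in Section 5 for selecting $K$:

$$
\hat{\Upsilon}_{K \times K} = \frac{1}{N} \sum_{i=1}^{N} \{Y_i - h(W_i, A_i, X_i; \check{\gamma})\}^2 \, u_K(Z_i, A_i, X_i) \, u_K(Z_i, A_i, X_i)^{\mathrm{T}},
$$

$$
\hat{B}_{K \times p} = -\frac{1}{N} \sum_{i=1}^{N} u_K(Z_i, A_i, X_i) \, \nabla_\gamma h(W_i, A_i, X_i; \check{\gamma})^{\mathrm{T}},
$$

$$
\hat{\Omega}_{p \times p} = (\hat{B}_{K \times p})^{\mathrm{T}} \, \hat{\Upsilon}_{K \times K}^{-1} \, \hat{B}_{K \times p},
$$

$$
\tilde{d}_i = (\hat{B}_{K \times p})^{\mathrm{T}} \left\{ \frac{1}{N} \sum_{j=1}^{N} u_K(Z_j, A_j, X_j) \, u_K(Z_j, A_j, X_j)^{\mathrm{T}} \right\}^{-1} u_K(Z_i, A_i, X_i),
$$

$$
\tilde{\eta}_i = -\nabla_\gamma h(W_i, A_i, X_i; \check{\gamma}) - \tilde{d}_i,
$$

$$
\hat{D}^{*}_i = (\hat{B}_{K \times p})^{\mathrm{T}} \, \hat{\Upsilon}_{K \times K}^{-1} \, u_K(Z_i, A_i, X_i),
$$

$$
\hat{\xi}_{ij} = u_K(Z_i, A_i, X_i)^{\mathrm{T}} \, \hat{\Upsilon}_{K \times K}^{-1} \, u_K(Z_j, A_j, X_j) / N.
$$

For a fixed $t \in \mathbb{R}^p$,

$$
\hat{\Pi}(K; t) = \sum_{i=1}^{N} \hat{\xi}_{ii} \, \{Y_i - h(W_i, A_i, X_i; \check{\gamma})\} \, t^{\mathrm{T}} \hat{\Omega}_{p \times p}^{-1} \tilde{\eta}_i,
$$

$$
\hat{\Phi}(K; t) = \sum_{i=1}^{N} \hat{\xi}_{ii} \left\{ t^{\mathrm{T}} \hat{\Omega}_{p \times p}^{-1} \left[ \hat{D}^{*}_i \{Y_i - h(W_i, A_i, X_i; \check{\gamma})\}^2 + \nabla_\gamma h(W_i, A_i, X_i; \check{\gamma}) \right] \right\}^2 - t^{\mathrm{T}} \hat{\Omega}_{p \times p}^{-1} t.
$$

The loss function is

$$
S_{\mathrm{GMM}}(K) = \sum_{j=1}^{p} \left\{ \hat{\Pi}(K; e_j)^2 / N + \hat{\Phi}(K; e_j) \right\},
$$

where $e_j$ is the unit vector with 1 in the $j$-th component and 0 in all others.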
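The selection criterion can be computed directly from these definitions. The sketch below is illustrative only and assumes a bridge function that is linear in γ, so that ∇γ h(W_i, A_i, X_i; γ) is simply the regressor row `phi[i]`; the preliminary estimate `gamma_check`, the candidate basis matrix, and the names `s_gmm` and `select_K` are hypothetical and not taken from the paper.

```python
import numpy as np

def s_gmm(Y, phi, u, gamma_check):
    """Loss S_GMM(K) for a bridge function linear in gamma (an assumption).

    Y: (N,) outcomes; phi: (N, p) gradient of h w.r.t. gamma;
    u: (N, K) basis functions u_K; gamma_check: (p,) preliminary estimate.
    """
    N = phi.shape[0]
    resid = Y - phi @ gamma_check                      # Y_i - h_i
    Ups = (u * (resid**2)[:, None]).T @ u / N          # Upsilon-hat, K x K
    B = -(u.T @ phi) / N                               # B-hat, K x p
    Ups_inv = np.linalg.inv(Ups)
    Omega_inv = np.linalg.inv(B.T @ Ups_inv @ B)       # inverse of Omega-hat, p x p

    Q_inv = np.linalg.inv(u.T @ u / N)
    d_tilde = u @ Q_inv @ B                            # row i is d-tilde_i'
    eta = -phi - d_tilde                               # row i is eta-tilde_i'
    D_star = u @ Ups_inv @ B                           # row i is D*-hat_i'
    xi_ii = np.einsum("ik,kl,il->i", u, Ups_inv, u) / N

    # Pi-hat(K; e_j) for all j at once: e_j' Omega^{-1} sum_i xi_ii resid_i eta_i.
    Pi = Omega_inv @ (eta * (xi_ii * resid)[:, None]).sum(axis=0)

    # Phi-hat(K; e_j): entry (i, j) of `inner` is e_j' Omega^{-1} [D*_i resid_i^2 + grad h_i].
    inner = (D_star * (resid**2)[:, None] + phi) @ Omega_inv
    Phi = (xi_ii[:, None] * inner**2).sum(axis=0) - np.diag(Omega_inv)

    return float(np.sum(Pi**2 / N + Phi))

def select_K(Y, phi, basis, gamma_check, candidates):
    """Pick the K minimizing S_GMM(K), using the first K columns of `basis`."""
    return min(candidates, key=lambda K: s_gmm(Y, phi, basis[:, :K], gamma_check))
```

Each candidate K must be at least p so that the p × p matrix is invertible; `select_K` then returns the candidate with the smallest loss.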
In addition, Tables A1–A3 summarize the random variables, notation,
and assumptions used in the paper for ease of reference.
Table A1: List of random variables.
| Variable | Description |
|---|---|
| $A$ | Binary treatment assignment. |
| $Y(a)$ | Potential outcomes under treatment $A = a$. |
| $Y$ | Observed outcome. |
| $X$ | Observed confounders. |
| $U$ | Unobserved confounders. |
| $(Z, W)$ | Treatment/outcome-inducing confounding proxies. |
| $O$ | Observed variables $O = (A, Y, W, X, Z)$. |
Table A2: List of notation.
| Notation | Description |
|---|---|
| $h(w, a, x)$ | Outcome-confounding bridge function; see Eq. (2.1). |
| $q(z, a, x)$ | Treatment-confounding bridge function; see Eq. (4.8). |
| $u_K(z, a, x)$ | Vector of basis functions; see Eq. (3.4). |
| $m_{\mathrm{eff}}(z, a, x)$ | Efficient score for estimating $h$; see Eq. (2.3). |
| $g_K(O; \gamma, \tau)$ | Joint score function for estimating $h$ and $\tau_0$; see Section 3. |
| $G_K(\gamma, \tau)$ | Sample average of $g_K(O; \gamma, \tau)$; see Section 3. |
|  | Optimal weight in GMM estimation; see Section 3. |
| $B_{(K+1) \times (p+1)}$ | Jacobian matrix; see Eq. (3.7). |
| $\{\psi_1(O), \psi_2(O)\}$ | Influence functions of $(\hat{\gamma}, \hat{\tau})$; see Theorem 1. |
| $\psi_{\mathrm{eff}}(O)$ | Efficient influence function of $\tau_0$; see Eq. (4.9). |
| $\{t(\cdot), R(\cdot), \kappa\}$ | Terms in the influence function; see Theorem 1. |
| $V_K$ | Asymptotic variance of the GMM estimator; see Eq. (3.7). |
| $(V_\gamma, V_\tau)$ | Asymptotic variance of $(\hat{\gamma}, \hat{\tau})$; see Theorem 1. |
| $V_{\tau,\mathrm{eff}}$ | Semiparametric efficiency bound for $\tau_0$; see Eq. (4.9). |
| $V^{\text{plug-in}}_{\tau,\mathrm{eff}}$ | Variance of the optimal plug-in estimator; see Theorem 3. |
Table A3: Summary of assumptions.
| Assumption | Description |
|---|---|
| Assumption 1 | Conditions for identifying treatment effects using proxies. |
| Assumption 2 | Conditions for the use of sieve techniques. |
| Assumption 3 | Conditions for developing semiparametric theory. |
| Conditions (A1)–(A5) | Conditions for the asymptotic analysis of the GMM estimator. |
Supplementary Materials
The supplementary material includes intermediate lemmas, additional simulation studies, and all technical proofs.