Chapter 4 Estimation preliminaries: review of estimators for the average treatment effect

Recall our motivation for doing mediation analysis. We would like to decompose the total effect of a treatment \(A\) on an outcome \(Y\) into effects that operate through a mediator \(M\) vs effects that operate independently of \(M\).

Recall that we define the average treatment effect as \(\E(Y_1-Y_0)\). Writing \(Y_a = Y_{a,M_a}\), we decompose it as follows:

\[\begin{equation*} \E[Y_{1,M_1} - Y_{0,M_0}] = \underbrace{\E[Y_{\color{red}{1},\color{blue}{M_1}} - Y_{\color{red}{1},\color{blue}{M_0}}]}_{\text{natural indirect effect}} + \underbrace{\E[Y_{\color{blue}{1},\color{red}{M_0}} - Y_{\color{blue}{0},\color{red}{M_0}}]}_{\text{natural direct effect}} \end{equation*}\]
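
One way to see why this decomposition holds is to add and subtract the counterfactual \(Y_{1,M_0}\) inside the expectation and then use linearity:

\[\begin{aligned} \E[Y_{1,M_1} - Y_{0,M_0}] &= \E[Y_{1,M_1} - Y_{1,M_0} + Y_{1,M_0} - Y_{0,M_0}] \\ &= \E[Y_{1,M_1} - Y_{1,M_0}] + \E[Y_{1,M_0} - Y_{0,M_0}] \end{aligned}\]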

To introduce some of the ideas that we will use for estimation of the natural direct effect (NDE), let us first briefly discuss estimation of \(\E(Y_1)\) (estimation of \(\E(Y_0)\) can be performed analogously).

First, notice that under the assumption of no unmeasured confounders (\(Y_1\indep A\mid W\)), we have

\[ \E(Y_1) = \E[ \E(Y \mid A=1, W) ]\]
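
A sketch of why this identity holds, written out step by step. Besides no unmeasured confounders, the argument uses two standard assumptions that are not spelled out in this section and are assumed here: consistency (\(Y = Y_1\) whenever \(A=1\)) and positivity (\(\P(A=1\mid W) > 0\)), so that conditioning on \(A=1\) is well defined.

\[\begin{aligned} \E(Y_1) &= \E[\E(Y_1 \mid W)] && \text{(iterated expectations)} \\ &= \E[\E(Y_1 \mid A=1, W)] && \text{(since } Y_1\indep A\mid W\text{)} \\ &= \E[\E(Y \mid A=1, W)] && \text{(consistency)} \end{aligned}\]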

4.1 Option 1: Plug-in estimator

The first estimator of \(\E[ \E(Y \mid A=1, W)]\) can be obtained in a three-step procedure:

1. Fit a regression for \(Y\) on \(A\) and \(W\).
2. Use the above regression to predict each unit's outcome mean if everyone's \(A\) is set to \(A=1\).
3. Average these predictions.

In formulas, this estimator can be written as \[\frac{1}{n} \sum_{i=1}^n \hat{\E}(Y \mid A=1, W_i)\]

Note that this is just a plug-in estimator for the above formula (called the g-formula): \(\E[\E(Y \mid A=1, W)]\)
This estimator requires that the regression model for \(\E(Y \mid A=1, W)\) is correctly specified.
Downside: If we use machine learning for this model, we do not have general theory for computing standard errors and confidence intervals.
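
The three steps above can be sketched in a few lines of code. This is a minimal illustration under stated assumptions, not a recommended implementation: the data are simulated, the outcome regression is an ordinary least squares fit that is linear in \(A\) and \(W\) (correctly specified here by construction), and the function name `plug_in_mean_y1` is hypothetical.

```python
import numpy as np

def plug_in_mean_y1(y, a, w):
    """Plug-in (g-formula) estimate of E[E(Y | A=1, W)].

    Assumes a linear outcome regression of Y on A and W, fit by least squares.
    """
    n = len(y)
    # Step 1: fit a regression for Y on A and W (with an intercept)
    design = np.column_stack([np.ones(n), a, w])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    # Step 2: predict the outcome mean with everyone's A set to 1
    design_a1 = np.column_stack([np.ones(n), np.ones(n), w])
    predictions = design_a1 @ beta
    # Step 3: average these predictions
    return predictions.mean()

# Simulated data (an assumption for illustration only).
rng = np.random.default_rng(0)
n = 5000
w = rng.normal(size=n)                              # confounder
a = rng.binomial(1, 1.0 / (1.0 + np.exp(-w)))       # treatment depends on W
y = 1.0 + 2.0 * a + 1.5 * w + rng.normal(size=n)    # outcome

# By construction, the true value is E(Y_1) = 1 + 2 + 1.5 * E(W) = 3.
plug_in_estimate = plug_in_mean_y1(y, a, w)

# For contrast: the unadjusted mean among the treated, which is biased
# upward here because treated units tend to have larger W.
naive_estimate = y[a == 1].mean()

print("plug-in:", plug_in_estimate)
print("naive treated mean:", naive_estimate)
```

In this simulation the plug-in estimate should land close to 3, while the unadjusted treated mean does not, since it ignores confounding by \(W\).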

4.2 Option 2: Inverse probability weighted estimator

An alternative method of estimation can be constructed after noticing that \[\E[\E(Y \mid A=1, W)] = \E \left[\frac{A}{\P(A=1\mid W)} Y \right],\] using the following procedure:

1. Fit a regression for \(A\) on \(W\).
2. Use the above regression to predict each unit's probability of treatment, \(\hat{\P}(A_i=1 \mid W_i)\).
3. Compute the inverse probability weights \(A_i / \hat{\P}(A_i =1 \mid W_i)\). This weight is zero for untreated units, and the inverse of the estimated probability of treatment for treated units.
4. Compute the weighted average of the outcome:

\[\frac{1}{n} \sum_{i=1}^n \frac{A_i}{\hat{\P}(A_i=1 \mid W_i)} Y_i\]

This estimator requires that the regression model for \(\P(A=1 \mid W)\) (the propensity score) is correctly specified.
Downside: If we use machine learning for this model, we do not have general theory for computing standard errors and confidence intervals.
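
The identity behind this estimator follows from iterated expectations together with \(\E(AY \mid W) = \P(A=1\mid W)\,\E(Y \mid A=1, W)\), assuming \(\P(A=1\mid W) > 0\):

\[\E \left[\frac{A}{\P(A=1\mid W)} Y \right] = \E \left[\frac{\E(AY \mid W)}{\P(A=1\mid W)} \right] = \E[\E(Y \mid A=1, W)]\]

The procedure can be sketched as follows. This is a minimal illustration under stated assumptions: the data are simulated, the treatment regression is a logistic regression fit by Newton-Raphson (correctly specified here by construction), and the helper names `fit_logistic` and `ipw_mean_y1` are hypothetical.

```python
import numpy as np

def fit_logistic(X, a, n_iter=25):
    """Logistic regression of a on the columns of X via Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        gradient = X.T @ (a - p)
        hessian = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hessian, gradient)
    return beta

def ipw_mean_y1(y, a, w):
    """IPW estimate of E[E(Y | A=1, W)], plus the weights used."""
    n = len(y)
    X = np.column_stack([np.ones(n), w])
    # Step 1: fit a regression for A on W
    beta = fit_logistic(X, a)
    # Step 2: predict the probability of treatment A=1 for each unit
    p_hat = 1.0 / (1.0 + np.exp(-X @ beta))
    # Step 3: inverse probability weights (zero for untreated units)
    weights = a / p_hat
    # Step 4: weighted average of the outcome
    return np.mean(weights * y), weights

# Simulated data (an assumption for illustration only).
rng = np.random.default_rng(1)
n = 20000
w = rng.normal(size=n)                              # confounder
a = rng.binomial(1, 1.0 / (1.0 + np.exp(-w)))       # treatment depends on W
y = 1.0 + 2.0 * a + 1.5 * w + rng.normal(size=n)    # outcome

# By construction, the true value is E(Y_1) = 1 + 2 + 1.5 * E(W) = 3.
ipw_estimate, weights = ipw_mean_y1(y, a, w)
print("IPW:", ipw_estimate)
```

Note that only the treatment model is fit here; the outcome \(Y\) enters through the weighted average alone, which is why correct specification of the treatment regression is what matters for this estimator.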

4.3 Option 3: Augmented inverse probability weighted estimator

Not discussed here due to time constraints.