Chapter 4 Estimation preliminaries: review of estimators for the average treatment effect
Recall our motivation for doing mediation analysis. We would like to decompose the total effect of a treatment \(A\) on an outcome \(Y\) into effects that operate through a mediator \(M\) vs effects that operate independently of \(M\).
Recall that we define the average treatment effect as \(\E(Y_1-Y_0)\), and decompose it as follows:
\[\begin{equation*} \E[Y_{1,M_1} - Y_{0,M_0}] = \underbrace{\E[Y_{\color{red}{1},\color{blue}{M_1}} - Y_{\color{red}{1},\color{blue}{M_0}}]}_{\text{natural indirect effect}} + \underbrace{\E[Y_{\color{blue}{1},\color{red}{M_0}} - Y_{\color{blue}{0},\color{red}{M_0}}]}_{\text{natural direct effect}} \end{equation*}\]
To introduce some of the ideas that we will use for estimation of the NDE, let us first briefly discuss estimation of \(\E(Y_1)\) (estimation of \(\E(Y_0)\) can be performed analogously).
First, notice that under the assumption of no unmeasured confounders (\(Y_1\indep A\mid W\)), we have
\[ \E(Y_1) = \E[ \E(Y \mid A=1, W) ]\]
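The identity above follows in three short steps. The sketch below assumes, in addition to no unmeasured confounders, the usual consistency assumption (\(Y = Y_1\) whenever \(A=1\)) and positivity (\(\P(A=1\mid W) > 0\)), so that the inner conditional expectation is well defined:

\[\begin{aligned}
\E(Y_1) &= \E[\E(Y_1 \mid W)] && \text{(law of iterated expectations)} \\
        &= \E[\E(Y_1 \mid A=1, W)] && \text{(no unmeasured confounders: } Y_1 \indep A \mid W\text{)} \\
        &= \E[\E(Y \mid A=1, W)] && \text{(consistency: } Y = Y_1 \text{ when } A=1\text{)}
\end{aligned}\]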
4.1 Option 1: Plug-in estimator
The first estimator of \(\E[ \E(Y \mid A=1, W)]\) can be obtained in a three-step procedure:
- Fit a regression of \(Y\) on \(A\) and \(W\)
- Use the above regression to predict the outcome mean if everyone’s \(A\) is set to \(A=1\)
- Average these predictions
In formulas, this estimator can be written as \[\frac{1}{n} \sum_{i=1}^n \hat{\E}(Y \mid A=1, W_i)\]
- Note that this is just a plug-in estimator for the above formula (called the g-formula): \(\E[\E(Y \mid A=1, W)]\)
- This estimator requires that the regression model for \(\E(Y \mid A, W)\) be correctly specified.
- Downside: If we use machine learning to fit this model, there is no general theory for computing standard errors and confidence intervals.
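The three steps above can be sketched in a few lines of Python. This is a minimal illustration and not the implementation used in this workshop: the simulated data, the linear outcome model, and the function name `plug_in_mean_treated` are all assumptions made for the example. The data are generated so that the true value of \(\E(Y_1)\) is 3.

```python
import numpy as np

# Simulated data (an assumption for illustration only).
rng = np.random.default_rng(0)
n = 5000
W = rng.normal(size=n)                              # baseline covariate
A = rng.binomial(1, 1 / (1 + np.exp(-0.5 * W)))     # treatment depends on W
Y = 1.0 + 2.0 * A + 1.5 * W + rng.normal(size=n)    # true E(Y_1) = 3


def plug_in_mean_treated(Y, A, W):
    """Plug-in (g-formula) estimate of E(Y_1) using a linear outcome model."""
    # Step 1: fit a regression of Y on A and W (ordinary least squares).
    X = np.column_stack([np.ones_like(W), A, W])
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    # Step 2: predict the outcome mean with everyone's A set to 1.
    X1 = np.column_stack([np.ones_like(W), np.ones_like(W), W])
    predictions = X1 @ beta
    # Step 3: average the predictions.
    return float(np.mean(predictions))


estimate = plug_in_mean_treated(Y, A, W)
print(f"Plug-in estimate of E(Y_1): {estimate:.3f}")
```

Here the linear model happens to match the data-generating mechanism, so the estimate should land close to 3. With a misspecified outcome model the same code would run but could return a biased answer.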
4.2 Option 2: Inverse probability weighted estimator
An alternative estimator follows from the identity \[\E[\E(Y \mid A=1, W)] = \E \left[\frac{A}{\P(A=1\mid W)} Y \right].\] It can be computed using the following procedure:
- Fit a regression of \(A\) on \(W\)
- Use the above regression to predict each unit's probability of treatment, \(\P(A=1 \mid W_i)\)
- Compute the inverse probability weights \(A_i / \hat{\P}(A_i =1 \mid W_i)\).
- This weight will be zero for untreated units, and the inverse of the probability of treatment for treated units.
- Compute the weighted average of the outcome:
\[\frac{1}{n} \sum_{i=1}^n \frac{A_i}{\hat{\P}(A_i=1 \mid W_i)} Y_i\]
- This estimator requires that the regression model for \(\P(A=1 \mid W)\) be correctly specified.
- Downside: If we use machine learning to fit this model, there is no general theory for computing standard errors and confidence intervals.
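The inverse probability weighting procedure can be sketched in the same way. Again, this is only an illustrative sketch under stated assumptions: the data are simulated so that the true \(\E(Y_1)\) is 3, the treatment model is a logistic regression fitted by a hand-written Newton-Raphson routine, and the names `fit_logistic` and `ipw_mean_treated` are hypothetical helpers introduced for this example.

```python
import numpy as np

# Simulated data (an assumption for illustration only).
rng = np.random.default_rng(0)
n = 5000
W = rng.normal(size=n)                              # baseline covariate
A = rng.binomial(1, 1 / (1 + np.exp(-0.5 * W)))     # treatment depends on W
Y = 1.0 + 2.0 * A + 1.5 * W + rng.normal(size=n)    # true E(Y_1) = 3


def fit_logistic(X, a, n_iter=25):
    """Fit a logistic regression of a on X by Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-X @ beta))
        gradient = X.T @ (a - p)
        hessian = (X * (p * (1 - p))[:, None]).T @ X
        beta = beta + np.linalg.solve(hessian, gradient)
    return beta


def ipw_mean_treated(Y, A, W):
    """Inverse probability weighted estimate of E(Y_1)."""
    # Step 1: fit a regression of A on W.
    X = np.column_stack([np.ones_like(W), W])
    beta = fit_logistic(X, A)
    # Step 2: predict each unit's probability of treatment.
    g_hat = 1 / (1 + np.exp(-X @ beta))
    # Step 3: weights are 0 for untreated units and 1 / g_hat for treated units.
    weights = A / g_hat
    # Step 4: weighted average of the outcome.
    return float(np.mean(weights * Y))


ipw_estimate = ipw_mean_treated(Y, A, W)
print(f"IPW estimate of E(Y_1): {ipw_estimate:.3f}")
```

Because the logistic model matches how treatment was simulated, the estimate should be close to 3. Estimated probabilities near zero would produce very large weights, which is one practical reason the treatment model deserves care.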