2 Types of path-specific causal mediation effects

Controlled direct effects
Natural direct and indirect effects
Randomized interventional direct and indirect effects
Path-specific effects using recanting twins

Code

\dimendef\prevdepth=0
\pgfdeclarelayer{background}
\pgfsetlayers{background,main}
\usetikzlibrary{arrows,positioning}
\tikzset{
>=stealth',
punkt/.style={
rectangle,
rounded corners,
draw=black, very thick,
text width=6.5em,
minimum height=2em,
text centered},
pil/.style={
->,
thick,
shorten <=2pt,
shorten >=2pt,}
}
\newcommand{\Vertex}[2]
{\node[minimum width=0.6cm,inner sep=0.05cm] (#2) at (#1) {$#2$};
}
\newcommand{\VertexR}[2]
{\node[rectangle, draw, minimum width=0.6cm,inner sep=0.05cm] (#2) at (#1) {$#2$};
}
\newcommand{\ArrowR}[3]
{ \begin{pgfonlayer}{background}
\draw[->,#3] (#1) to[bend right=30] (#2);
\end{pgfonlayer}
}
\newcommand{\ArrowL}[3]
{ \begin{pgfonlayer}{background}
\draw[->,#3] (#1) to[bend left=45] (#2);
\end{pgfonlayer}
}
\newcommand{\EdgeL}[3]
{ \begin{pgfonlayer}{background}
\draw[dashed,#3] (#1) to[bend right=-45] (#2);
\end{pgfonlayer}
}
\newcommand{\Arrow}[3]
{ \begin{pgfonlayer}{background}
\draw[->,#3] (#1) -- +(#2);
\end{pgfonlayer}
}
\begin{tikzpicture}
  \Vertex{-4, 0}{W}
  \Vertex{0, 0}{M}
  \Vertex{-2, 0}{A}
  \Vertex{2, 0}{Y}
  \Arrow{W}{A}{black}
  \Arrow{A}{M}{black}
  \Arrow{M}{Y}{black}
  \ArrowL{W}{Y}{black}
  \ArrowL{A}{Y}{black}
  \ArrowL{W}{M}{black}
\end{tikzpicture}

Directed acyclic graph under *no intermediate confounders* of the mediator-outcome relation affected by treatment

2.1 Controlled direct effects

\[\psi_{\text{CDE}} = \E(Y_{1,m} - Y_{0,m})\]

Intervening on the SCM to set $M=m$ uniformly for everyone in the population
Compare $A=1$ vs $A=0$ with $M=m$ fixed

2.1.1 Identification assumptions:

Confounder assumptions:
- $A \indep Y_{a,m} \mid W$
- $M \indep Y_{a,m} \mid W, A$
Positivity assumptions:
- $\P(M = m \mid A=a, W) > 0 \text{ } a.e.$
- $\P(A=a \mid W) > 0 \text{ } a.e.$

Under the above identification assumptions, the controlled direct effect can be identified: \[ \E(Y_{1,m} - Y_{0,m}) = \E\{ \color{ForestGreen}{\E(Y \mid A=1, M=m, W) - \E(Y \mid A=0, M=m, W)} \} \]

For intuition about this formula in R, let’s continue with a toy example:
First we fit a correct model for the outcome
Assume we would like the CDE at $m=0$
Then we generate predictions \[ \color{ForestGreen}{\E(Y \mid A=1, M=m, W)} \text{ and } \color{ForestGreen}{\E(Y \mid A=0, M=m, W)} \ : \]
Then we compute the difference between the predicted values $\color{ForestGreen}{\E(Y \mid A=1, M=m, W) - \E(Y \mid A=0, M=m, W)}$, and average across values of $W$

2.1.2 Is this the estimand I want?

Makes the most sense if it is reasonable for everyone have a constant level $m \in \mathcal{M}$.
- Judea Pearl calls this prescriptive.
- Can you think of an example? (Air pollution, rescue inhaler dosage, hospital visits…)
- Does not provide a decomposition of the average treatment effect into direct and indirect effects.

What if everyone having the same value of the mediator doesn’t make sense?

What if we want to decompose the average treatment effect into its direct and indirect counterparts?

2.2 Natural direct and indirect effects

Continuing to use the same DAG as above,

Recall the definition of the nested counterfactual: \[\begin{equation*} Y_{1, M_0} = f_Y(W, 1, M_0, U_Y) \end{equation*}\]
Interpreted as the outcome for an individual in a hypothetical world where treatment was given but the mediator was held at the value it would have taken under no treatment
Recall that, because of the definition of counterfactuals \[\begin{equation*} Y_{1, M_1} = Y_1 \end{equation*}\]

Then we can decompose the average treatment effect $\E(Y_1-Y_0)$ as follows

\[\begin{equation*} \E[Y_{1,M_1} - Y_{0,M_0}] = \underbrace{\E[Y_{\color{red}{1},\color{blue}{M_1}} - Y_{\color{red}{1},\color{blue}{M_0}}]}_{\text{natural indirect effect}} + \underbrace{\E[Y_{\color{blue}{1},\color{red}{M_0}} - Y_{\color{blue}{0},\color{red}{M_0}}]}_{\text{natural direct effect}} \end{equation*}\]

Natural direct effect (NDE): Varying treatment while keeping the mediator fixed at the value it would have taken under no treatment
Natural indirect effect (NIE): Varying the mediator from the value it would have taken under treatment to the value it would have taken under control, while keeping treatment fixed

2.2.1 Identification assumptions:

$A \indep Y_{a,m} \mid W$
$M \indep Y_{a,m} \mid W, A$
$A \indep M_a \mid W$
$M_0 \indep Y_{1,m} \mid W$
and positivity assumptions

2.2.2 Cross-world independence assumption

What does $M_0 \indep Y_{1,m} \mid W$ mean?

Conditional on $W$, knowledge of the mediator value in the absence of treatment, $M_0$, provides no information about the outcome under treatment, $Y_{1,m}$.
Can you think of a data-generating mechanism that would violate this assumption?
Example: in a randomized study, whenever we believe that treatment assignment works through adherence (i.e., almost always), we are violating this assumption (more on this later).
Cross-world assumptions are problematic for other reasons, including:
- You can never design a randomized study where the assumption holds by design.

If the cross-world assumption holds, can write the NDE as a weighted average of controlled direct effects at each level of $M=m$.

\[ \E \sum_m \{\E(Y_{1,m} \mid W) - \E(Y_{0,m} \mid W)\} \P(M_{0}=m \mid W) \]

If CDE($m$) is constant across $m$, then CDE = NDE.

2.2.3 Identification formula:

Under the above identification assumptions, the natural direct effect can be identified: \[\begin{equation*} \E(Y_{1,M_0} - Y_{0,M_0}) = \E[\color{red}{\E\{}\color{ForestGreen}{\E(Y \mid A=1, M, W) - \E(Y \mid A=0, M, W)}\color{red}{\mid A=0,W\}}] \end{equation*}\]
The natural indirect effect can be identified similarly.
Let’s dissect this formula in R:
First we fit a correct model for the outcome
Then we generate predictions $\color{ForestGreen}{\E(Y \mid A=1, M, W)} \text{ and }\color{ForestGreen}{\E(Y \mid A=0, M, W)}$ with $A$ fixed but letting $M$ and $W$ take their observed values
Then we compute the difference between the predicted values $\color{ForestGreen}{\E(Y \mid A=1, M, W) - \E(Y \mid A=0, M, W)},$
and use this difference as a pseudo-outcome in a regression on $A$ and $W$: $\color{red}{\E\{}\color{ForestGreen}{\E(Y \mid A=1, M, W) - \E(Y \mid A=0, M, W)}\color{red}{\mid A=0,W\}}$
Now we predict the value of this pseudo-outcome under $A=0$, and average the result

2.2.4 Is this the estimand I want?

Makes sense to hypothetically set $A$ but not $M$.
Want to understand a natural mechanism underlying an association or total effect. Judea Pearl calls this descriptive.
NDE + NIE = total effect (ATE).
Okay with the assumptions.

What if our data structure involves a post-treatment confounder of the mediator-outcome relationship (e.g., adherence)?

Code

\dimendef\prevdepth=0
\pgfdeclarelayer{background}
\pgfsetlayers{background,main}
\usetikzlibrary{arrows,positioning}
\tikzset{
>=stealth',
punkt/.style={
rectangle,
rounded corners,
draw=black, very thick,
text width=6.5em,
minimum height=2em,
text centered},
pil/.style={
->,
thick,
shorten <=2pt,
shorten >=2pt,}
}
\newcommand{\Vertex}[2]
{\node[minimum width=0.6cm,inner sep=0.05cm] (#2) at (#1) {$#2$};
}
\newcommand{\VertexR}[2]
{\node[rectangle, draw, minimum width=0.6cm,inner sep=0.05cm] (#2) at (#1) {$#2$};
}
\newcommand{\ArrowR}[3]
{ \begin{pgfonlayer}{background}
\draw[->,#3] (#1) to[bend right=30] (#2);
\end{pgfonlayer}
}
\newcommand{\ArrowL}[3]
{ \begin{pgfonlayer}{background}
\draw[->,#3] (#1) to[bend left=45] (#2);
\end{pgfonlayer}
}
\newcommand{\EdgeL}[3]
{ \begin{pgfonlayer}{background}
\draw[dashed,#3] (#1) to[bend right=-45] (#2);
\end{pgfonlayer}
}
\newcommand{\Arrow}[3]
{ \begin{pgfonlayer}{background}
\draw[->,#3] (#1) -- +(#2);
\end{pgfonlayer}
}
\begin{tikzpicture}
  \Vertex{0, -1}{Z}
  \Vertex{-4, 0}{W}
  \Vertex{0, 0}{M}
  \Vertex{-2, 0}{A}
  \Vertex{2, 0}{Y}
  \ArrowR{W}{Z}{black}
  \Arrow{Z}{M}{black}
  \Arrow{W}{A}{black}
  \Arrow{A}{M}{black}
  \Arrow{M}{Y}{black}
  \Arrow{A}{Z}{black}
  \Arrow{Z}{Y}{black}
  \ArrowL{W}{Y}{black}
  \ArrowL{A}{Y}{black}
  \ArrowL{W}{M}{black}
\end{tikzpicture}

Directed acyclic graph under intermediate confounders of the mediator-outcome relation affected by treatment

2.2.5 Unidentifiability of the NDE and NIE in this setting

In this example, natural direct and indirect effects are not generally point identified from observed data $O=(W,A,Z,M,Y)$.
The reason for this is that the cross-world counterfactual assumption \[Y_{1,m} \indep M_0 \mid W\] does not hold in the above directed acyclic graph.
To give intuition, we focus on the counterfactual outcome $Y_{A=1, M_{A=0}}$.
- This counterfactual outcome involves two counterfactual worlds simultaneously: one in which $A=1$ for the first portion of the counterfactual outcome, and one in which $A=0$ for the nested portion of the counterfactual outcome.
  - Setting $A=1$ induces a counterfactual treatment-induced confounder, denoted $Z_{A=1}$. Setting $A=0$ induces another counterfactual treatment-induced confounder, denoted $Z_{A=0}$.
  - The two treatment-induced counterfactual confounders, $Z_{A=1}$ and $Z_{A=0}$ share unmeasured common causes, $U_Z$, which creates a spurious association.
  - Because $Z_{A=1}$ is causally related to $Y_{A=1, M=m}$, and $Z_{A=0}$ is also casually related to $M_{A=0}$, the path through $U_Z$ means that the backdoor criterion is not met for identification of $Y_{A=1, M_{A=0}}$, i.e., $M_{0} \notindep Y_{A=1, m} \mid W$, where $W$ denotes baseline covariates.

Code

\dimendef\prevdepth=0
\pgfdeclarelayer{background}
\pgfsetlayers{background,main}
\usetikzlibrary{arrows,positioning}
\tikzset{
>=stealth',
punkt/.style={
rectangle,
rounded corners,
draw=black, very thick,
text width=6.5em,
minimum height=2em,
text centered},
pil/.style={
->,
thick,
shorten <=2pt,
shorten >=2pt,}
}
\newcommand{\Vertex}[2]
{\node[minimum width=0.6cm,inner sep=0.05cm] (#2) at (#1) {$#2$};
}
\newcommand{\VertexR}[2]
{\node[rectangle, draw, minimum width=0.6cm,inner sep=0.05cm] (#2) at (#1) {$#2$};
}
\newcommand{\ArrowR}[3]
{ \begin{pgfonlayer}{background}
\draw[->,#3] (#1) to[bend right=30] (#2);
\end{pgfonlayer}
}
\newcommand{\ArrowL}[3]
{ \begin{pgfonlayer}{background}
\draw[->,#3] (#1) to[bend left=45] (#2);
\end{pgfonlayer}
}
\newcommand{\EdgeL}[3]
{ \begin{pgfonlayer}{background}
\draw[dashed,#3] (#1) to[bend left=45] (#2);
\end{pgfonlayer}
}
\newcommand{\EdgeR}[3]
{ \begin{pgfonlayer}{background}
\draw[dashed,#3] (#1) to[bend right=-45] (#2);
\end{pgfonlayer}
}
\newcommand{\Arrow}[3]
{ \begin{pgfonlayer}{background}
\draw[->,#3] (#1) -- +(#2);
\end{pgfonlayer}
}

\begin{tikzpicture}
  \Vertex{0, 1}{Z_0}
  \Vertex{0, -1}{Z_1}
  \Vertex{-2, 0}{U_Z}
  \Vertex{2, 1}{M_0}
  \Vertex{4, -1}{Y_{1, m}}
  \Arrow{Z_0}{M_0}{black}
  \Arrow{Z_1}{Y_{1, m}}{black}
  \Arrow{U_Z}{Z_0}{black}
  \Arrow{U_Z}{Z_1}{black}
\end{tikzpicture}

Parallel worlds model of the scenario considered

However:

We can actually identify the NIE/NDE if we are willing to invoke monotonicity between a treatment and one or more binary treatment-induced confounders (Tchetgen Tchetgen and VanderWeele 2014).
Assuming monotonicity is also sometimes referred to as assuming “no defiers”—in other words, assuming that there are no individuals who would do the opposite of the encouragement.
Monotonicity may seem like a restrictive assumption, but may be reasonable in some common scenarios (e.g., in trials where the intervention is randomized treatment assignment and the treatment-induced confounder is whether or not treatment was actually taken—in this setting, we may feel comfortable assuming that there are no “defiers”, frequently assumed when using IVs to identify causal effects)

Tchetgen Tchetgen, Eric J, and Tyler J VanderWeele. 2014. “On Identification of Natural Direct Effects When a Confounder of the Mediator Is Directly Affected by Exposure.” Epidemiology 25 (2): 282.

Note: CDEs are still identified in this setting. They can be identified and estimated similarly to a longitudinal data structure with a two-time-point intervention.

2.3 Randomized interventional (in)direct effects

Let $G_a$ denote a random draw from the distribution of $M_a \mid W$
Define the counterfactual $Y_{1,G_0}$ as the counterfactual variable in a hypothetical world where $A$ is set $A=1$ and $M$ is set to $M=G_0$ with probability one.

Define $Y_{0,G_0}$ and $Y_{1,G_1}$ similarly
Then we can define: \[ \E[Y_{1,G_1} - Y_{0,G_0}] = \underbrace{\E[Y_{\color{red}{1}, \color{blue}{G_1}} - Y_{\color{red}{1}, \color{blue}{G_0}}]}_{\text{interventional indirect effect}} + \underbrace{\E[Y_{\color{blue}{1},\color{red}{G_0}} - Y_{\color{blue}{0}, \color{red}{G_0}}]}_{\text{interventional direct effect}} \]
Note that $\E[Y_{1,G_1} - Y_{0,G_0}]$ is still an overall effect of treatment, even if it is different from the ATE $\E[Y_{1} - Y_{0}]$
We gain in the ability to solve a problem, but lose in terms of interpretation of the causal effect (cannot decompose the ATE)

2.3.1 An alternative definition of the effects:

Above we defined $G_a$ as a random draw from the distribution of $M_a \mid W$
What if instead we define $G_a$ as a random draw from the distribution of $M_a \mid (Z_a,W)$
It turns out the indirect effect defined in this way only measures the path $A \rightarrow M \rightarrow Y$, and not the path $A \rightarrow Z \rightarrow M \rightarrow Y$
There may be important reasons to choose one over another (e.g., survival analyses where we want the distribution conditional on $Z$, instrumental variable designs where it doesn’t make sense to condition on $Z$)

2.3.2 Identification assumptions:

$A \indep Y_{a,m} \mid W$
$M \indep Y_{a,m} \mid W, A, Z$
$A \indep M_a \mid W$
and positivity assumptions.

Under these assumptions, the randomized interventional direct and indirect effect is identified: \[\begin{align*} \E &(Y_{a, G_{a'}}) = \\ & \E\left[\color{Purple}{\E\left\{\color{red}{\sum_z} \color{ForestGreen}{\E(Y \mid A=a, Z=z, M, W)} \color{red}{\P(Z=z \mid A=a, W)}\mid A=a', W\right\}} \right] \end{align*}\]

Let’s dissect this formula in R:
Let us compute $\E(Y_{1, G_0})$ (so that $a = 1$, and $a'=0$).
First, fit a regression model for the outcome, and compute $\color{ForestGreen}{\E(Y \mid A=a, Z=z, M, W)}$ for all values of $z$
Now we fit the true model for $Z \mid A, W$ and get the conditional probability that $Z=1$ fixing $A=1$
Now we compute the following pseudo-outcome: $\color{red}{\sum_z}\color{ForestGreen}{\E(Y \mid A=a, Z=z, M, W)} \color{red}{\P(Z=z \mid A=a, w)}$
Now we regress this pseudo-outcome on $A,W$, and compute the predictions setting $A=0$, that is, \[ \color{Purple}{\E\left\{\color{red}{\sum_z} \color{ForestGreen}{\E(Y \mid A=a, Z=z, M, W)} \color{red}{\P(Z=z \mid A=a, w)}\mid A=a', W\right\}} \]
And finally, just average those predictions!
This was for $(a,a')=(1,0)$. Can do the same with $(a,a')=(1,1)$, and $(a,a')=(0,0)$ to obtain an effect decomposition

\[\begin{equation*} \E[Y_{1,G_1} - Y_{0,G_0}] = \underbrace{\E[Y_{\color{red}{1}, \color{blue}{G_1}} - Y_{\color{red}{1}, \color{blue}{G_0}}]}_{\text{interventional indirect effect}} + \underbrace{\E[Y_{\color{blue}{1},\color{red}{G_0}} - Y_{\color{blue}{0}, \color{red}{G_0}}]}_{\text{interventional direct effect}} \end{equation*}\]

2.3.3 Is this the estimand I want?

Do not want to intervene directly on $M$ in the causal model.
Goal is to understand a descriptive type of mediation.
Okay with the assumptions!

2.3.4 But, there is an important limitation of randomized interventional effects

Miles (2022) recently uncovered an important limitation of these effects, which can be described as follows. The sharp mediational null hypothesis can be defined as

Miles, Caleb H. 2022. “On the Causal Interpretation of Randomized Interventional Indirect Effects.” arXiv Preprint arXiv:2203.00245. https://arxiv.org/abs/2203.00245.

\[ H_0:Y(a, M(a')) = Y(a, M(a^\star));\text{ for all }a, a', a^\star \ . \]

The problem is that randomized interventional effects are not guaranteed to be null when the sharp mediational hypothesis is true.

This could present a problem in practice if some subgroup of the population has a relationship between $A$ and $M$, but not between $M$ and $Y$. Then, another distinct subgroup of the population has a relationship between $M$ and $Y$ but not between $A$ and $M$. In such a scenario, the randomized interventional indirect effect would be nonzero, but there would be no one person in the population whose effect of $A$ on $Y$ would be mediated by $M$.

More details can be found in the original paper.

2.4 Path-specific effects using recanting twins

The DAG we are working under contains four paths of interest:

\[\begin{align*} P_1&: A\rightarrow Y;\\ P_2&: A \rightarrow Z \rightarrow Y;\\ P_3&: A \rightarrow Z \rightarrow M \rightarrow Y\\ P_4&: A \rightarrow M \rightarrow Y, \end{align*}\]

Natural path-specific effects could be defined using the following nested counterfactuals: \[\begin{align*} Y_{S_0}&=Y(1, Z(1), M(1, Z(1))),\\ Y_{S_1}&=Y(0, Z(1), M(1, Z(1))),\\ Y_{S_2}&=Y(0, Z(0), M(1, Z(1))),\\ Y_{S_3}&=Y(0, Z(0), M(1, Z(0))),\\ Y_{S_4}&=Y(0, Z(0), M(0, Z(0))), \end{align*}\]

As before, the problem is that the distribution of $Y_{S_2}$ is not identifiable due to the so-called recanting witness $Z$. This means that one can contrast $Y_{S_1}$ with $Y_{S_3}$ but not with $Y_{S_2}$. Thus, the effects operating through $P_2$ and $P_3$ cannot be disentangled.

$Z$ is called a recanting witness because “it tells one story” $Z(1)$ for the purposes of one path, and another story $Z(0)$ for the purposes of the other.

But what if we replaced recanting witnesses by randomized versions of them that carry some of the same information? Specifically, if $T(a)$ is a random draw from the distribution of $Z(a)$ conditional on $W$, we can:

Measure $P_2$ by contrasting \[ Y(0, Z(1), M(1, {\color{ForestGreen}{T(1)}})) \text{ vs. } Y(0, Z(0), M(1, \color{ForestGreen}{T(1)})) \]
Measure $P_3$ by contrasting \[ Y(0, {\color{ForestGreen}{T(0)}}, M(1, Z(1))) \text{ vs. } Y(0, {\color{ForestGreen}{T(0)}}, M(1, Z(0))) \]
Measure the other paths through their identified natural effects

This is what we proposed in Vo et al. (2024).

Vo, Tat-Thang, Nicholas Williams, Richard Liu, Kara E Rudolph, and Ivan Dıaz. 2024. “Recanting Twins: Addressing Intermediate Confounding in Mediation Analysis.” arXiv Preprint arXiv:2401.04450.

The variables $T(a)$ are the recanting twins of $Z(1-a)$. Identification requires some of the standard assumptions, namely:

$U_A\indep (U_Y, U_M, U_Z)\mid W$,
$U_Z\indep (U_Y, U_M)\mid (A,W)$, and
$U_M\indep U_Y\mid (Z,A,W)$.

Under these assumptions, we can identify a decomposition of the ATE as follows:

\[ \text{ATE} = \psi_{P_1} + \psi_{P_2} + \psi_{P_3} + \psi_{P_4} + \psi_{\text{IC}}, \]

where

$\psi_{P_i}$ measures the effect through path $P_i$
$\psi_{\text{IC}}$ is a remainder that is equal to zero if there is no intermediate confounding, in which case $\psi_{P_i}$ is equal to the (identified) natural path-specific effect.

More details on the definition, interpretation, and identification of these effects in the paper!

# Types of path-specific causal mediation effects {#estimands} ```{r} #| label: load-renv #| echo: false #| message: false renv::autoload() library(here) ``` - Controlled direct effects - Natural direct and indirect effects - Randomized interventional direct and indirect effects - Path-specific effects using *recanting twins* ```{tikz} #| fig-cap: Directed acyclic graph under *no intermediate confounders* of the mediator-outcome relation affected by treatment \dimendef\prevdepth=0 \pgfdeclarelayer{background} \pgfsetlayers{background,main} \usetikzlibrary{arrows,positioning} \tikzset{ >=stealth', punkt/.style={ rectangle, rounded corners, draw=black, very thick, text width=6.5em, minimum height=2em, text centered}, pil/.style={ ->, thick, shorten <=2pt, shorten >=2pt,} } \newcommand{\Vertex}[2] {\node[minimum width=0.6cm,inner sep=0.05cm] (#2) at (#1) {$#2$}; } \newcommand{\VertexR}[2] {\node[rectangle, draw, minimum width=0.6cm,inner sep=0.05cm] (#2) at (#1) {$#2$}; } \newcommand{\ArrowR}[3] { \begin{pgfonlayer}{background} \draw[->,#3] (#1) to[bend right=30] (#2); \end{pgfonlayer} } \newcommand{\ArrowL}[3] { \begin{pgfonlayer}{background} \draw[->,#3] (#1) to[bend left=45] (#2); \end{pgfonlayer} } \newcommand{\EdgeL}[3] { \begin{pgfonlayer}{background} \draw[dashed,#3] (#1) to[bend right=-45] (#2); \end{pgfonlayer} } \newcommand{\Arrow}[3] { \begin{pgfonlayer}{background} \draw[->,#3] (#1) -- +(#2); \end{pgfonlayer} } \begin{tikzpicture} \Vertex{-4, 0}{W} \Vertex{0, 0}{M} \Vertex{-2, 0}{A} \Vertex{2, 0}{Y} \Arrow{W}{A}{black} \Arrow{A}{M}{black} \Arrow{M}{Y}{black} \ArrowL{W}{Y}{black} \ArrowL{A}{Y}{black} \ArrowL{W}{M}{black} \end{tikzpicture} ``` ## Controlled direct effects $$\psi_{\text{CDE}} = \E(Y_{1,m} - Y_{0,m})$$ ![](../img/graphic4a3.png){width="40%"} - Intervening on the SCM to set $M=m$ uniformly for everyone in the population - Compare $A=1$ vs $A=0$ with $M=m$ fixed ### Identification assumptions: - Confounder assumptions: - $A \indep Y_{a,m} \mid W$ - $M \indep Y_{a,m} \mid W, A$ - Positivity assumptions: - $\P(M = m \mid A=a, W) > 0 \text{ } a.e.$ - $\P(A=a \mid W) > 0 \text{ } a.e.$ Under the above identification assumptions, the controlled direct effect can be identified: $$ \E(Y_{1,m} - Y_{0,m}) = \E\{ \color{ForestGreen}{\E(Y \mid A=1, M=m, W) - \E(Y \mid A=0, M=m, W)} \} $$ - For intuition about this formula in R, let's continue with a toy example: ```{webr-r} n <- 1e6 w <- rnorm(n) a <- rbinom(n, 1, 0.5) m <- rnorm(n, w + a) y <- rnorm(n, w + a + m) ``` - First we fit a correct model for the outcome ```{webr-r} lm_y <- lm(y ~ m + a + w) ``` - Assume we would like the CDE at $m=0$ - Then we generate predictions $$ \color{ForestGreen}{\E(Y \mid A=1, M=m, W)} \text{ and } \color{ForestGreen}{\E(Y \mid A=0, M=m, W)} \ : $$ ```{webr-r} pred_y1 <- predict(lm_y, newdata = data.frame(a = 1, m = 0, w = w)) pred_y0 <- predict(lm_y, newdata = data.frame(a = 0, m = 0, w = w)) ``` - Then we compute the difference between the predicted values $\color{ForestGreen}{\E(Y \mid A=1, M=m, W) - \E(Y \mid A=0, M=m, W)}$, and average across values of $W$ ```{webr-r} ## CDE at m = 0 mean(pred_y1 - pred_y0) ``` ### Is this the estimand I want? - Makes the most sense if it is reasonable for everyone have a constant level $m \in \mathcal{M}$. - Judea Pearl calls this *prescriptive*. - Can you think of an example? (Air pollution, rescue inhaler dosage, hospital visits...) - Does not provide a decomposition of the average treatment effect into direct and indirect effects. *What if everyone having the same value of the mediator doesn't make sense?* *What if we want to decompose the average treatment effect into its direct and indirect counterparts?* ## Natural direct and indirect effects Continuing to use the same DAG as above, - Recall the definition of the nested counterfactual: \begin{equation*} Y_{1, M_0} = f_Y(W, 1, M_0, U_Y) \end{equation*} - Interpreted as *the outcome for an individual in a hypothetical world where treatment was given but the mediator was held at the value it would have taken under no treatment* ![](../img/graphic4a.png){width="40%"} - Recall that, because of the definition of counterfactuals \begin{equation*} Y_{1, M_1} = Y_1 \end{equation*} Then we can decompose the *average treatment effect* $\E(Y_1-Y_0)$ as follows \begin{equation*} \E[Y_{1,M_1} - Y_{0,M_0}] = \underbrace{\E[Y_{\color{red}{1},\color{blue}{M_1}} - Y_{\color{red}{1},\color{blue}{M_0}}]}_{\text{natural indirect effect}} + \underbrace{\E[Y_{\color{blue}{1},\color{red}{M_0}} - Y_{\color{blue}{0},\color{red}{M_0}}]}_{\text{natural direct effect}} \end{equation*} - Natural direct effect (NDE): Varying treatment while keeping the mediator fixed at the value it would have taken under no treatment - Natural indirect effect (NIE): Varying the mediator from the value it would have taken under treatment to the value it would have taken under control, while keeping treatment fixed ### Identification assumptions: - $A \indep Y_{a,m} \mid W$ - $M \indep Y_{a,m} \mid W, A$ - $A \indep M_a \mid W$ - $M_0 \indep Y_{1,m} \mid W$ - and positivity assumptions ### Cross-world independence assumption What does $M_0 \indep Y_{1,m} \mid W$ mean? - Conditional on $W$, knowledge of the mediator value in the absence of treatment, $M_0$, provides no information about the outcome under treatment, $Y_{1,m}$. - Can you think of a data-generating mechanism that would violate this assumption? - Example: in a randomized study, whenever we believe that treatment assignment works through adherence (i.e., almost always), we are violating this assumption (more on this later). - Cross-world assumptions are problematic for other reasons, including: - You can never design a randomized study where the assumption holds by design. **If the cross-world assumption holds, can write the NDE as a weighted average of controlled direct effects at each level of** $M=m$. $$ \E \sum_m \{\E(Y_{1,m} \mid W) - \E(Y_{0,m} \mid W)\} \P(M_{0}=m \mid W) $$ - If CDE($m$) is constant across $m$, then CDE = NDE. ### Identification formula: - Under the above identification assumptions, the natural direct effect can be identified: \begin{equation*} \E(Y_{1,M_0} - Y_{0,M_0}) = \E[\color{red}{\E\{}\color{ForestGreen}{\E(Y \mid A=1, M, W) - \E(Y \mid A=0, M, W)}\color{red}{\mid A=0,W\}}] \end{equation*} - The natural indirect effect can be identified similarly. - Let's dissect this formula in `R`: ```{webr-r} n <- 1e6 w <- rnorm(n) a <- rbinom(n, 1, 0.5) m <- rnorm(n, w + a) y <- rnorm(n, w + a + m) ``` - First we fit a correct model for the outcome ```{webr-r} lm_y <- lm(y ~ m + a + w) ``` - Then we generate predictions $\color{ForestGreen}{\E(Y \mid A=1, M, W)} \text{ and }\color{ForestGreen}{\E(Y \mid A=0, M, W)}$ with $A$ fixed but letting $M$ and $W$ take their observed values ```{webr-r} pred_y1 <- predict(lm_y, newdata = data.frame(a = 1, m = m, w = w)) pred_y0 <- predict(lm_y, newdata = data.frame(a = 0, m = m, w = w)) ``` - Then we compute the difference between the predicted values $\color{ForestGreen}{\E(Y \mid A=1, M, W) - \E(Y \mid A=0, M, W)},$ - and use this difference as a pseudo-outcome in a regression on $A$ and $W$: $\color{red}{\E\{}\color{ForestGreen}{\E(Y \mid A=1, M, W) - \E(Y \mid A=0, M, W)}\color{red}{\mid A=0,W\}}$ ```{webr-r} pseudo <- pred_y1 - pred_y0 lm_pseudo <- lm(pseudo ~ a + w) ``` - Now we predict the value of this pseudo-outcome under $A=0$, and average the result ```{webr-r} pred_pseudo <- predict(lm_pseudo, newdata = data.frame(a = 0, w = w)) ## NDE: mean(pred_pseudo) ``` ### Is this the estimand I want? - Makes sense to hypothetically set $A$ but not $M$. - Want to understand a natural mechanism underlying an association or total effect. Judea Pearl calls this *descriptive*. - NDE + NIE = total effect (ATE). - Okay with the assumptions. *What if our data structure involves a post-treatment confounder of the mediator-outcome relationship (e.g., adherence)?* ```{tikz} #| fig-cap: Directed acyclic graph under intermediate confounders of the mediator-outcome relation affected by treatment \dimendef\prevdepth=0 \pgfdeclarelayer{background} \pgfsetlayers{background,main} \usetikzlibrary{arrows,positioning} \tikzset{ >=stealth', punkt/.style={ rectangle, rounded corners, draw=black, very thick, text width=6.5em, minimum height=2em, text centered}, pil/.style={ ->, thick, shorten <=2pt, shorten >=2pt,} } \newcommand{\Vertex}[2] {\node[minimum width=0.6cm,inner sep=0.05cm] (#2) at (#1) {$#2$}; } \newcommand{\VertexR}[2] {\node[rectangle, draw, minimum width=0.6cm,inner sep=0.05cm] (#2) at (#1) {$#2$}; } \newcommand{\ArrowR}[3] { \begin{pgfonlayer}{background} \draw[->,#3] (#1) to[bend right=30] (#2); \end{pgfonlayer} } \newcommand{\ArrowL}[3] { \begin{pgfonlayer}{background} \draw[->,#3] (#1) to[bend left=45] (#2); \end{pgfonlayer} } \newcommand{\EdgeL}[3] { \begin{pgfonlayer}{background} \draw[dashed,#3] (#1) to[bend right=-45] (#2); \end{pgfonlayer} } \newcommand{\Arrow}[3] { \begin{pgfonlayer}{background} \draw[->,#3] (#1) -- +(#2); \end{pgfonlayer} } \begin{tikzpicture} \Vertex{0, -1}{Z} \Vertex{-4, 0}{W} \Vertex{0, 0}{M} \Vertex{-2, 0}{A} \Vertex{2, 0}{Y} \ArrowR{W}{Z}{black} \Arrow{Z}{M}{black} \Arrow{W}{A}{black} \Arrow{A}{M}{black} \Arrow{M}{Y}{black} \Arrow{A}{Z}{black} \Arrow{Z}{Y}{black} \ArrowL{W}{Y}{black} \ArrowL{A}{Y}{black} \ArrowL{W}{M}{black} \end{tikzpicture} ``` ![](../img/bottomdag.png){width="80%"} ### Unidentifiability of the NDE and NIE in this setting - In this example, natural direct and indirect effects are not generally point identified from observed data $O=(W,A,Z,M,Y)$. - The reason for this is that the cross-world counterfactual assumption $$Y_{1,m} \indep M_0 \mid W$$ does not hold in the above directed acyclic graph. - To give intuition, we focus on the counterfactual outcome $Y_{A=1, M_{A=0}}$. - This counterfactual outcome involves two counterfactual worlds simultaneously: one in which $A=1$ for the first portion of the counterfactual outcome, and one in which $A=0$ for the nested portion of the counterfactual outcome. - Setting $A=1$ induces a counterfactual treatment-induced confounder, denoted $Z_{A=1}$. Setting $A=0$ induces another counterfactual treatment-induced confounder, denoted $Z_{A=0}$. - The two treatment-induced counterfactual confounders, $Z_{A=1}$ and $Z_{A=0}$ share unmeasured common causes, $U_Z$, which creates a spurious association. - Because $Z_{A=1}$ is causally related to $Y_{A=1, M=m}$, and $Z_{A=0}$ is also casually related to $M_{A=0}$, the path through $U_Z$ means that the backdoor criterion is not met for identification of $Y_{A=1, M_{A=0}}$, i.e., $M_{0} \notindep Y_{A=1, m} \mid W$, where $W$ denotes baseline covariates. ```{tikz} #| fig-cap: Parallel worlds model of the scenario considered \dimendef\prevdepth=0 \pgfdeclarelayer{background} \pgfsetlayers{background,main} \usetikzlibrary{arrows,positioning} \tikzset{ >=stealth', punkt/.style={ rectangle, rounded corners, draw=black, very thick, text width=6.5em, minimum height=2em, text centered}, pil/.style={ ->, thick, shorten <=2pt, shorten >=2pt,} } \newcommand{\Vertex}[2] {\node[minimum width=0.6cm,inner sep=0.05cm] (#2) at (#1) {$#2$}; } \newcommand{\VertexR}[2] {\node[rectangle, draw, minimum width=0.6cm,inner sep=0.05cm] (#2) at (#1) {$#2$}; } \newcommand{\ArrowR}[3] { \begin{pgfonlayer}{background} \draw[->,#3] (#1) to[bend right=30] (#2); \end{pgfonlayer} } \newcommand{\ArrowL}[3] { \begin{pgfonlayer}{background} \draw[->,#3] (#1) to[bend left=45] (#2); \end{pgfonlayer} } \newcommand{\EdgeL}[3] { \begin{pgfonlayer}{background} \draw[dashed,#3] (#1) to[bend left=45] (#2); \end{pgfonlayer} } \newcommand{\EdgeR}[3] { \begin{pgfonlayer}{background} \draw[dashed,#3] (#1) to[bend right=-45] (#2); \end{pgfonlayer} } \newcommand{\Arrow}[3] { \begin{pgfonlayer}{background} \draw[->,#3] (#1) -- +(#2); \end{pgfonlayer} } \begin{tikzpicture} \Vertex{0, 1}{Z_0} \Vertex{0, -1}{Z_1} \Vertex{-2, 0}{U_Z} \Vertex{2, 1}{M_0} \Vertex{4, -1}{Y_{1, m}} \Arrow{Z_0}{M_0}{black} \Arrow{Z_1}{Y_{1, m}}{black} \Arrow{U_Z}{Z_0}{black} \Arrow{U_Z}{Z_1}{black} \end{tikzpicture} ``` ```{=html}  ``` ```{=html}  ``` However: - We can actually identify the NIE/NDE if we are willing to invoke monotonicity between a treatment and one or more binary treatment-induced confounders [@tchetgen2014identification]. - Assuming monotonicity is also sometimes referred to as assuming "no defiers"---in other words, assuming that there are no individuals who would do the opposite of the encouragement. - Monotonicity may seem like a restrictive assumption, but may be reasonable in some common scenarios (e.g., in trials where the intervention is randomized treatment assignment and the treatment-induced confounder is whether or not treatment was actually taken---in this setting, we may feel comfortable assuming that there are no "defiers", frequently assumed when using IVs to identify causal effects) Note: CDEs are still identified in this setting. They can be identified and estimated similarly to a longitudinal data structure with a two-time-point intervention. ## Randomized interventional (in)direct effects - Let $G_a$ denote a random draw from the distribution of $M_a \mid W$ - Define the counterfactual $Y_{1,G_0}$ as the counterfactual variable in a hypothetical world where $A$ is set $A=1$ and $M$ is set to $M=G_0$ with probability one. ![](../img/graphic4b.png){width="60%"} - Define $Y_{0,G_0}$ and $Y_{1,G_1}$ similarly - Then we can define: $$ \E[Y_{1,G_1} - Y_{0,G_0}] = \underbrace{\E[Y_{\color{red}{1}, \color{blue}{G_1}} - Y_{\color{red}{1}, \color{blue}{G_0}}]}_{\text{interventional indirect effect}} + \underbrace{\E[Y_{\color{blue}{1},\color{red}{G_0}} - Y_{\color{blue}{0}, \color{red}{G_0}}]}_{\text{interventional direct effect}} $$ - Note that $\E[Y_{1,G_1} - Y_{0,G_0}]$ is still an *overall effect* of treatment, even if it is different from the ATE $\E[Y_{1} - Y_{0}]$ - We gain in the ability to solve a problem, but lose in terms of interpretation of the causal effect (cannot decompose the ATE) ### An alternative definition of the effects: - Above we defined $G_a$ as a random draw from the distribution of $M_a \mid W$ - What if instead we define $G_a$ as a random draw from the distribution of $M_a \mid (Z_a,W)$ - It turns out the indirect effect defined in this way only measures the path $A \rightarrow M \rightarrow Y$, and not the path $A \rightarrow Z \rightarrow M \rightarrow Y$ - There may be important reasons to choose one over another (e.g., survival analyses where we want the distribution conditional on $Z$, instrumental variable designs where it doesn't make sense to condition on $Z$) ```{=html}  ``` ### Identification assumptions: - $A \indep Y_{a,m} \mid W$ - $M \indep Y_{a,m} \mid W, A, Z$ - $A \indep M_a \mid W$ - and positivity assumptions. Under these assumptions, the randomized interventional direct and indirect effect is identified: \begin{align*} \E &(Y_{a, G_{a'}}) = \\ & \E\left[\color{Purple}{\E\left\{\color{red}{\sum_z} \color{ForestGreen}{\E(Y \mid A=a, Z=z, M, W)} \color{red}{\P(Z=z \mid A=a, W)}\mid A=a', W\right\}} \right] \end{align*} - Let's dissect this formula in `R`: ```{webr-r} n <- 1e6 w <- rnorm(n) a <- rbinom(n, 1, 0.5) z <- rbinom(n, 1, 0.5 + 0.2 * a) m <- rnorm(n, w + a - z) y <- rnorm(n, w + a + z + m) ``` - Let us compute $\E(Y_{1, G_0})$ (so that $a = 1$, and $a'=0$). - First, fit a regression model for the outcome, and compute $\color{ForestGreen}{\E(Y \mid A=a, Z=z, M, W)}$ for all values of $z$ ```{webr-r} lm_y <- lm(y ~ m + a + z + w) pred_a1z0 <- predict(lm_y, newdata = data.frame(m = m, a = 1, z = 0, w = w)) pred_a1z1 <- predict(lm_y, newdata = data.frame(m = m, a = 1, z = 1, w = w)) ``` - Now we fit the true model for $Z \mid A, W$ and get the conditional probability that $Z=1$ fixing $A=1$ ```{webr-r} prob_z <- lm(z ~ a) pred_z <- predict(prob_z, newdata = data.frame(a = 1)) ``` - Now we compute the following pseudo-outcome: $\color{red}{\sum_z}\color{ForestGreen}{\E(Y \mid A=a, Z=z, M, W)} \color{red}{\P(Z=z \mid A=a, w)}$ ```{webr-r} pseudo_out <- pred_a1z0 * (1 - pred_z) + pred_a1z1 * pred_z ``` - Now we regress this pseudo-outcome on $A,W$, and compute the predictions setting $A=0$, that is, $$ \color{Purple}{\E\left\{\color{red}{\sum_z} \color{ForestGreen}{\E(Y \mid A=a, Z=z, M, W)} \color{red}{\P(Z=z \mid A=a, w)}\mid A=a', W\right\}} $$ ```{webr-r} fit_pseudo <- lm(pseudo_out ~ a + w) pred_pseudo <- predict(fit_pseudo, data.frame(a = 0, w = w)) ``` - And finally, just average those predictions! ```{webr-r} # Mean(Y(1, G(0))) mean(pred_pseudo) ``` - This was for $(a,a')=(1,0)$. Can do the same with $(a,a')=(1,1)$, and $(a,a')=(0,0)$ to obtain an effect decomposition \begin{equation*} \E[Y_{1,G_1} - Y_{0,G_0}] = \underbrace{\E[Y_{\color{red}{1}, \color{blue}{G_1}} - Y_{\color{red}{1}, \color{blue}{G_0}}]}_{\text{interventional indirect effect}} + \underbrace{\E[Y_{\color{blue}{1},\color{red}{G_0}} - Y_{\color{blue}{0}, \color{red}{G_0}}]}_{\text{interventional direct effect}} \end{equation*} ### Is this the estimand I want? - Do not want to intervene directly on $M$ in the causal model. - Goal is to understand a descriptive type of mediation. - Okay with the assumptions! ### But, there is an important limitation of randomized interventional effects @miles2022causal recently uncovered an important limitation of these effects, which can be described as follows. The sharp mediational null hypothesis can be defined as $$ H_0:Y(a, M(a')) = Y(a, M(a^\star));\text{ for all }a, a', a^\star \ . $$ The problem is that randomized interventional effects are not guaranteed to be null when the sharp mediational hypothesis is true. This could present a problem in practice if some subgroup of the population has a relationship between $A$ and $M$, but not between $M$ and $Y$. Then, another distinct subgroup of the population has a relationship between $M$ and $Y$ but not between $A$ and $M$. In such a scenario, the randomized interventional indirect effect would be nonzero, but there would be no one person in the population whose effect of $A$ on $Y$ would be mediated by $M$. ```{=html}  ``` More details can be found in the original paper. ## Path-specific effects using *recanting twins* The DAG we are working under contains four paths of interest: \begin{align*} P_1&: A\rightarrow Y;\\ P_2&: A \rightarrow Z \rightarrow Y;\\ P_3&: A \rightarrow Z \rightarrow M \rightarrow Y\\ P_4&: A \rightarrow M \rightarrow Y, \end{align*} Natural path-specific effects could be defined using the following nested counterfactuals: \begin{align*} Y_{S_0}&=Y(1, Z(1), M(1, Z(1))),\\ Y_{S_1}&=Y(0, Z(1), M(1, Z(1))),\\ Y_{S_2}&=Y(0, Z(0), M(1, Z(1))),\\ Y_{S_3}&=Y(0, Z(0), M(1, Z(0))),\\ Y_{S_4}&=Y(0, Z(0), M(0, Z(0))), \end{align*} As before, the problem is that the distribution of $Y_{S_2}$ is not identifiable due to the so-called recanting witness $Z$. This means that one can contrast $Y_{S_1}$ with $Y_{S_3}$ but not with $Y_{S_2}$. Thus, the effects operating through $P_2$ and $P_3$ cannot be disentangled. $Z$ is called a recanting witness because "it tells one story" $Z(1)$ for the purposes of one path, and another story $Z(0)$ for the purposes of the other. But what if we replaced recanting witnesses by randomized versions of them that carry some of the same information? Specifically, if $T(a)$ is a random draw from the distribution of $Z(a)$ conditional on $W$, we can: - Measure $P_2$ by contrasting $$ Y(0, Z(1), M(1, {\color{ForestGreen}{T(1)}})) \text{ vs. } Y(0, Z(0), M(1, \color{ForestGreen}{T(1)})) $$ - Measure $P_3$ by contrasting $$ Y(0, {\color{ForestGreen}{T(0)}}, M(1, Z(1))) \text{ vs. } Y(0, {\color{ForestGreen}{T(0)}}, M(1, Z(0))) $$ - Measure the other paths through their identified natural effects This is what we proposed in @vo2024recanting. The variables $T(a)$ are the *recanting twins* of $Z(1-a)$. Identification requires some of the standard assumptions, namely: - $U_A\indep (U_Y, U_M, U_Z)\mid W$, - $U_Z\indep (U_Y, U_M)\mid (A,W)$, and - $U_M\indep U_Y\mid (Z,A,W)$. Under these assumptions, we can identify a decomposition of the ATE as follows: $$ \text{ATE} = \psi_{P_1} + \psi_{P_2} + \psi_{P_3} + \psi_{P_4} + \psi_{\text{IC}}, $$ where - $\psi_{P_i}$ measures the effect through path $P_i$ - $\psi_{\text{IC}}$ is a remainder that is equal to zero if there is no intermediate confounding, in which case $\psi_{P_i}$ is equal to the (identified) natural path-specific effect. More details on the definition, interpretation, and identification of these effects in the paper!