# Causal mediation analysis for exposure mixtures

This post is part of our Q&A series.

A question from graduate students in our Spring 2021 offering of the new course “Targeted Learning in Practice” at UC Berkeley:

## Question:

Hi Mark,

After the lectures on tmle3shift and tmle3mediate, we’re wondering if a different procedure for mediation analysis could work. Consider a data-generating system for $O = (W, A, Z, Y)$, where $W$ represents three binary covariates, $A$ is a binary exposure of interest, $Z$ are three binary mediators, and $Y$ is a continuous outcome. Fitting a Super Learner for $\mathbb{E}(Y \mid A, W)$ (ignoring $Z$), and then getting predictions through this fit for the counterfactuals $\mathbb{E}(Y \mid A = 1, W)$ and $\mathbb{E}(Y \mid A = 0, W)$, we can construct $\mathbb{E}(Y \mid A = 1, W) - \mathbb{E}(Y \mid A = 0, W)$, which represents the total effect. Let’s call this model $\mathcal{M}_{\text{total}}$. Now, consider another model fit for $\mathbb{E}(Y \mid A, Z, W)$ and conduct the same procedure. This model controls for the mediators, and, by deterministically setting $A = 1$ and $A = 0$, we break the connection with the mediators and therefore can calculate the direct effect; call this model $\mathcal{M}_{\text{NDE}}$. The natural indirect effect can then be calculated as $\mathcal{M}_{\text{total}} - \mathcal{M}_{\text{NDE}}$. This works in simulation experiments (using data simulated for tmle3mediate). Of course, it’s kind of bullshit because we already have tmle3mediate, which can calculate the NDE/NIE for a binary exposure; this two-model approach is just borrowed from the GLM-based methods developed ages ago, but with Super Learning in place of the GLMs. However, there are some interesting scenarios where this might be useful, for example, when there are multiple exposures.

So, consider we have $A = (A_1, A_2, A_3)$, all binary. The same procedure could be followed by setting all the $A$s to 1 and to 0 and calculating the NDE and NIE in this way. Could this serve as a non-parametric approach to mediation analysis for mixtures? Of course, a bootstrap procedure would be needed to derive confidence intervals.

Similarly, now suppose $A$ is continuous; the same procedure could be conducted, but now shifting $A$ by some $\delta$. For example, the NDE and NIE could be calculated for all chemicals of a joint exposure if the chemical exposures were all shifted by 10%. This seems more intuitive, especially for applied researchers, than shifting the propensity, as with tmle3shift.

Taken one step further, given the unique properties of the highly adaptive lasso (HAL), is it possible to do a similar procedure but with the HAL basis functions? That is, suppose a HAL fit of $\mathbb{E}(Y \mid A, Z, W)$ is obtained from the data. Is it possible to then make predictions through this model to calculate the NDE and NIE by shifting certain basis functions? If so, given that HAL can be used for statistical inference without TMLE, could one get around the non-parametric bootstrap?
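For concreteness, here is a rough R sketch of the two-model procedure we have in mind, using the SuperLearner package on made-up simulated data (the data-generating coefficients and the learner library are just placeholders):

```r
library(SuperLearner)

set.seed(42)
n <- 1000

# Simulated data matching the setup: three binary W, binary A,
# three binary mediators Z, continuous Y (coefficients are arbitrary)
W <- data.frame(W1 = rbinom(n, 1, 0.5),
                W2 = rbinom(n, 1, 0.4),
                W3 = rbinom(n, 1, 0.6))
A <- rbinom(n, 1, plogis(-0.2 + 0.3 * W$W1))
Z <- data.frame(Z1 = rbinom(n, 1, plogis(-0.5 + A + 0.2 * W$W2)),
                Z2 = rbinom(n, 1, plogis(0.2 + 0.5 * A)),
                Z3 = rbinom(n, 1, plogis(-0.3 + 0.8 * A + 0.1 * W$W3)))
Y <- 1 + A + Z$Z1 + 0.5 * Z$Z2 + 0.5 * Z$Z3 + 0.2 * W$W1 + rnorm(n)

sl_lib <- c("SL.glm", "SL.mean")  # placeholder learner library

# Model 1: E(Y | A, W), ignoring Z -> total effect
m_total <- SuperLearner(Y = Y, X = data.frame(A = A, W),
                        family = gaussian(), SL.library = sl_lib)
te <- mean(predict(m_total, newdata = data.frame(A = 1, W))$pred -
           predict(m_total, newdata = data.frame(A = 0, W))$pred)

# Model 2: E(Y | A, Z, W), holding Z at observed values -> "direct" effect
m_de <- SuperLearner(Y = Y, X = data.frame(A = A, Z, W),
                     family = gaussian(), SL.library = sl_lib)
de <- mean(predict(m_de, newdata = data.frame(A = 1, Z, W))$pred -
           predict(m_de, newdata = data.frame(A = 0, Z, W))$pred)

ie <- te - de  # "indirect" effect by subtraction
c(total = te, direct = de, indirect = ie)
```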

Best,

D.M. and J.B.

Hi D.M. and J.B.,

Thank you for the interesting questions.

You are talking about using a super learner fit of $\mathbb{E}(Y \mid A, W)$ and of $\mathbb{E}(Y \mid A, Z, W)$ to obtain plug-in estimators of the total effect and its direct and indirect effect components. Note that this may yield controlled direct effects (CDE) rather than natural direct effects (NDE); the NDE is essentially an average of controlled direct effects, i.e., $\text{NDE} = \mathbb{E}\left(\sum_z (Y_{1,z} - Y_{0,z})\, g(z \mid A = 0, W)\right)$.
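For reference, in nested-counterfactual notation, with $Y_{a,z}$ the counterfactual outcome under $(A = a, Z = z)$ and $Z_a$ the counterfactual mediator under $A = a$, the standard decomposition of the total effect is

$$
\mathbb{E}(Y_1 - Y_0) \;=\; \underbrace{\mathbb{E}\big(Y_{1, Z_0} - Y_{0, Z_0}\big)}_{\text{NDE}} \;+\; \underbrace{\mathbb{E}\big(Y_{1, Z_1} - Y_{1, Z_0}\big)}_{\text{NIE}},
$$

so subtracting the direct from the total effect recovers the NIE only when the second model truly targets the NDE; subtracting a CDE does not, in general, yield an indirect effect.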

You consider the case that $A$ is binary, a vector of binaries, or even a vector of continuous exposures. It is important to stick to the roadmap. Super learner-based plug-in estimators are generally not supported by theory, so inference is a problem; moreover, they are generally overly biased, making it unlikely that a bootstrap can yield accurate inference (estimating the variance is one thing, but estimating the bias is a whole other beast). Following the roadmap is all about defining the target estimands of interest and the statistical model; then, you can use or develop a TMLE, or possibly an undersmoothed HAL plug-in estimator, which avoids the need for TMLE.

This also ensures we obtain compatible estimators, so that the total effect and its direct and indirect effect components all derive from a single fit of the data density. This helps improve finite-sample behavior.

And, yes, we can plug a HAL fit into, e.g., the shift-intervention parameter $\mathbb{E}_P \int \mathbb{E}(Y \mid A = a, W)\, g_{\delta}(a \mid W)\, da$. When using HAL, both bootstrap-based and influence curve-based inference are possible; the latter is fast, while the former might have better coverage in finite samples.
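As a concrete illustration, here is a minimal sketch of such a HAL plug-in for a 10% multiplicative shift of the exposure, with a naive nonparametric bootstrap. The `fit_hal()`/`predict()` calls are the actual hal9001 interface; the data, the shift, and the absence of undersmoothing are all illustrative, so treat the resulting interval as heuristic:

```r
library(hal9001)

set.seed(27)
n <- 500
W <- matrix(rnorm(n * 2), ncol = 2, dimnames = list(NULL, c("W1", "W2")))
A <- rnorm(n, mean = 1 + 0.5 * W[, 1])  # continuous exposure
Y <- A + W[, 1] + rnorm(n)

# Fit E(Y | A, W) with HAL
hal_fit <- fit_hal(X = cbind(A = A, W), Y = Y, family = "gaussian")

# Plug-in estimate under a 10% upward shift of the exposure:
# average the fitted outcome regression at the shifted exposure values
q_bar <- function(a) mean(predict(hal_fit, new_data = cbind(A = a, W)))
psi_hat <- q_bar(1.1 * A) - q_bar(A)

# Naive nonparametric bootstrap for a confidence interval (slow, since
# HAL is refit on each resample; no undersmoothing is done here)
B <- 200
boot <- replicate(B, {
  i <- sample(n, replace = TRUE)
  fb <- fit_hal(X = cbind(A = A, W)[i, ], Y = Y[i], family = "gaussian")
  mean(predict(fb, new_data = cbind(A = 1.1 * A, W)[i, ])) -
    mean(predict(fb, new_data = cbind(A = A, W)[i, ]))
})
ci <- psi_hat + c(-1, 1) * 1.96 * sd(boot)
```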

Regarding causal effects and direct effects of multidimensional vectors of exposures, I think stochastic interventions are the way to go. For example, in work with Chris Kennedy, we proposed intervening on the conditional distribution of the vector $A$ of exposures given $W$ by additionally conditioning on a score $S(A)$ equaling a given value, which corresponds with a static intervention on $S(A)$. So, in general, I like to think of staying close to the conditional distribution of $A$, given $W$; for that purpose, replacing it by a shift or adding an extra conditioning are both sensible ways to intervene. In this way, we keep positivity intact. Your multivariate shift sounds very interpretable, so it would be good to formulate a TMLE. My guess is that it will be better to stick to univariate interventions, such as interventions on a data-adaptively learned summary measure of $A$, and then provide the corresponding stochastic intervention interpretation for $A$ as a whole. That way, we know that the essential statistical estimation problem is equivalent to that of learning the causal effect of a single continuous exposure. Making it two-dimensional is still okay, but it might get out of hand quickly and become very unstable if we carry out truly high-dimensional interventions. In the end, it is all about $g^{\star}/g$, but now with multivariate conditional distributions of $A$.
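To make the role of $g^{\star}/g$ explicit: a stochastic-intervention parameter has equivalent g-computation and IPW representations,

$$
\psi_{g^{\star}} \;=\; \mathbb{E}_P \int \mathbb{E}(Y \mid A = a, W)\, g^{\star}(a \mid W)\, da \;=\; \mathbb{E}_P\!\left[\frac{g^{\star}(A \mid W)}{g(A \mid W)}\, Y\right],
$$

so positivity amounts to keeping the ratio $g^{\star}/g$ bounded, which is exactly what shifts of, and extra conditioning on, the observed exposure mechanism are designed to achieve.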

I think of it more as: yes, we want to understand the overall dose-response curve of the vector $A$, but could we do that by targeting one specific feature of this curve at a time, using TMLE across many features? For example, intervening on a score $S(A)$, across many possible scores $S$, might teach us a lot about the overall dose-response curve for multivariate interventions on the whole $A$.
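As a toy version of this idea, one could probe the fitted dose-response surface along a few fixed directions, each playing the role of a simple linear score. This is only a naive stand-in for the score-based stochastic interventions described above; the data, directions, and shift size are all illustrative:

```r
library(SuperLearner)

set.seed(31)
n <- 500
W <- data.frame(W1 = rnorm(n))
A <- data.frame(A1 = rnorm(n, 1), A2 = rnorm(n, 1), A3 = rnorm(n, 1))
Y <- A$A1 + 0.5 * A$A2 + 0.2 * A$A3 * W$W1 + rnorm(n)

fit <- SuperLearner(Y = Y, X = cbind(A, W), family = gaussian(),
                    SL.library = c("SL.glm", "SL.mean"))
q_bar <- function(a) mean(predict(fit, newdata = cbind(a, W))$pred)

# Shift the exposure vector along a few fixed unit directions u: each u
# defines one univariate feature of the multivariate dose-response surface
delta <- 0.25
dirs <- list(A1 = c(1, 0, 0), A2 = c(0, 1, 0), A3 = c(0, 0, 1),
             joint = rep(1, 3) / sqrt(3))
sapply(dirs, function(u) {
  A_shift <- A + matrix(delta * u, n, 3, byrow = TRUE)  # A + delta * u
  q_bar(A_shift) - q_bar(A)
})
```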

Nowadays, we can do a one-step TMLE targeting all these smooth target features, making it all compatible with a single targeted fit of $\mathbb{E}(Y \mid A, W)$ or $\mathbb{E}(Y \mid A, Z, W)$.

Best Wishes,

Mark

P.S., remember to write in to our blog at vanderlaan (DOT) blog [AT] berkeley (DOT) edu. Interesting questions will be answered on our blog!