This post is part of our Q&A series.
A question from graduate students in our Fall 2019 offering of “Biostatistical Methods: Survival Analysis and Causality” at UC Berkeley:
Question:
Hi Mark,
We were wondering about the application of TMLE and superlearner to cluster-randomized study designs, and the adoption of the sample average treatment effect (SATE) as an efficient estimator. From our understanding, although the SATE is not formally identifiable in a finite setting, it is nevertheless an efficient estimate due to its asymptotic behavior (TMLE for the population effect is asymptotically linear and has an asymptotically conservative variance estimator). What properties of the SATE make it preferable to the population average treatment effect, particularly in effect modification settings? What allows for valid causal inferences to be drawn from data adaptive parameters like the SATE?
Best,
A.A. and D.C.
Answer:
Hi A.A. and D.C.,
Let’s say we observe $n$ observations $O_i=(W_i,A_i,Y_i)$, $i=1,\ldots,n$, representing cluster-specific data structures, and we assume the randomization assumption, $P(A=1\mid W,Y_0,Y_1)=P(A=1\mid W)$. The sample average treatment effect is defined as $\text{SATE}=\frac{1}{n}\sum_i(Y_i(1)-Y_i(0))$, which is different from the sample average conditional treatment effect $\text{SACTE}=\frac{1}{n}\sum_iE(Y_1-Y_0\mid W_i)$, and the latter is again different from the $\text{ATE}=E(Y_1-Y_0)$.
The ATE is an average across a distribution of $W$, which in a cluster RCT would mean it is a population average across clusters from some population of clusters. In many cluster RCTs, the clusters are not sampled that way at all, but represent a selected convenience sample. In that case, it might make more sense to define a parameter of the conditional distribution of $(Y_i,A_i)$, given $W_i$, across $i=1,\ldots,n$, i.e., treating the clusters as fixed and $(A,Y)$ within each cluster as random. This makes the SACTE an interesting alternative target parameter. It can be viewed as a parameter of the conditional distribution given $W_1,\ldots,W_n$, or, if one is still willing to view the $W_i$ as a random sample from some population, as a data adaptive parameter depending on the empirical distribution of $W_1,\ldots,W_n$.
If one is not even willing to think of $Y_1-Y_0$ as a random draw from a conditional distribution $P(Y_1-Y_0\mid W)$, but only wants to make inference about the actual values $Y_i(1)-Y_i(0)$, $i=1,\ldots,n$, then one could view the SATE as the target. So the choice of quantity (among ATE, SACTE, SATE) is driven by the degree to which we wish to generalize our findings to a bigger population. In various applications I might argue that all three are of interest.
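To make the distinction between the three targets concrete, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the data-generating process, the function names `simulate_clusters` and `cate`, and the sample size. Because the data are simulated, both potential outcomes are available for every cluster, which is never the case in practice.

```python
import random
from statistics import mean

def simulate_clusters(n, seed=0):
    """Draw n hypothetical clusters with covariate W and both potential outcomes."""
    rng = random.Random(seed)
    clusters = []
    for _ in range(n):
        w = rng.choice([0, 1])              # binary cluster-level covariate
        y0 = w + rng.gauss(0, 1)            # potential outcome under control
        y1 = y0 + 1 + w + rng.gauss(0, 2)   # effect varies with W and by cluster
        clusters.append((w, y0, y1))
    return clusters

def cate(w):
    """True E(Y1 - Y0 | W = w) under the assumed data-generating process."""
    return 1 + w

# Population target: E(Y1 - Y0) = 1 + P(W = 1) = 1.5 under the assumed process.
ATE = 1.5

clusters = simulate_clusters(n=30, seed=1)
# SATE averages the realized cluster-level effects in this particular sample.
sate = mean(y1 - y0 for _, y0, y1 in clusters)
# SACTE averages the conditional effects at the observed W_1, ..., W_n.
sacte = mean(cate(w) for w, _, _ in clusters)

print(f"ATE={ATE:.3f}  SACTE={sacte:.3f}  SATE={sate:.3f}")
```

The SACTE moves with the sampled $W_i$ only, while the SATE also moves with the cluster-specific deviations $Y_i(1)-Y_i(0)-E(Y_1-Y_0\mid W_i)$.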
Let TMLE represent the regular TMLE of the ATE. Recall that $\text{TMLE}-\text{ATE}\sim P_n(D_W+D_Y)=\frac{1}{n}\sum_i(D_W+D_Y)(O_i)$, where $D_W$ and $D_Y$ are the two score components making up the influence curve $D_W+D_Y$ of the TMLE.
Note that $\text{SATE}-\text{ATE}$ (just a sample mean of $Y_1-Y_0$ minus the true mean) is asymptotically linear with influence curve $Y_1-Y_0-E(Y_1-Y_0)=[Y_1-Y_0-E(Y_1-Y_0\mid W)]+[E(Y_1-Y_0\mid W)-E(Y_1-Y_0)]$. The second bracket is the $D_W$ component.
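So, $\text{TMLE}-\text{SATE}=(\text{TMLE}-\text{ATE})+(\text{ATE}-\text{SATE})\sim P_nD_Y-\frac{1}{n}\sum_i\{Y_i(1)-Y_i(0)-E(Y_1-Y_0\mid W_i)\}$. Similarly, $\text{TMLE}-\text{SACTE}\sim P_nD_Y$.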
We conclude: $\text{TMLE}-\text{SACTE}$ is asymptotically linear with an improved influence curve $D_Y$, having subtracted out the $D_W$ component. $\text{TMLE}-\text{SATE}$ is asymptotically linear with a further improved influence curve $D_Y-D_U$, where $D_U=Y_1-Y_0-E(Y_1-Y_0\mid W)$. The latter $D_U$ is not really an influence curve since $Y_0$ and $Y_1$ are never both observed. Nonetheless, it tells us that $\text{TMLE}-\text{SATE}$ is asymptotically linear with influence curve $D_Y-D_U$, showing that the TMLE is more efficient as an estimator of the SATE than as an estimator of the SACTE. For the sake of inference, we simply use $D_Y$ as a conservative influence curve.
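The conservativeness of $D_Y$ for the SATE can be checked with a small Monte Carlo sketch. This is an illustration under stated assumptions, not a TMLE implementation: $W$ is binary, the treatment probability is known to be 0.5, and a saturated plug-in estimator (cell means of $Y$ given $(A,W)$) stands in for the TMLE. The data-generating process and the name `one_trial` are made up for this example.

```python
import random
from statistics import mean, pvariance

def one_trial(n, rng):
    """Simulate one trial; return (estimate - SATE, D_Y-based variance estimate)."""
    data = []
    for _ in range(n):
        w = rng.choice([0, 1])
        a = rng.choice([0, 1])               # randomized, g(1|W) = 0.5
        y0 = w + rng.gauss(0, 1)
        y1 = y0 + 1 + w + rng.gauss(0, 2)    # heterogeneous cluster-level effect
        data.append((w, a, y1 if a == 1 else y0, y1 - y0))
    # Saturated outcome regression: cell means of Y given (A, W).
    # (With n = 200 an empty cell is practically impossible.)
    qbar = {
        (a, w): mean(y for (wi, ai, y, _) in data if wi == w and ai == a)
        for a in (0, 1) for w in (0, 1)
    }
    # Plug-in estimate: average of Qbar(1, W_i) - Qbar(0, W_i).
    psi = mean(qbar[(1, w)] - qbar[(0, w)] for (w, _, _, _) in data)
    # The SATE is only computable here because the data are simulated.
    sate = mean(effect for (_, _, _, effect) in data)
    # D_Y(O) = (2A - 1) / g(A|W) * (Y - Qbar(A, W)), with g = 0.5 by design.
    d_y = [(2 * a - 1) / 0.5 * (y - qbar[(a, w)]) for (w, a, y, _) in data]
    return psi - sate, pvariance(d_y) / n

rng = random.Random(2019)
n, reps = 200, 500
results = [one_trial(n, rng) for _ in range(reps)]
mc_var = pvariance([err for err, _ in results])   # actual variance of estimate - SATE
avg_est_var = mean(v for _, v in results)         # average D_Y-based variance estimate
print(f"Monte Carlo variance: {mc_var:.4f}")
print(f"Mean D_Y-based variance estimate: {avg_est_var:.4f}")
```

If the argument above holds, the Monte Carlo variance of the estimate minus the SATE should come out no larger than the average $D_Y$-based variance estimate, with the gap reflecting the unidentifiable $D_U$ component.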
I believe the general idea of this is the following. Consider a target $E[X]$ for a full-data random variable $X$, and suppose that the observed data includes $W$. Suppose that we have a TMLE of $E[X]$. One could define $\frac{1}{n}\sum_iX_i$ and $\frac{1}{n}\sum_iE(X\mid W_i)$, and analyze $\text{TMLE}-\frac{1}{n}\sum_iX_i$ in exactly the same way as above.
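Written out, the general argument rests on an exact algebraic decomposition. The symbol $\hat\psi$ for the TMLE of $E[X]$ is notation introduced here for illustration only.

```latex
\hat\psi - \frac{1}{n}\sum_{i=1}^n X_i
  = \bigl(\hat\psi - E[X]\bigr)
  - \frac{1}{n}\sum_{i=1}^n \bigl\{X_i - E(X \mid W_i)\bigr\}
  - \frac{1}{n}\sum_{i=1}^n \bigl\{E(X \mid W_i) - E[X]\bigr\}
```

When the $D_W$ component of the TMLE's influence curve equals $E(X\mid W)-E[X]$, as it does in the ATE example with $X=Y_1-Y_0$, the last sum cancels that component, and the middle sum plays the role of the non-identifiable $D_U$ term.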
For example, suppose that we have a general longitudinal data structure, $(W_i=L_i(0),A_i(0),\ldots,L_i(K),A_i(K),Y_i)$, with the baseline covariates $L_i(0)$ playing the role of $W_i$, and we define $EY_d$ as a mean outcome under a multiple time point dynamic treatment. We have a TMLE of $EY_d$, such as the one implemented in `ltmle()`. We might desire inference for $\frac{1}{n}\sum_iY_{d,i}$, or $\frac{1}{n}\sum_iE(Y_d\mid W_i)$. We have $\Psi_{TMLE}-\frac{1}{n}\sum_iY_{d,i}=(\Psi_{TMLE}-EY_d)-\frac{1}{n}\sum_i[Y_{d,i}-E(Y_d\mid W_i)]-[\frac{1}{n}\sum_iE(Y_d\mid W_i)-EY_d]$. The last term represents the $D_W$ component of the influence curve of $\Psi_{TMLE}-EY_d$. The middle term is a non-identifiable influence curve that subtracts out another component. So, we obtain conservative inference for $\frac{1}{n}\sum_iY_{d,i}$ by using the influence curve of $\Psi_{TMLE}-EY_d$ without its $D_W$ component.
Regarding effect modification, if we have a discrete variable $V$, then a stratified TMLE applied to the data with $V_i=v$ would obtain inference for $\frac{1}{n}\sum_i(Y_i(1)-Y_i(0))$ within the stratum $V_i=v$, for each $v$. To obtain inference for this $v$-specific SATE, one can use the conservative influence curve.
If one now wants to obtain inference for a difference of two $v$-specific SATEs, then the TMLE of this difference will still be asymptotically linear with the difference of the two $v$-specific non-identifiable influence curves. It is now less clear whether ignoring the difference of the two non-identifiable components of their respective influence curves would still result in conservative inference. It would be worthwhile to research this. Since we have valid conservative inference for the $v$-specific SATE for each $v$, we could also decide to build a test based on comparing the two marginal confidence intervals (overlap), but this would by necessity be more conservative. If inference for a contrast of $v$-specific SATEs happens to be problematic, then that might be an argument to focus instead on the effect modification parameter given by the contrast of $v$-specific SACTEs, $\frac{1}{n}\sum_iE(Y_1-Y_0\mid W_i,V_i=1)-\frac{1}{n}\sum_iE(Y_1-Y_0\mid W_i,V_i=0)$, since for this we have an identified influence curve.
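The interval-overlap comparison for two strata can be sketched as follows. This is a minimal illustration, assuming the stratum-specific estimates and the estimated $D_Y$ values have already been obtained from a stratified TMLE fit; the helper names `wald_ci` and `intervals_overlap` and all numbers are made up for the example.

```python
from math import sqrt
from statistics import NormalDist, pvariance

def wald_ci(estimate, ic_values, level=0.95):
    """Wald interval using the sample variance of influence-curve values."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    se = sqrt(pvariance(ic_values) / len(ic_values))
    return estimate - z * se, estimate + z * se

def intervals_overlap(ci_a, ci_b):
    """True if two closed intervals share at least one point."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]

# Made-up numbers purely for illustration: stratum-specific estimates and
# estimated D_Y values for the clusters in each stratum.
d_y_v1 = [1.2, -0.8, 0.3, -1.1, 0.9, -0.5]
d_y_v0 = [0.4, -0.6, 0.7, -0.2, 0.1, -0.4]
ci_v1 = wald_ci(2.0, d_y_v1)
ci_v0 = wald_ci(0.5, d_y_v0)

print("CI for V=1 stratum:", ci_v1)
print("CI for V=0 stratum:", ci_v0)
print("Intervals overlap:", intervals_overlap(ci_v1, ci_v0))
```

Declaring a difference only when the two intervals fail to overlap is a stricter criterion than a direct test of the contrast, which matches the remark that this approach is by necessity more conservative.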
If $V$ is continuous, one might use a working MSM $m_\beta(v)$ for $\frac{1}{n}\sum_iE(Y_1-Y_0\mid W_i,V_i=v)$ as a function of $v$. One can then use the TMLE of the $\beta$ in this working MSM (as implemented in `ltmle`, e.g.). This would again correspond with using an influence curve that removes a $D_W$ component from the regular influence curve of the TMLE of $\beta$.
So my basic answer to your question is that inference for the SATE based on the TMLE of the ATE can be generalized to general longitudinal data structures, and one should also be able to generalize it to treatment effect modification by a discrete or continuous effect modifier $V$.
Best Wishes,
Mark
P.S., remember to write in to our blog at vanderlaan (DOT) blog [AT] berkeley (DOT) edu. Interesting questions will be answered on our blog!