This post is part of our Q&A series.
A question from graduate students in our Spring 2021 offering of the new course “Targeted Learning in Practice” at UC Berkeley:
Question:
Hi Mark,
I have a survival analysis question. I am working with a dataset that is left- and right-truncated. I am interested in estimating the treatment-specific multivariate survival function of a time-to-event variable. For example, a study where subjects have been randomized to two different treatment groups with baseline covariates $W$, but we only observe the outcome – time at death – for a left- and right-truncated window. Is it possible to use Targeted Maximum Likelihood Estimation (TMLE) for estimating the treatment-specific multivariate survival curve? I have seen a few papers using TMLE for right-censored data, but I assume there are important considerations when working with doubly-truncated data.
Best,
C.B.
Answer:
Hi C.B.,
Thank you for the excellent question. You are asking about estimation of
a treatment-specific survival curve when we have a time window and a subject is
only part of the sample if a particular event such as death does not occur
before the start of the window, so the sample is conditional on $T > C_l$, or
we only observe units when $T < C_l$, for some truncation random variable $C_l$
and time until event $T$.
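As a sketch of what left truncation means in symbols (the indicator $I(\cdot)$ and the differentials are notation introduced here for illustration, and covariates, treatment, and right-censoring are left out), each sampled unit is a draw of $(T, C_l)$ from the conditional law

$$
P(T \in dt,\; C_l \in dc \mid T > C_l) = \frac{I(t > c)\, P(T \in dt,\; C_l \in dc)}{P(T > C_l)},
$$

so pairs with $T \le C_l$ are never seen, and the observed event times typically over-represent longer survival relative to the full population.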
In another post, I talk more about this problem of left-truncation and
censoring, and I will refer you to that for your question as well. Either way,
yes, TMLE can be applied for any estimation problem, so it is just a matter of
establishing the identification of the full-data distribution $P_X$ from
observing $O = \Phi(C, X)$ drawn from its conditional distribution given
$T > C_l$, say. This handles both the biased sampling due to sampling
conditional on $T > C_l$ and the more regular right-censoring, etc., that make
up the censored data structure $\Phi(C, X)$. For example, one might be able to
assume that $C_l$ is independent of $X$, conditional on measured variables, and
show that the conditional distribution of $X$, given $T > C_l$, implies the
distribution of the full-data random variable $X$, or a large part of it. That
suggests a two-stage identification: first identify $P_X$ from
$P_{X \mid T > C_l}$, and then identify $P_{X \mid T > C_l}$ from the
distribution $P$ of $O = \Phi(C, X)$, given $T > C_l$. Once we have done that,
we can map the target quantity $\Psi^F(P_X)$ into an estimand $\Psi(P)$, specify
the statistical model, and then we are ready to apply TMLE.
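To illustrate only the first identification step, here is a minimal simulation sketch. It is not a TMLE, and it has no treatment, covariates, or right-censoring. It assumes a hypothetical population in which $T$ is exponential and $C_l$ is uniform on $(0, 1)$ and independent of $T$; these distributions and all function names are choices made for this example. It compares the naive empirical survival in the truncated sample with a product-limit estimator whose risk set at time $s$ contains only units with $C_l < s \le T$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical full-data population (an assumption for this sketch):
# T ~ Exponential(1), so the true survival is S(t) = exp(-t),
# and C_l ~ Uniform(0, 1), drawn independently of T.
n = 100_000
T = rng.exponential(scale=1.0, size=n)
C_l = rng.uniform(0.0, 1.0, size=n)

# Left truncation: a unit is part of the sample only if T > C_l.
keep = T > C_l
T_obs, C_obs = T[keep], C_l[keep]


def naive_survival(t, times):
    """Empirical survival in the truncated sample, ignoring truncation."""
    return np.mean(times > t)


def truncation_adjusted_survival(t, times, entry):
    """Product-limit estimator with delayed-entry risk sets.

    At each observed event time s <= t, the risk set is
    {i : entry_i < s <= times_i}. Assumes no tied event times.
    """
    times_sorted = np.sort(times)
    entry_sorted = np.sort(entry)
    event_times = times_sorted[times_sorted <= t]
    # Number of units with times_i >= s.
    n_event_at_or_after = len(times) - np.searchsorted(
        times_sorted, event_times, side="left"
    )
    # Number of units with entry_i >= s (not yet under observation at s).
    # Because entry_i < times_i for every sampled unit, these units are a
    # subset of those counted above.
    n_not_entered = len(entry) - np.searchsorted(
        entry_sorted, event_times, side="left"
    )
    at_risk = n_event_at_or_after - n_not_entered
    return np.prod(1.0 - 1.0 / at_risk)


t0 = 1.0
s_true = np.exp(-t0)
s_naive = naive_survival(t0, T_obs)
s_adj = truncation_adjusted_survival(t0, T_obs, C_obs)
print(f"true S({t0}) = {s_true:.3f}")
print(f"naive estimate = {s_naive:.3f}")
print(f"risk-set-adjusted estimate = {s_adj:.3f}")
```

In this design the truth is $S(1) = e^{-1} \approx 0.37$, whereas the naive estimate targets $P(T > 1 \mid T > C_l) = e^{-1}/(1 - e^{-1}) \approx 0.58$. The risk-set-adjusted estimate should land close to the former, which reflects that, under the independence assumption, the truncated sample still identifies the full-data survival function here.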
Best Wishes,
Mark
P.S., remember to write in to our blog at vanderlaan (DOT) blog [AT]
berkeley (DOT) edu. Interesting questions will be answered on our blog!