The Research Group of Mark van der Laan

Computational Biology and Causality

Prediction intervals using the TMLE framework

This post is part of our Q&A series. A question from graduate students in our Spring 2019 offering of “Targeted Learning in Biomedical Big Data” at Berkeley: Question: Hi Mark, We are curious about how to use TMLE and influence curves for estimation and inference when the target parameter is a conditional expectation, rather than a scalar. Specifically, suppose I have a data structure $O = (W, Y) \sim P_0$, and sample $n$ times i.

Applications of TMLE in infectious disease research

This post is part of our Q&A series. A question from graduate students in our Spring 2019 offering of “Targeted Learning in Biomedical Big Data” at Berkeley: Question: Hi Mark, Thanks for teaching this class. It’s been an amazing experience. I have a few questions related to my own research. In infectious disease studies, researchers attempt to build models that estimate the protection conferred by vaccination or a previous history of infection (natural immunity).

Adaptive algorithm selection via the Super Learner

This post is part of our Q&A series. A question from graduate students in our Spring 2019 offering of “Targeted Learning in Biomedical Big Data” at Berkeley: Question: Hi Mark, A couple questions I have are about super learning and the strength of the learners as well as potentially adaptively choosing learners. Is there any advantage, theoretical or practical, of having a large library of weaker learners over a small library of stronger learners?

TMLE versus one-step estimator

This post is part of our Q&A series. A question from graduate students in our Spring 2019 offering of “Targeted Learning in Biomedical Big Data” at Berkeley: Question: Hi Mark, Are there any theoretical guarantees about the relative performance of TMLE and the one-step estimator in finite samples? Thanks. H. R.d.B. Answer: Hi H. R.d.B., Finite-sample guarantees are very hard to obtain. One can obtain finite-sample confidence intervals by, for example, not relying on a CLT but on finite-sample inequalities for sample means (e.

Imputation and missing data in the TMLE framework

This post is part of our Q&A series. A question from graduate students in our Spring 2019 offering of “Targeted Learning in Biomedical Big Data” at Berkeley: Question: Hi Mark, For a longitudinal data set with missing data, we might want to impute the values with MICE (multiple imputation by chained equations). Can we use TMLE together with multiple imputation? How can we combine the results from all the multiply imputed datasets into a final result and obtain valid inference?

Adaptive designs and optimal subgroups

This post is part of our Q&A series. A question from graduate students in our Fall 2018 offering of “Special Topics in Biostatistics – Adaptive Designs” at Berkeley: Question: Hi Mark, We were interested in your opinion on a few topics that have come up in class a few times. If we isolate an optimal subgroup, we can, perhaps, answer interesting questions about, say, drug efficacy (as in, does this drug work for anybody as opposed to on average?

Adaptive sequential designs and optimal treatments

This post is part of our Q&A series. A question from graduate students in our Fall 2018 offering of “Special Topics in Biostatistics – Adaptive Designs” at Berkeley: Question: Hi Mark, Our question concerns the benefit of using a sequential adaptive design when estimating the outcome under the optimal dynamic treatment rule (for a binary treatment). We propose doing so in a 2-stage framework, where in the first stage subjects are naively randomized to treatment, $Pr(A=1) = 0.

Causal effects for single-group policies

This post is part of our Q&A series. A question from a graduate student in our Spring 2018 offering of “Targeted Learning in Biomedical Big Data” at Berkeley: Question: Hi Mark, I was thinking that if you addressed the question that [we] discussed in your office hours last week, a lot of economists would be interested in reading it. Feel free to edit the wording of the question however suits you best, but I was thinking: How can you formulate a causal parameter in a setting in which you have a policy that affects one group but not another based on observable characteristics and control for time trends in your model (i.

Finite Sample Properties of TML Estimators

This post is part of our Q&A series. A question from a graduate student in our Fall 2017 offering of “Survival Analysis and Causality” at Berkeley: Question: Hi Mark, This may be an ill-defined question, but I was wondering, in the usual $O = (W, A, Y)$ set-up, while TMLE has superior asymptotic properties over competing estimators like, say, the G-computation plug-in estimator or the IPTW estimator, are there specific instances where it is also guaranteed to have superior finite-sample properties?

Competing Risks and Non-pathwise Differentiable Parameters

This post is part of our Q&A series. A question from two graduate students in our Fall 2017 offering of “Survival Analysis and Causality” at Berkeley: Question: Hi Mark, Below are [two] questions [we thought might interest you]. Looking forward to your thoughts on these! Best, S.D. and I.M. Most competing risk analyses assume that the competing risks are independent of one another. What would be your advice on handling the same style of survival data when the occurrence of one of the competing events is informative of the occurrence of the other?