The Research Group of Mark van der Laan

Computational Biology and Causality

Two-stage sampling and survival analysis

This post is part of our Q&A series. A question from graduate students in our Fall 2019 offering of “Biostatistical Methods: Survival Analysis and Causality” at UC Berkeley: Question: Hi Mark, We are wondering how, under your framework, to deal with a situation in which only the right-censored observations have a full set of covariates, while the covariates for the uncensored observations are largely missing. To be specific, we want to find the relation between people’s matching properties and their marriage durations.

Estimating the sample average treatment effect under effect modification in a cluster randomized trial

This post is part of our Q&A series. A question from graduate students in our Fall 2019 offering of “Biostatistical Methods: Survival Analysis and Causality” at UC Berkeley: Question: Hi Mark, We were wondering about the application of TMLE and superlearner to cluster-randomized study designs, and the adoption of the sample average treatment effect (SATE) as an efficient estimator. From our understanding, although the SATE is not formally identifiable in a finite setting, it is nevertheless an efficient estimate due to its asymptotic behavior (TMLE for the population effect is asymptotically linear and has an asymptotically conservative variance estimator).

Longitudinal causal model under obscured time-ordering

This post is part of our Q&A series. A question from graduate students in our Fall 2019 offering of “Biostatistical Methods: Survival Analysis and Causality” at UC Berkeley: Question: Hi Mark, Suppose we have a longitudinal data structure in which information about the intervention and the time-varying covariate is collected simultaneously, so that their temporal ordering is obscured. For instance, data are collected at monthly health checkups, where $A(t)$ describes the subject’s healthy eating habits in the past month, and $L(t)$ is the occurrence of heartburn in the past month.

Positivity assumption violations and TMLE for longitudinal data with many time-varying covariates

This post is part of our Q&A series. A question from graduate students in our Fall 2019 offering of “Biostatistical Methods: Survival Analysis and Causality” at UC Berkeley: Question: Hi Mark, For longitudinal data such as $O=(L_0,A_0,Y_0,L_1,A_1,Y_1,L_2,A_2,Y_2,\ldots )$, we can use the G-computation formula with the sequential regression method if we treat time $t$ as a discrete variable. You also mentioned that there are more general methods that can handle the case in which $t$ is continuous.
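The sequential regression (iterated conditional expectation) form of the G-computation formula mentioned in the question can be sketched for a simplified two-time-point structure $O=(L_0,A_0,L_1,A_1,Y)$ under the static intervention $A_0=A_1=1$. This is a minimal illustration under assumptions of our own: the simulated data-generating process, the linear working models, and all helper names are hypothetical, and the linear models are correctly specified only because the simulation was chosen that way.

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def ols_fit(X, y):
    """Least-squares coefficients for the linear model y ~ 1 + X."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def ols_predict(coef, X):
    return np.column_stack([np.ones(X.shape[0]), X]) @ coef

def simulate(n, rng):
    # Hypothetical data-generating process for O = (L0, A0, L1, A1, Y).
    L0 = rng.normal(size=n)
    A0 = rng.binomial(1, expit(0.5 * L0))
    L1 = 0.5 * L0 + 0.8 * A0 + rng.normal(size=n)
    A1 = rng.binomial(1, expit(0.5 * L1))
    Y = 1 + L0 + A0 + L1 + A1 + rng.normal(size=n)
    return L0, A0, L1, A1, Y  # true E[Y_{1,1}] = 1 + 0 + 1 + 0.8 + 1 = 3.8

def sequential_regression(L0, A0, L1, A1, Y, a0=1, a1=1):
    """G-computation by sequential regression, with linear working models."""
    ones = np.ones(len(Y))
    # Step 1: regress Y on the full history, then predict with A1 set to a1.
    coef2 = ols_fit(np.column_stack([L0, A0, L1, A1]), Y)
    Q2 = ols_predict(coef2, np.column_stack([L0, A0, L1, a1 * ones]))
    # Step 2: regress those predictions on the earlier history, set A0 to a0.
    coef1 = ols_fit(np.column_stack([L0, A0]), Q2)
    Q1 = ols_predict(coef1, np.column_stack([L0, a0 * ones]))
    # Step 3: average over the empirical distribution of L0.
    return Q1.mean()

data = simulate(100_000, np.random.default_rng(2019))
print(sequential_regression(*data))  # should be close to 3.8 under this DGP
```

Each regression integrates out one more time point, moving backwards in time; with more time points the same two operations (regress on the past, then set the treatment) are simply repeated.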

Simultaneous inference with the Kaplan-Meier estimator of survival

This post is part of our Q&A series. A question from graduate students in our Fall 2019 offering of “Biostatistical Methods: Survival Analysis and Causality” at UC Berkeley: Question: Hi Mark, First of all, I have doubts regarding the simultaneous confidence interval for Kaplan-Meier, since I am not necessarily interested in inference for a parameter. I would like to know if the 95% confidence band for my KM estimator will hold using the same formula we used in our R lab without covariates (taken from lectures).
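For reference, the Kaplan-Meier estimator with Greenwood pointwise 95% confidence intervals can be sketched as follows. This is an illustrative sketch assuming a single sample without covariates, and the function name is hypothetical. Note that the intervals are pointwise: a simultaneous band (for example a Hall-Wellner or equal-precision band) requires a wider, band-specific critical value, which this code does not compute.

```python
import numpy as np

def kaplan_meier(time, event, z=1.96):
    """Kaplan-Meier estimate with Greenwood pointwise confidence intervals.

    time:  follow-up times
    event: 1 if the event was observed, 0 if right-censored
    Returns a list of (event_time, S_hat, lower, upper).
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    surv, greenwood_sum = 1.0, 0.0
    rows = []
    for t in np.unique(time[event == 1]):
        n_risk = np.sum(time >= t)                 # at risk just before t
        d = np.sum((time == t) & (event == 1))     # events at t
        surv *= 1.0 - d / n_risk
        if n_risk > d:
            # Greenwood: Var(S_hat) ~ S_hat^2 * sum d / (n (n - d))
            greenwood_sum += d / (n_risk * (n_risk - d))
        se = surv * np.sqrt(greenwood_sum)
        lower = max(0.0, surv - z * se)            # clip to [0, 1]
        upper = min(1.0, surv + z * se)
        rows.append((t, surv, lower, upper))
    return rows

for t, s, lo, hi in kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1]):
    print(f"t={t:.0f}  S={s:.3f}  95% CI=({lo:.3f}, {hi:.3f})")
```

The plain "estimate ± 1.96 × SE" interval is used here for simplicity; log or log-log transformed intervals are common alternatives that behave better near 0 and 1.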

CV-TMLE and double machine learning

This post is part of our Q&A series. A question from Twitter on choosing between double machine learning and TMLE with cross-validation: https://twitter.com/emaadmanzoor/status/1208924841316880385 Question: @mark_vdlaan Is there an applied researcher’s guide to choosing between double machine learning and TMLE + cross-fitting? PS: Thanks for making these methods and resources so easily accessible! Answer: Thanks for this interesting question. In the past several years, interest in these machine learning-based estimators has become more widespread, since they allow for the statistical answer to a question to be framed in terms of scientifically meaningful parameters (e.

Prediction intervals using the TMLE framework

This post is part of our Q&A series. A question from graduate students in our Spring 2019 offering of “Targeted Learning in Biomedical Big Data” at Berkeley: Question: Hi Mark, We are curious about how to use TMLE and influence curves for estimation and inference when the target parameter is a conditional expectation, rather than a scalar. Specifically, suppose I have a data structure $O = (W, Y) \sim P_0$, and sample $n$ times i.

Applications of TMLE in infectious disease research

This post is part of our Q&A series. A question from graduate students in our Spring 2019 offering of “Targeted Learning in Biomedical Big Data” at Berkeley: Question: Hi Mark, Thanks for teaching this class. It’s been an amazing experience. I have a few questions related to my own research. In infectious disease studies, modeling efforts aim to estimate the protection conferred by vaccination or by a previous history of infection (natural immunity).

Adaptive algorithm selection via the Super Learner

This post is part of our Q&A series. A question from graduate students in our Spring 2019 offering of “Targeted Learning in Biomedical Big Data” at Berkeley: Question: Hi Mark, A couple of my questions concern super learning: the strength of the learners, and the possibility of choosing learners adaptively. Is there any advantage, theoretical or practical, of having a large library of weaker learners over a small library of stronger learners?
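To make the selection step concrete, a discrete super learner (the cross-validation selector, which picks the single library member with the smallest cross-validated risk) can be sketched with NumPy. The full super learner instead fits a weighted combination of the library members. The polynomial library, the squared-error loss, the simulated data, and the helper names are all assumptions chosen for illustration.

```python
import numpy as np

def make_poly_learner(degree):
    """Hypothetical candidate learner: least-squares polynomial of a given degree."""
    def fit(x, y):
        coefs = np.polyfit(x, y, degree)
        return lambda x_new: np.polyval(coefs, x_new)
    return fit

def cv_risk(learner, x, y, n_folds=5, rng=None):
    """V-fold cross-validated mean squared error of one learner."""
    if rng is None:
        rng = np.random.default_rng(0)
    folds = rng.permutation(len(y)) % n_folds      # balanced random fold labels
    sq_err = np.empty(len(y))
    for v in range(n_folds):
        train, valid = folds != v, folds == v
        predict = learner(x[train], y[train])
        sq_err[valid] = (y[valid] - predict(x[valid])) ** 2
    return sq_err.mean()

def discrete_super_learner(library, x, y, n_folds=5, seed=0):
    # Same seed for every learner, so all candidates share identical folds.
    risks = {name: cv_risk(fit, x, y, n_folds, np.random.default_rng(seed))
             for name, fit in library.items()}
    best = min(risks, key=risks.get)
    return best, library[best](x, y), risks        # refit winner on all data

rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, 300)
y = x ** 3 - x + rng.normal(0, 0.5, 300)
library = {f"poly_{d}": make_poly_learner(d) for d in (1, 3, 9)}
best, fitted, risks = discrete_super_learner(library, x, y)
print(best, {k: round(v, 3) for k, v in risks.items()})
```

Because the selector only compares cross-validated risks, adding more (or weaker) candidates mainly costs computation; the oracle results for super learning are what justify letting the library grow.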

TMLE versus the one-step estimator

This post is part of our Q&A series. A question from graduate students in our Spring 2019 offering of “Targeted Learning in Biomedical Big Data” at Berkeley: Question: Hi Mark, Are there any theoretical guarantees about the relative performance of TMLE and the one-step estimator in finite samples? Thanks. H. R.d.B. Answer: Hi H. R.d.B., Finite sample guarantees are very hard to obtain. One can obtain finite-sample confidence intervals by, for example, not relying on a CLT but on finite-sample inequalities for sample means (e.
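As a point of reference for the comparison, both estimators can be sketched for the treatment-specific mean $\psi = E[E(Y \mid A=1, W)]$ with $O=(W,A,Y)$. This is a minimal sketch under assumptions of our own: a simulated discrete $W$, a stratified (and therefore correct) propensity score estimate, a deliberately misspecified outcome regression, and a linear least-squares fluctuation for a continuous outcome, whereas TMLE for bounded outcomes typically uses a logistic fluctuation. It illustrates the mechanics only and says nothing about finite-sample guarantees.

```python
import numpy as np

def simulate(n, rng):
    # Hypothetical DGP: W in {0, 1, 2}, A ~ Bernoulli(g0(W)), Y = W + A + noise.
    W = rng.integers(0, 3, size=n)
    A = rng.binomial(1, np.array([0.2, 0.5, 0.8])[W])
    Y = W + A + rng.normal(size=n)
    return W, A, Y  # target psi = E[W] + 1 = 2

def estimators(W, A, Y):
    # Propensity score: stratified means, correct because W is discrete.
    g_hat = np.array([A[W == w].mean() for w in range(3)])[W]
    # Deliberately misspecified outcome regression Q(1, W): ignores W.
    Q1 = np.full(len(Y), Y[A == 1].mean())
    plug_in = Q1.mean()
    H = A / g_hat          # clever covariate; zero when A == 0
    resid = Y - Q1         # only enters where A == 1
    # One-step: plug-in plus the empirical mean of the efficient influence curve.
    one_step = plug_in + np.mean(H * resid)
    # TMLE: update Q1 along Q1 + eps * H, with eps fit by least squares,
    # so the updated fit solves the influence-curve estimating equation.
    eps = np.sum(H * resid) / np.sum(H ** 2)
    tmle = np.mean(Q1 + eps / g_hat)
    return plug_in, one_step, tmle

print(estimators(*simulate(200_000, np.random.default_rng(3))))
```

Under this simulation the plug-in inherits the bias of the misspecified outcome regression, while both corrected estimators target $\psi = 2$. The difference is in form: the one-step estimator adds a bias correction to the plug-in, whereas TMLE updates the outcome regression first and then reports a plug-in of the updated fit.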