# The Research Group of Mark van der Laan

## Two-stage sampling and survival analysis

This post is part of our Q&A series. A question from graduate students in our Fall 2019 offering of “Biostatistical Methods: Survival Analysis and Causality” at UC Berkeley: Question: Hi Mark, We are wondering how, under your framework, to deal with a situation in which only the right-censored data have a full set of covariates, while the covariates for the non-right-censored data are largely missing. To be specific, we want to find the relation between people’s matching property and their marriage durations.

## Estimating the sample average treatment effect under effect modification in a cluster randomized trial

This post is part of our Q&A series. A question from graduate students in our Fall 2019 offering of “Biostatistical Methods: Survival Analysis and Causality” at UC Berkeley: Question: Hi Mark, We were wondering about the application of TMLE and superlearner to cluster-randomized study designs, and the adoption of the sample average treatment effect (SATE) as an efficient estimator. From our understanding, although the SATE is not formally identifiable in a finite setting, it is nevertheless an efficient estimate due to its asymptotic behavior (TMLE for the population effect is asymptotically linear and has an asymptotically conservative variance estimator).

## Longitudinal causal model under obscured time-ordering

This post is part of our Q&A series. A question from graduate students in our Fall 2019 offering of “Biostatistical Methods: Survival Analysis and Causality” at UC Berkeley: Question: Hi Mark, Suppose we have a longitudinal data structure where information about the intervention and time-varying covariate is collected simultaneously, and their temporal ordering is obscured. For instance, data is collected at monthly health checkups, where $A(t)$ is the subject’s healthy eating habits in the past month, and $L(t)$ is the occurrence of heartburn in the past month.

## Positivity assumption violations and TMLE for longitudinal data with many time-varying covariates

This post is part of our Q&A series. A question from graduate students in our Fall 2019 offering of “Biostatistical Methods: Survival Analysis and Causality” at UC Berkeley: Question: Hi Mark, For longitudinal data such as $O=(L_0,A_0,Y_0,L_1,A_1,Y_1,L_2,A_2,Y_2,\ldots )$, we can use the G-computation formula with the sequential regression method if we treat time $t$ as a discrete variable. You also mentioned that there are more general methods that can deal with the case when $t$ is continuous.

## Simultaneous inference with the Kaplan-Meier estimator of survival

This post is part of our Q&A series. A question from graduate students in our Fall 2019 offering of “Biostatistical Methods: Survival Analysis and Causality” at UC Berkeley: Question: Hi Mark, First of all, I have doubts regarding the simultaneous confidence interval for Kaplan-Meier, since I am not necessarily interested in inference for a parameter. I would like to know if the 95% confidence band for my KM estimator will hold using the same formula we did in our R lab without covariates (taken from lectures).
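For readers who want a concrete picture of the estimator the question refers to, here is a minimal sketch of the Kaplan-Meier estimator without covariates, with Greenwood standard errors. The function name `kaplan_meier` and the toy data are hypothetical. Greenwood's formula gives pointwise standard errors only. It is not a simultaneous confidence band, and it is not necessarily the formula used in the R lab.

```python
import math
from collections import Counter


def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate with Greenwood standard errors.

    times  : follow-up time for each subject
    events : 1 if the event was observed, 0 if right-censored
    Returns a list of (t, S(t), se(t)) at each distinct observed event time.
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    exits = Counter(times)          # everyone leaving the risk set at time t
    at_risk = len(times)
    surv = 1.0
    greenwood_sum = 0.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d > 0:
            surv *= 1.0 - d / at_risk
            if at_risk > d:         # skip the term when S(t) drops to 0
                greenwood_sum += d / (at_risk * (at_risk - d))
            curve.append((t, surv, surv * math.sqrt(greenwood_sum)))
        # Subjects censored at t are counted as at risk at t, removed afterwards.
        at_risk -= exits[t]
    return curve


# Toy data (hypothetical): five subjects, two of them right-censored.
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
for t, s, se in curve:
    print(f"t={t}  S(t)={s:.3f}  se={se:.3f}")
```

A simultaneous band would replace the pointwise normal quantile with a quantile of the supremum of the estimated limiting process. That step is not sketched here.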

## CV-TMLE and double machine learning

This post is part of our Q&A series. A question from Twitter on choosing between double machine learning and TMLE with cross-validation: https://twitter.com/emaadmanzoor/status/1208924841316880385 Question: @mark_vdlaan Is there an applied researcher’s guide to choosing between double machine learning and TMLE + cross-fitting? PS: Thanks for making these methods and resources so easily accessible! Answer: Thanks for this interesting question. In the past several years, the interest in these machine learning-based estimators has become more widespread, since they allow for the statistical answer to a question to be framed in terms of scientifically meaningful parameters (e.

## Prediction intervals using the TMLE framework

This post is part of our Q&A series. A question from graduate students in our Spring 2019 offering of “Targeted Learning in Biomedical Big Data” at Berkeley: Question: Hi Mark, We are curious about how to use TMLE and influence curves for estimation and inference when the target parameter is a conditional expectation, rather than a scalar. Specifically, suppose I have a data structure $O = (W, Y) \sim P_0$, and sample $n$ times i.

## Applications of TMLE in infectious disease research

This post is part of our Q&A series. A question from graduate students in our Spring 2019 offering of “Targeted Learning in Biomedical Big Data” at Berkeley: Question: Hi Mark, Thanks for teaching this class. It’s been an amazing experience. I have a few questions related to my own research. In infectious disease studies, modeling attempts to create models that estimate protection conferred from vaccination or previous history of infection (natural immunity).

## Adaptive algorithm selection via the Super Learner

This post is part of our Q&A series. A question from graduate students in our Spring 2019 offering of “Targeted Learning in Biomedical Big Data” at Berkeley: Question: Hi Mark, A couple questions I have are about super learning and the strength of the learners as well as potentially adaptively choosing learners. Is there any advantage, theoretical or practical, of having a large library of weaker learners over a small library of stronger learners?

## TMLE versus the one-step estimator

This post is part of our Q&A series. A question from graduate students in our Spring 2019 offering of “Targeted Learning in Biomedical Big Data” at Berkeley: Question: Hi Mark, Are there any theoretical guarantees about the relative performance of TMLE and the one-step estimator in finite samples? Thanks. H. R.d.B. Answer: Hi H. R.d.B., Finite sample guarantees are very hard to obtain. One can obtain finite-sample confidence intervals by, for example, not relying on a CLT but on finite-sample inequalities for sample means (e.
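As an illustration of what a finite-sample inequality for a sample mean can look like, the sketch below uses Hoeffding's inequality for bounded observations. This is one such inequality chosen for illustration, not necessarily the one the answer goes on to name, and the function name `hoeffding_interval` is hypothetical. For observations in $[a, b]$, the interval $\bar{X}_n \pm (b-a)\sqrt{\log(2/\alpha)/(2n)}$ has coverage of at least $1-\alpha$ for every $n$, without appeal to a CLT.

```python
import math


def hoeffding_interval(xs, lower, upper, alpha=0.05):
    """Finite-sample confidence interval for the mean of i.i.d. observations
    known to lie in [lower, upper], based on Hoeffding's inequality."""
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one observation")
    mean = sum(xs) / n
    half_width = (upper - lower) * math.sqrt(math.log(2.0 / alpha) / (2.0 * n))
    # Clip to the known range of the data.
    return max(lower, mean - half_width), min(upper, mean + half_width)


# Toy data (hypothetical): 100 binary observations with sample mean 0.5.
lo, hi = hoeffding_interval([0, 1] * 50, lower=0.0, upper=1.0)
print(f"95% Hoeffding interval: ({lo:.3f}, {hi:.3f})")
```

Such intervals are valid at any sample size but are typically much wider than CLT-based intervals, since they use only the bounds on the data and not its variance.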