# The Research Group of Mark van der Laan

## Feature engineering with large datasets

This post is part of our Q&A series. A question from graduate students in our Spring 2021 offering of the new course “Targeted Learning in Practice” at UC Berkeley: Question: Hi Mark, In the article about why we need a statistical revolution at the beginning of the tlverse book, you discuss the “Art” of statistics, and describe a scenario where confounders for a logistic regression are chosen and characterized in such a way as to yield a statistically significant result, potentially after multiple iterations that produce an estimate that is not significant.

## Super learning and interaction terms in models

This post is part of our Q&A series. A question from graduate students in our Spring 2021 offering of the new course “Targeted Learning in Practice” at UC Berkeley: Question: Hi Mark, I have a question about the step in the Super Learning framework where interaction terms can be added between certain covariates. Is there a principled way to decide which interaction terms should be added from the data alone, or do all interaction specifications have to be based on prior knowledge of the system in question?

## Adaptive designs with continuous treatments

This post is part of our Q&A series. A question from graduate students in our Spring 2021 offering of the new course “Targeted Learning in Practice” at UC Berkeley: Question: Hi Mark, You’ve got me thinking about selecting optimal experiments in the context of shift interventions. For the example we talked about in class, in order to avoid positivity violations, we define shift interventions such that an individual’s value of the intervention node $A$ is shifted by a specified amount $\delta$ unless there is no support for such a shift based on the covariates $W$, in which case $A$ is shifted to the maximum value available for that $W$.
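The shift rule described in this question can be sketched in a few lines. This is only an illustration under stated assumptions, not code from the post or from the tlverse: the function name `shifted_treatment`, the callable `upper_support` (returning the largest supported value of $A$ for a given $W$), and the example support bound are all hypothetical, and the sketch assumes $\delta \ge 0$ and that the observed $A$ already lies within the support.

```python
from typing import Callable


def shifted_treatment(
    a: float,
    w: float,
    delta: float,
    upper_support: Callable[[float], float],
) -> float:
    """Shift a by delta when supported for w, else use the maximum supported value.

    `upper_support` is a hypothetical stand-in for whatever rule or estimate
    gives the largest value of A that has support for covariate value w.
    """
    u = upper_support(w)
    # If a + delta stays within the support for this w, apply the full shift;
    # otherwise fall back to the maximum available value for this w.
    return a + delta if a + delta <= u else u


# Hypothetical support bound, chosen only for illustration: A can be at most 10 + w.
example_upper = lambda w: 10.0 + w

full_shift = shifted_treatment(3.0, 1.0, 2.0, example_upper)      # shift is supported
truncated_shift = shifted_treatment(10.5, 1.0, 2.0, example_upper)  # shift is not supported
```

In the first call the shifted value stays inside the assumed support, so the full shift of $\delta$ is applied; in the second it would exceed the bound, so the value is set to the maximum available for that $W$.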

## Estimating effects based upon community-level interventions and optimal interventions

This post is part of our Q&A series. A question from graduate students in our Fall 2020 offering of “Biostatistical Methods: Survival Analysis and Causality” at UC Berkeley: Question: Hi Mark, We have been discussing questions regarding community-based interventions and we would like to hear your input on the following three questions: When we estimate the causal effects of community-based interventions, we can use baseline variables to block the effect of the environment on the outcome, so that we can reduce the problem to the individual level.

## Using a data-adaptive target parameter and CV-TMLE in survival analysis

This post is part of our Q&A series. A question from graduate students in our Fall 2020 offering of “Biostatistical Methods: Survival Analysis and Causality” at UC Berkeley: Question: Hi Mark, Within the field of industrial hygiene and occupational epidemiology there is interest in linking possible occupational exposures to deleterious health outcomes, most usually various cancers. Obviously in such a setting, it is nearly impossible without individual chemical biomarkers to have causal identifiability for a specific exposure (for example lead, pesticides, benzene, etc.

## Using time-varying covariates in evaluating the causal effect of a single time point intervention

This post is part of our Q&A series. A question from graduate students in our Fall 2020 offering of “Biostatistical Methods: Survival Analysis and Causality” at UC Berkeley: Question: Hi Mark, We have an observational study with fixed baseline intervention, $A$ for statin use vs. no statin use, along with baseline covariates, $L$ such as age, gender, marital status, hypertension, diabetes, hypercholesterolemia, coronary artery disease. Our goal is to predict conversion to the more impaired stage of Alzheimer’s disease.

## TMLE for multi-level treatments and methods for sensitivity analysis

This post is part of our Q&A series. A question from graduate students in our Fall 2020 offering of “Biostatistical Methods: Survival Analysis and Causality” at UC Berkeley: Question: Hi Mark, We had two questions for you: 1. How can TMLE be applied to a treatment with multiple levels, and how is inference conducted? For example, if the potential outcomes are $Y_i(0), Y_i(1), \ldots, Y_i(K)$ for $K$ different possible treatments, i.e., possible values for $A_i$ are from $1$ to $K$, how would TMLE work?

## Estimating causal effects with instrumental variables in survival analysis

This post is part of our Q&A series. A question from graduate students in our Fall 2020 offering of “Biostatistical Methods: Survival Analysis and Causality” at UC Berkeley: Question: Hi Mark, In survival analysis, what methods should we use to estimate counterfactuals and causal effect if the conditional independence assumption is violated? For instance, the instrumental variable method in econometrics and Mendelian randomization in biostatistics deal with the unmeasured confounding problem.

## Two-stage sampling and survival analysis

This post is part of our Q&A series. A question from graduate students in our Fall 2019 offering of “Biostatistical Methods: Survival Analysis and Causality” at UC Berkeley: Question: Hi Mark, We are wondering how, under your framework, to deal with a situation in which only the right-censored data have a full set of covariates, while the covariates for the non-right-censored data are largely missing. To be specific, we want to find the relation between people’s matching property and their marriage durations.

## Estimating the sample average treatment effect under effect modification in a cluster randomized trial

This post is part of our Q&A series. A question from graduate students in our Fall 2019 offering of “Biostatistical Methods: Survival Analysis and Causality” at UC Berkeley: Question: Hi Mark, We were wondering about the application of TMLE and superlearner to cluster-randomized study designs, and the adoption of the sample average treatment effect (SATE) as an efficient estimator. From our understanding, although the SATE is not formally identifiable in a finite setting, it is nevertheless an efficient estimate due to its asymptotic behavior (TMLE for the population effect is asymptotically linear and has an asymptotically conservative variance estimator).