Q&A on The Research Group of Mark van der Laan
/tags/qa/
Recent content in Q&A on The Research Group of Mark van der Laan
en-us
© 2018-2021. All rights reserved.
Wed, 30 Jun 2021 13:38:00 +0000

Causal inference with latent variables for unmeasured confounding
/2021/06/30/causalinferencewithlatentvariablesforunmeasuredconfounding/
Wed, 30 Jun 2021 13:38:00 +0000
This post is part of our Q&A series.
A question from graduate students in our Spring 2021 offering of the new course “Targeted Learning in Practice” at UC Berkeley:
Question: Hi Mark,
Statistical analyses of high-throughput sequencing data are often complicated by the presence of unmeasured sources of technical and biological variation. Examples of potentially unmeasured technical factors are the time and date when individual samples were prepared for sequencing, as well as which lab personnel performed the experiment.

Causal mediation analysis for exposure mixtures
/2021/05/02/causalmediationanalysisforexposuremixtures/
Sun, 02 May 2021 12:52:00 +0000
This post is part of our Q&A series.
A question from graduate students in our Spring 2021 offering of the new course “Targeted Learning in Practice” at UC Berkeley:
Question: Hi Mark,
After the lectures on tmle3shift and tmle3mediate, we’re wondering if a different procedure for mediation analysis could work. Consider a data-generating system for $O = (W, A, Z, Y)$, where $W$ represents three binary covariates, $A$ is a binary exposure of interest, $Z$ represents three binary mediators, and $Y$ is a continuous outcome.
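To make this data structure concrete, here is a minimal Python sketch that simulates $O = (W, A, Z, Y)$ with three binary covariates, a binary exposure, three binary mediators, and a continuous outcome. The function name `simulate_mediation_data` and every coefficient are hypothetical, chosen only for illustration; they do not come from the course or from tmle3mediate.

```python
import numpy as np

def simulate_mediation_data(n, rng):
    """Draw n copies of O = (W, A, Z, Y): 3 binary W, binary A, 3 binary Z, continuous Y.

    All coefficients are hypothetical and only make the structure concrete.
    """
    expit = lambda x: 1.0 / (1.0 + np.exp(-x))
    # Baseline covariates: three independent binary variables.
    W = rng.binomial(1, 0.5, size=(n, 3))
    # Binary exposure whose probability depends on W.
    A = rng.binomial(1, expit(-0.5 + W @ np.array([0.4, -0.3, 0.2])))
    # Each of the three binary mediators depends on A and on the matching column of W.
    Z = rng.binomial(1, expit(-0.2 + 0.8 * A[:, None] + 0.3 * W))
    # Continuous outcome depending on A, Z, and W.
    Y = 1.0 + 0.5 * A + Z @ np.array([1.0, 0.5, -0.5]) + W.sum(axis=1) + rng.normal(size=n)
    return W, A, Z, Y

rng = np.random.default_rng(2021)
W, A, Z, Y = simulate_mediation_data(1_000, rng)
print(W.shape, A.shape, Z.shape, Y.shape)  # (1000, 3) (1000,) (1000, 3) (1000,)
```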

Data-adaptively learning strata-specific causal effects
/2021/05/02/dataadaptivelylearningstrataspecificcausaleffects/
Sun, 02 May 2021 12:29:00 +0000
This post is part of our Q&A series.
A question from graduate students in our Spring 2021 offering of the new course “Targeted Learning in Practice” at UC Berkeley:
Question: Hi Mark,
I have a question about applying CV-TMLE to a current research project. I have a cross-sectional dataset from Bangladesh, where the outcome of interest is antenatal care use (binary), the exposure of interest is women’s empowerment (continuous), and the baseline covariates include mother’s age, child’s age, mother’s and father’s education, number of members in the household, number of children under 15, household wealth, and maternal depression.

Tuning the highly adaptive lasso estimator
/2021/05/02/tuningthehighlyadaptivelassoestimator/
Sun, 02 May 2021 11:54:00 +0000
This post is part of our Q&A series.
A question from graduate students in our Spring 2021 offering of the new course “Targeted Learning in Practice” at UC Berkeley:
Question: Hi Mark,
In chapter 6 of your 2018 book Targeted Learning in Data Science, coauthored with Sherri Rose, you discuss the practical necessity of reducing the number of basis functions incorporated into the highly adaptive lasso (HAL) estimator when the number of covariates grows.

Applying targeted learning to improve global health equity
/2021/05/02/applyingtargetedlearningtoimproveglobalhealthequity/
Sun, 02 May 2021 11:15:00 +0000
This post is part of our Q&A series.
A question from graduate students in our Spring 2021 offering of the new course “Targeted Learning in Practice” at UC Berkeley:
Question: Hi Mark,
A pressing issue in the field of global public health is equitable ownership of data and results in terms of both authorship and representation. In some respects, targeted learning improves equity by bolstering our ability to efficiently draw causal inferences from global health data.

Machine learning for conditional density estimation
/2021/05/02/machinelearningforconditionaldensityestimation/
Sun, 02 May 2021 10:58:00 +0000
This post is part of our Q&A series.
A question from graduate students in our Spring 2021 offering of the new course “Targeted Learning in Practice” at UC Berkeley:
Question: Hi Mark,
I was curious in general about how to approach problems that involve machine learning-based estimation of densities rather than scalar quantities such as conditional means (i.e., regression), particularly for continuous variables. As a grounding example, for continuous treatments in the TMLE framework one needs to estimate $P(A \mid W)$, where $A$ is a continuous random variable.

TMLE of a treatment-specific multivariate survival curve
/2021/05/01/tmleofatreatmentspecificmultivariatesurvivalcurve/
Sat, 01 May 2021 17:54:00 +0000
This post is part of our Q&A series.
A question from graduate students in our Spring 2021 offering of the new course “Targeted Learning in Practice” at UC Berkeley:
Question: Hi Mark,
I have a survival analysis question. I am working with a dataset that is left- and right-truncated. I am interested in estimating the treatment-specific multivariate survival function of a time-to-event variable. For example, consider a study in which subjects have been randomized to two different treatment groups with baseline covariates $W$, but we only observe the outcome (time of death) within a left- and right-truncated window.

Conditions for asymptotic efficiency of TMLE
/2021/05/01/conditionsforasymptoticefficiencyoftmle/
Sat, 01 May 2021 17:44:00 +0000
This post is part of our Q&A series.
A question from graduate students in our Spring 2021 offering of the new course “Targeted Learning in Practice” at UC Berkeley:
Question: Hi Mark,
I have a question regarding the requirements for asymptotic efficiency of TMLE.
Asymptotic efficiency of TMLE relies on the second-order remainder being negligible. Is this purely a finite-sample concern, or are there potentially parameters of interest where this isn’t true by construction?

Causal inference with left-censoring and left-truncation
/2021/05/01/causalinferencewithleftcensoringandlefttruncation/
Sat, 01 May 2021 17:23:00 +0000
This post is part of our Q&A series.
A question from graduate students in our Spring 2021 offering of the new course “Targeted Learning in Practice” at UC Berkeley:
Question: Hi Mark,
As epidemiologists, we wish to study the relationship between time-varying exposure and disease progression over time. A natural choice of study design would be the longitudinal cohort study. In prospective cohorts, participants are not selected from existing data but are enrolled during an enrollment period.

Stochastic treatment regimes in practice
/2021/05/01/stochastictreatmentregimesinpractice/
Sat, 01 May 2021 15:54:00 +0000
This post is part of our Q&A series.
A question from graduate students in our Spring 2021 offering of the new course “Targeted Learning in Practice” at UC Berkeley:
Question: Hi Mark,
We were discussing practical implementations of stochastic treatment regimes and came up with the following questions we would like to hear your thoughts about.
Question 1 (Practical positivity): Is there a recommended procedure for deciding the truncation threshold with respect to shifts in the framework of stochastic treatment regimes?

Feature engineering with large datasets
/2021/05/01/featureengineeringwithlargedatasets/
Sat, 01 May 2021 15:28:00 +0000
This post is part of our Q&A series.
A question from graduate students in our Spring 2021 offering of the new course “Targeted Learning in Practice” at UC Berkeley:
Question: Hi Mark,
In the article about why we need a statistical revolution at the beginning of the tlverse book, you discuss the “Art” of statistics and describe a scenario in which confounders for a logistic regression are chosen and characterized in such a way as to yield a statistically significant result, potentially after multiple iterations that produce estimates that are not significant.

Super learning and interaction terms in models
/2021/05/01/superlearningandinteractiontermsinmodels/
Sat, 01 May 2021 15:13:00 +0000
This post is part of our Q&A series.
A question from graduate students in our Spring 2021 offering of the new course “Targeted Learning in Practice” at UC Berkeley:
Question: Hi Mark,
I have a question about the step in the Super Learning framework where interaction terms can be added between certain covariates. Is there a principled way to decide from the data alone which interaction terms should be added, or do all interaction specifications have to be based on prior knowledge of the system in question?

Adaptive designs with continuous treatments
/2021/05/01/adaptivedesignswithcontinuoustreatments/
Sat, 01 May 2021 15:00:00 +0000
This post is part of our Q&A series.
A question from graduate students in our Spring 2021 offering of the new course “Targeted Learning in Practice” at UC Berkeley:
Question: Hi Mark,
You’ve got me thinking about selecting optimal experiments in the context of shift interventions. For the example we talked about in class, in order to avoid positivity violations, we define shift interventions such that an individual’s value of the intervention node $A$ is shifted by a specified amount $\delta$ unless there is no support for such a shift based on the covariates $W$, in which case $A$ is shifted to the maximum value available for that $W$.
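The shift rule described in this question can be sketched as follows, assuming (our simplification, not the course's definition) that the support of $A$ given $W$ is summarized by an upper bound. The helper `shifted_treatment` and the support function `max_supported_a` are hypothetical; in practice such a bound would have to be estimated from the conditional density of $A$ given $W$.

```python
import numpy as np

def shifted_treatment(a, w, delta, max_supported_a):
    """Return A + delta, capped at the largest treatment value supported for each W.

    max_supported_a is a hypothetical, user-supplied function mapping a covariate
    value w to the maximum value of A with positive density given W = w.
    """
    a = np.asarray(a, dtype=float)
    upper = np.array([max_supported_a(wi) for wi in w], dtype=float)
    # Shift by delta when A + delta stays within the support; otherwise move A
    # to the maximum value available for that W.
    return np.minimum(a + delta, upper)

# Hypothetical support: A ranges up to 3 when W = 0 and up to 5 when W = 1.
support_max = lambda w: 3.0 if w == 0 else 5.0
print(shifted_treatment([1.0, 2.0, 4.5], [0, 1, 1], delta=1.0, max_supported_a=support_max))
# → [2. 3. 5.]
```

Only the third unit is affected by the cap: $4.5 + 1 = 5.5$ exceeds the assumed maximum of 5 for $W = 1$.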

Estimating effects based upon community-level interventions and optimal interventions
/2020/12/04/estimatingeffectsbaseduponcommunitylevelinterventionsandoptimalinterventions/
Fri, 04 Dec 2020 17:53:00 +0000
This post is part of our Q&A series.
A question from graduate students in our Fall 2020 offering of “Biostatistical Methods: Survival Analysis and Causality” at UC Berkeley:
Question: Hi Mark,
We have been discussing questions regarding community-based interventions, and we would like to hear your input on the following three questions:
When we estimate the causal effects of community-based interventions, we can use baseline variables to block the effect of the environment on the outcome, thereby reducing the problem to the individual level.

Using a data-adaptive target parameter and CV-TMLE in survival analysis
/2020/12/04/usingadataadaptivetargetparameterandcvtmleinsurvivalanalysis/
Fri, 04 Dec 2020 16:51:00 +0000
This post is part of our Q&A series.
A question from graduate students in our Fall 2020 offering of “Biostatistical Methods: Survival Analysis and Causality” at UC Berkeley:
Question: Hi Mark,
Within the field of industrial hygiene and occupational epidemiology, there is interest in linking possible occupational exposures to deleterious health outcomes, most commonly various cancers. Obviously, in such a setting, without individual chemical biomarkers it is nearly impossible to achieve causal identifiability for a specific exposure (for example, lead, pesticides, benzene, etc.

Using time-varying covariates in evaluating the causal effect of a single-time-point intervention
/2020/12/03/usingtimevaryingcovariatesinevaluatingthecausaleffectofasingletimepointintervention/
Thu, 03 Dec 2020 17:48:00 +0000
This post is part of our Q&A series.
A question from graduate students in our Fall 2020 offering of “Biostatistical Methods: Survival Analysis and Causality” at UC Berkeley:
Question: Hi Mark,
We have an observational study with a fixed baseline intervention $A$ (statin use vs. no statin use), along with baseline covariates $L$ such as age, gender, marital status, hypertension, diabetes, hypercholesterolemia, and coronary artery disease. Our goal is to predict conversion to the more impaired stage of Alzheimer’s disease.

TMLE for multilevel treatments and methods for sensitivity analysis
/2020/12/03/tmleformultileveltreatmentsandmethodsforsensitivityanalysis/
Thu, 03 Dec 2020 17:41:00 +0000
This post is part of our Q&A series.
A question from graduate students in our Fall 2020 offering of “Biostatistical Methods: Survival Analysis and Causality” at UC Berkeley:
Question: Hi Mark,
We had two questions for you. 1. How can we apply TMLE to a treatment with multiple levels and conduct inference? For example, if the potential outcomes are $Y_i(1), \ldots, Y_i(K)$ for $K$ different possible treatments, i.e., the possible values of $A_i$ range from $1$ to $K$, how would TMLE work?

Estimating causal effects with instrumental variables in survival analysis
/2020/12/03/estimatingcausaleffectswithinstrumentalvariablesinsurvivalanalysis/
Thu, 03 Dec 2020 17:27:00 +0000
This post is part of our Q&A series.
A question from graduate students in our Fall 2020 offering of “Biostatistical Methods: Survival Analysis and Causality” at UC Berkeley:
Question: Hi Mark,
In survival analysis, what methods should we use to estimate counterfactuals and causal effects if the conditional independence assumption is violated? For instance, the instrumental variable method in econometrics and Mendelian randomization in biostatistics deal with the problem of unmeasured confounding.

Two-stage sampling and survival analysis
/2019/12/30/twostagesamplingandsurvivalanalysis/
Mon, 30 Dec 2019 13:30:00 +0000
This post is part of our Q&A series.
A question from graduate students in our Fall 2019 offering of “Biostatistical Methods: Survival Analysis and Causality” at UC Berkeley:
Question: Hi Mark,
We are wondering how, under your framework, to deal with a situation in which only the right-censored data have a full set of covariates, while the covariates for the non-right-censored data are largely missing. To be specific, we want to find the relation between people’s matching property and their marriage durations.

Estimating the sample average treatment effect under effect modification in a cluster randomized trial
/2019/12/29/estimatingthesampleaveragetreatmenteffectundereffectmodificationinaclusterrandomizedtrial/
Sun, 29 Dec 2019 17:30:00 +0000
This post is part of our Q&A series.
A question from graduate students in our Fall 2019 offering of “Biostatistical Methods: Survival Analysis and Causality” at UC Berkeley:
Question: Hi Mark,
We were wondering about the application of TMLE and the Super Learner to cluster-randomized study designs, and the adoption of the sample average treatment effect (SATE) as an efficient estimator. From our understanding, although the SATE is not formally identifiable in a finite setting, it is nevertheless an efficient estimate due to its asymptotic behavior (TMLE for the population effect is asymptotically linear and has an asymptotically conservative variance estimator).

Longitudinal causal model under obscured time-ordering
/2019/12/29/longitudinalcausalmodelunderobscuredtimeordering/
Sun, 29 Dec 2019 17:30:00 +0000
This post is part of our Q&A series.
A question from graduate students in our Fall 2019 offering of “Biostatistical Methods: Survival Analysis and Causality” at UC Berkeley:
Question: Hi Mark,
Suppose we have a longitudinal data structure where information about the intervention and the time-varying covariate is collected simultaneously, so that their temporal ordering is obscured. For instance, data are collected at monthly health checkups, where $A(t)$ is the subject’s healthy eating habits in the past month, and $L(t)$ is the occurrence of heartburn in the past month.

Positivity assumption violations and TMLE for longitudinal data with many time-varying covariates
/2019/12/29/positivityassumptionviolationsandtmleforlongitudinaldatawithmanytimevaryingcovariates/
Sun, 29 Dec 2019 17:30:00 +0000
This post is part of our Q&A series.
A question from graduate students in our Fall 2019 offering of “Biostatistical Methods: Survival Analysis and Causality” at UC Berkeley:
Question: Hi Mark,
For longitudinal data such as $O=(L_0,A_0,Y_0,L_1,A_1,Y_1,L_2,A_2,Y_2,\ldots)$, we can use the G-computation formula with the sequential regression method if we treat time $t$ as a discrete variable. You also mentioned that there are more general methods that can handle the case in which $t$ is continuous.
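As a rough illustration of the discrete-time sequential regression (iterated conditional expectation) approach mentioned here, the sketch below estimates $E[Y^{(a_0, a_1)}]$ for a simplified two-time-point structure $O = (L_0, A_0, L_1, A_1, Y)$ with a single final outcome. It assumes ordinary least squares is a correctly specified working model, which holds only for the hypothetical simulation included; in practice one would typically replace each least-squares fit with a flexible learner. All function names are our own.

```python
import numpy as np

def ols_fit_predict(X_fit, y, X_pred):
    """Least-squares fit with an intercept; return predictions at X_pred."""
    design = np.column_stack([np.ones(len(X_fit)), X_fit])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return np.column_stack([np.ones(len(X_pred)), X_pred]) @ coef

def ice_gcomp(L0, A0, L1, A1, Y, a0=1, a1=1):
    """Sequential-regression G-computation estimate of E[Y^(a0, a1)]."""
    # Step 1: regress Y on the full history, then predict with A1 set to a1.
    X2 = np.column_stack([L0, A0, L1, A1])
    X2_set = np.column_stack([L0, A0, L1, np.full(len(A1), a1)])
    Q2 = ols_fit_predict(X2, Y, X2_set)
    # Step 2: regress those predictions on (L0, A0), then predict with A0 set to a0.
    X1 = np.column_stack([L0, A0])
    X1_set = np.column_stack([L0, np.full(len(A0), a0)])
    Q1 = ols_fit_predict(X1, Q2, X1_set)
    # Step 3: average over the empirical distribution of L0.
    return Q1.mean()

# Hypothetical data-generating process in which the linear working models are correct.
rng = np.random.default_rng(0)
n = 20_000
expit = lambda x: 1.0 / (1.0 + np.exp(-x))
L0 = rng.normal(size=n)
A0 = rng.binomial(1, expit(0.5 * L0))
L1 = 0.5 * L0 + A0 + rng.normal(size=n)
A1 = rng.binomial(1, expit(0.5 * L1))
Y = L0 + L1 + A0 + A1 + rng.normal(size=n)
print(ice_gcomp(L0, A0, L1, A1, Y))  # close to the true value E[Y^(1, 1)] = 3
```

Under this simulation the true value is $E[L_0] + E[L_1^{(a_0=1)}] + 2 = 0 + 1 + 2 = 3$.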

Simultaneous inference with the Kaplan-Meier estimator of survival
/2019/12/29/simultaneousinferencewiththekaplanmeierestimatorofsurvival/
Sun, 29 Dec 2019 17:30:00 +0000
This post is part of our Q&A series.
A question from graduate students in our Fall 2019 offering of “Biostatistical Methods: Survival Analysis and Causality” at UC Berkeley:
Question: Hi Mark,
First of all, I have doubts regarding the simultaneous confidence interval for the Kaplan-Meier estimator, since I am not necessarily interested in inference for a parameter. I would like to know whether the 95% confidence band for my KM estimator will hold using the same formula we used in our R lab without covariates (taken from lectures).
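For reference, the product-limit (Kaplan-Meier) estimate without covariates can be computed as below. This is a minimal Python sketch of the point estimate only; it does not construct the pointwise or simultaneous confidence bands the question asks about, and the function name `kaplan_meier` is our own.

```python
import numpy as np

def kaplan_meier(time, event):
    """Product-limit estimate of S(t) at each distinct observed event time.

    time: follow-up times; event: 1 if the event was observed, 0 if right-censored.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    event_times = np.unique(time[event == 1])
    surv, s = [], 1.0
    for t in event_times:
        at_risk = np.sum(time >= t)                   # still under observation just before t
        deaths = np.sum((time == t) & (event == 1))   # events observed exactly at t
        s *= 1.0 - deaths / at_risk
        surv.append(s)
    return event_times, np.array(surv)

times, surv = kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1])
# S(t) at t = 1, 2, 3, 5 is 5/6, 2/3, 4/9, 0
```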

CV-TMLE and double machine learning
/2019/12/24/cvtmleanddoublemachinelearning/
Tue, 24 Dec 2019 12:43:00 +0000
This post is part of our Q&A series.
A question from Twitter on choosing between double machine learning and TMLE with cross-validation: https://twitter.com/emaadmanzoor/status/1208924841316880385
Question: @mark_vdlaan Is there an applied researcher’s guide to choosing between double machine learning and TMLE + cross-fitting? PS: Thanks for making these methods and resources so easily accessible!
Answer: Thanks for this interesting question. In the past several years, the interest in these machine learning-based estimators has become more widespread, since they allow for the statistical answer to a question to be framed in terms of scientifically meaningful parameters (e.

Prediction intervals using the TMLE framework
/2019/05/11/predictionintervalsusingthetmleframework/
Sat, 11 May 2019 16:24:00 +0000
This post is part of our Q&A series.
A question from graduate students in our Spring 2019 offering of “Targeted Learning in Biomedical Big Data” at Berkeley:
Question: Hi Mark,
We are curious about how to use TMLE and influence curves for estimation and inference when the target parameter is a conditional expectation, rather than a scalar.
Specifically, suppose I have a data structure $O = (W, Y) \sim P_0$, and sample $n$ times i.

Applications of TMLE in infectious disease research
/2019/05/11/applicationsoftmleininfectiousdiseaseresearch/
Sat, 11 May 2019 14:35:00 +0000
This post is part of our Q&A series.
A question from graduate students in our Spring 2019 offering of “Targeted Learning in Biomedical Big Data” at Berkeley:
Question: Hi Mark,
Thanks for teaching this class. It’s been an amazing experience. I have a few questions related to my own research.
In infectious disease studies, modeling efforts aim to estimate the protection conferred by vaccination or by a previous history of infection (natural immunity).

Adaptive algorithm selection via the Super Learner
/2019/05/11/adaptivealgorithmselectionviathesuperlearner/
Sat, 11 May 2019 13:54:00 +0000
This post is part of our Q&A series.
A question from graduate students in our Spring 2019 offering of “Targeted Learning in Biomedical Big Data” at Berkeley:
Question: Hi Mark,
A couple of questions I have are about super learning and the strength of the learners, as well as potentially choosing learners adaptively. Is there any advantage, theoretical or practical, of having a large library of weaker learners over a small library of stronger learners?

TMLE versus the one-step estimator
/2019/05/10/tmleversustheonestepestimator/
Fri, 10 May 2019 19:23:00 +0000
This post is part of our Q&A series.
A question from graduate students in our Spring 2019 offering of “Targeted Learning in Biomedical Big Data” at Berkeley:
Question: Hi Mark,
Are there any theoretical guarantees about the relative performance of TMLE and the one-step estimator in finite samples?
Thanks,
H.R.B.
Answer: Hi H.R.B.,
Finite-sample guarantees are very hard to obtain. One can obtain finite-sample confidence intervals by, for example, not relying on a CLT but on finite-sample inequalities for sample means (e.
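As one concrete instance of a finite-sample inequality (our choice; the post's own example is cut off in this feed), Hoeffding's inequality for i.i.d. variables bounded in $[a, b]$ gives $P(|\bar X_n - \mu| \ge t) \le 2\exp\{-2nt^2/(b-a)^2\}$, which yields a $(1-\alpha)$ interval with half-width $(b-a)\sqrt{\log(2/\alpha)/(2n)}$. The Python sketch below compares it with the CLT-based Wald interval; both helper names are hypothetical.

```python
import math

def hoeffding_interval(xs, alpha=0.05, lower=0.0, upper=1.0):
    """Finite-sample (1 - alpha) CI for the mean of i.i.d. draws bounded in [lower, upper]."""
    n = len(xs)
    mean = sum(xs) / n
    half_width = (upper - lower) * math.sqrt(math.log(2 / alpha) / (2 * n))
    return mean - half_width, mean + half_width

def clt_interval(xs, z=1.96):
    """Asymptotic Wald interval for the mean, based on the central limit theorem."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)
    half_width = z * math.sqrt(var / n)
    return mean - half_width, mean + half_width

xs = [0, 1] * 50  # n = 100 binary observations with sample mean 0.5
print(hoeffding_interval(xs))  # half-width about 0.136
print(clt_interval(xs))        # half-width about 0.098
```

The Hoeffding interval is valid at every $n$ but is wider; the Wald interval is shorter but only asymptotically justified.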

Imputation and missing data in the TMLE framework
/2019/05/10/imputationandmissingdatainthetmleframework/
Fri, 10 May 2019 10:01:00 +0000
This post is part of our Q&A series.
A question from graduate students in our Spring 2019 offering of “Targeted Learning in Biomedical Big Data” at Berkeley:
Question: Hi Mark,
For a longitudinal dataset with missing data, we might want to impute the values with MICE (multiple imputation by chained equations). Can we use TMLE together with multiple imputation? How can we combine the results from all the multiply imputed datasets into a final result and obtain valid inference?

Adaptive designs and optimal subgroups
/2018/12/01/adaptivedesignsandoptimalsubgroups/
Sat, 01 Dec 2018 17:16:00 +0000
This post is part of our Q&A series.
A question from graduate students in our Fall 2018 offering of “Special Topics in Biostatistics – Adaptive Designs” at Berkeley:
Question: Hi Mark,
We were interested in your opinion on a few topics that have come up in class several times.
If we isolate an optimal subgroup, we can, perhaps, answer interesting questions about, say, drug efficacy (as in, does this drug work for anybody as opposed to on average?

Adaptive sequential designs and optimal treatments
/2018/11/29/adaptivesequentialdesignsandoptimaltreatments/
Thu, 29 Nov 2018 12:43:00 +0000
This post is part of our Q&A series.
A question from graduate students in our Fall 2018 offering of “Special Topics in Biostatistics – Adaptive Designs” at Berkeley:
Question: Hi Mark,
Our question concerns the benefit of using a sequential adaptive design when estimating the outcome under the optimal dynamic treatment rule (for a binary treatment). We propose doing so in a two-stage framework, where in the first stage subjects are naively randomized to treatment, $\Pr(A=1) = 0.

Causal effects for single-group policies
/2018/11/28/causaleffectsforsinglegrouppolicies/
Wed, 28 Nov 2018 14:14:00 +0000
This post is part of our Q&A series.
A question from a graduate student in our Spring 2018 offering of “Targeted Learning in Biomedical Big Data” at Berkeley:
Question: Hi Mark,
I was thinking that if you addressed the question that [we] discussed in your office hours last week, a lot of economists would be interested in reading it.
Feel free to edit the wording of the question however suits you best, but I was thinking: How can you formulate a causal parameter in a setting in which you have a policy that affects one group but not another based on observable characteristics and control for time trends in your model (i.

Finite-sample properties of TML estimators
/2018/01/11/finitesamplepropertiesoftmlestimators/
Thu, 11 Jan 2018 17:19:00 +0000
This post is part of our Q&A series.
A question from a graduate student in our Fall 2017 offering of “Survival Analysis and Causality” at Berkeley:
Question: Hi Mark,
This may be an ill-defined question, but I was wondering, in the usual $O = (W, A, Y)$ setup, while TMLE has superior asymptotic properties over competing estimators like, say, the G-computation plug-in estimator or the IPTW estimator, are there specific instances where it is also guaranteed to have superior finite-sample properties as well?
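For readers less familiar with the two competitors named in this question, here is a minimal Python sketch of the G-computation plug-in and IPTW estimators of the average treatment effect $E[Y(1)] - E[Y(0)]$ when $W$ is discrete and both are computed from stratum-specific empirical means. The function name and simulation settings are hypothetical. With this saturated specification the two estimates coincide numerically; they generally differ once $W$ is continuous and the outcome regression and propensity score are smoothed or modeled.

```python
import numpy as np

def gcomp_and_iptw(W, A, Y):
    """Plug-in G-computation and IPTW estimates of the ATE for a discrete covariate W.

    Assumes every stratum of W contains both treated and untreated units.
    """
    gcomp = 0.0
    g_hat = np.empty(len(W))
    for w in np.unique(W):
        in_w = W == w
        p_w = in_w.mean()                       # empirical P(W = w)
        mean1 = Y[in_w & (A == 1)].mean()       # empirical E[Y | A = 1, W = w]
        mean0 = Y[in_w & (A == 0)].mean()       # empirical E[Y | A = 0, W = w]
        gcomp += p_w * (mean1 - mean0)
        g_hat[in_w] = A[in_w].mean()            # empirical P(A = 1 | W = w)
    iptw = np.mean(A * Y / g_hat) - np.mean((1 - A) * Y / (1 - g_hat))
    return gcomp, iptw

# Hypothetical simulation with a true ATE of 2.
rng = np.random.default_rng(1)
n = 10_000
W = rng.integers(0, 3, size=n)
A = rng.binomial(1, np.array([0.2, 0.5, 0.8])[W])
Y = W + 2.0 * A + rng.normal(size=n)
gcomp, iptw = gcomp_and_iptw(W, A, Y)
print(gcomp, iptw)  # both close to 2, and equal up to floating-point error
```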

Competing risks and non-pathwise differentiable parameters
/2017/11/29/competingrisksandnonpathwisedifferentiableparameters/
Wed, 29 Nov 2017 11:30:00 +0000
This post is part of our Q&A series.
A question from two graduate students in our Fall 2017 offering of “Survival Analysis and Causality” at Berkeley:
Question: Hi Mark,
Below are [two] questions [we thought might interest you]. Looking forward to your thoughts on these!
Best,
S.D. and I.M.
Most competing risk analyses assume that the competing risks are independent of one another. What would be your advice on handling the same style of survival data when the occurrence of one of the competing events is informative of the occurrence of the other?

Leave-p-out Cross-validation
/2017/11/22/leavepoutcrossvalidation/
Wed, 22 Nov 2017 15:40:00 +0000
This post is part of our Q&A series.
A question from two graduate students in our Fall 2017 offering of “Survival Analysis and Causality” at Berkeley:
Question: Hi Mark,
[We] were wondering what the implications are of choosing leave-one-observation-out versus leave-one-cluster-out when performing cross-validation on a longitudinal data structure. We understand that computational constraints may make leave-one-out cross-validation undesirable; however, are we implicitly biasing our model selection by our choice of cross-validation technique?
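The two schemes differ only in how validation sets are formed. The hypothetical helpers below build leave-one-observation-out and leave-one-cluster-out folds from subject (cluster) IDs, and check whether a validation set shares a cluster with its training set, i.e., whether repeated measures from the same subject end up on both sides of the split. This is an illustrative Python sketch, not the fold-construction API of any particular package.

```python
from collections import defaultdict

def leave_one_out_folds(n):
    """Each observation is its own validation set."""
    return [[i] for i in range(n)]

def leave_one_cluster_out_folds(cluster_ids):
    """All rows from the same cluster (e.g., subject) form one validation set."""
    folds = defaultdict(list)
    for row, cid in enumerate(cluster_ids):
        folds[cid].append(row)
    return list(folds.values())

def validation_shares_cluster(fold, cluster_ids):
    """True if a cluster in the validation rows also appears among the training rows."""
    val = set(fold)
    val_clusters = {cluster_ids[i] for i in val}
    train_clusters = {cluster_ids[i] for i in range(len(cluster_ids)) if i not in val}
    return bool(val_clusters & train_clusters)

ids = ["s1", "s1", "s2", "s2", "s2", "s3"]  # hypothetical subject IDs, one row per visit
print(leave_one_cluster_out_folds(ids))     # [[0, 1], [2, 3, 4], [5]]
print(any(validation_shares_cluster(f, ids) for f in leave_one_out_folds(len(ids))))  # True
```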