The Research Group of Mark van der Laan
Recent content on The Research Group of Mark van der Laan
© 2018-2019. All rights reserved.
Sat, 11 May 2019 16:24:00 +0000

Prediction intervals using the TMLE framework
/2019/05/11/predictionintervalsusingthetmleframework/
Sat, 11 May 2019 16:24:00 +0000
This post is part of our Q&A series.
A question from graduate students in our Spring 2019 offering of “Targeted Learning in Biomedical Big Data” at Berkeley:
Question:
Hi Mark,
We are curious about how to use TMLE and influence curves for estimation and inference when the target parameter is a conditional expectation, rather than a scalar.
Specifically, suppose I have a data structure $O = (W, Y) \sim P_0$, and sample $n$ times i.

Applications of TMLE in infectious disease research
/2019/05/11/applicationsoftmleininfectiousdiseaseresearch/
Sat, 11 May 2019 14:35:00 +0000
This post is part of our Q&A series.
A question from graduate students in our Spring 2019 offering of “Targeted Learning in Biomedical Big Data” at Berkeley:
Question:
Hi Mark,
Thanks for teaching this class. It’s been an amazing experience. I have a few questions related to my own research.
In infectious disease studies, researchers build models to estimate the protection conferred by vaccination or by a previous history of infection (natural immunity).

Adaptive algorithm selection via the Super Learner
/2019/05/11/adaptivealgorithmselectionviathesuperlearner/
Sat, 11 May 2019 13:54:00 +0000
This post is part of our Q&A series.
A question from graduate students in our Spring 2019 offering of “Targeted Learning in Biomedical Big Data” at Berkeley:
Question:
Hi Mark,
I have a couple of questions about super learning: the strength of the learners, and the possibility of choosing learners adaptively. Is there any advantage, theoretical or practical, to having a large library of weaker learners over a small library of stronger learners?

TMLE versus one-step estimator
/2019/05/10/tmleversusonestepestimator/
Fri, 10 May 2019 19:23:00 +0000
This post is part of our Q&A series.
A question from graduate students in our Spring 2019 offering of “Targeted Learning in Biomedical Big Data” at Berkeley:
Question:
Hi Mark,
Are there any theoretical guarantees about the relative performance of TMLE and the one-step estimator in finite samples?
Thanks.
H. R.d.B.
Answer:
Hi H. R.d.B.,
Finite-sample guarantees are very hard to obtain. One can obtain finite-sample confidence intervals by, for example, not relying on a CLT but on finite-sample inequalities for sample means (e.

Imputation and missing data in the TMLE framework
/2019/05/10/imputationandmissingdatainthetmleframework/
Fri, 10 May 2019 10:01:00 +0000
This post is part of our Q&A series.
A question from graduate students in our Spring 2019 offering of “Targeted Learning in Biomedical Big Data” at Berkeley:
Question:
Hi Mark,
For a longitudinal data set with missing data, we might want to impute the missing values with MICE (multiple imputation by chained equations). Can we use TMLE together with multiple imputation? How can we combine the results from the multiply imputed data sets into a final result and obtain valid inference?

Adaptive designs and optimal subgroups
/2018/12/01/adaptivedesignsandoptimalsubgroups/
Sat, 01 Dec 2018 17:16:00 +0000
This post is part of our Q&A series.
A question from graduate students in our Fall 2018 offering of “Special Topics in Biostatistics – Adaptive Designs” at Berkeley:
Question:
Hi Mark,
We were interested in your opinion on a few topics that have come up in class several times.
If we isolate an optimal subgroup, we can, perhaps, answer interesting questions about, say, drug efficacy (as in, does this drug work for anybody as opposed to on average?

Adaptive sequential designs and optimal treatments
/2018/11/29/adaptivesequentialdesignsandoptimaltreatments/
Thu, 29 Nov 2018 12:43:00 +0000
This post is part of our Q&A series.
A question from graduate students in our Fall 2018 offering of “Special Topics in Biostatistics – Adaptive Designs” at Berkeley:
Question:
Hi Mark,
Our question concerns the benefit of using a sequential adaptive design when estimating the outcome under the optimal dynamic treatment rule (for a binary treatment). We propose doing so in a 2-stage framework, where in the first stage subjects are naively randomized to treatment, $Pr(A=1) = 0.

Causal effects for single-group policies
/2018/11/28/causaleffectsforsinglegrouppolicies/
Wed, 28 Nov 2018 14:14:00 +0000
This post is part of our Q&A series.
A question from a graduate student in our Spring 2018 offering of “Targeted Learning in Biomedical Big Data” at Berkeley:
Question:
Hi Mark,
I was thinking that if you addressed the question that [we] discussed in your office hours last week, a lot of economists would be interested in reading it.
Feel free to edit the wording of the question however suits you best, but I was thinking: How can you formulate a causal parameter in a setting in which you have a policy that affects one group but not another based on observable characteristics and control for time trends in your model (i.

Finite Sample Properties of TML Estimators
/2018/01/11/finitesamplepropertiesoftmlestimators/
Thu, 11 Jan 2018 17:19:00 +0000
This post is part of our Q&A series.
A question from a graduate student in our Fall 2017 offering of “Survival Analysis and Causality” at Berkeley:
Question:
Hi Mark,
This may be an ill-defined question, but I was wondering: in the usual $O = (W, A, Y)$ setup, TMLE has superior asymptotic properties over competing estimators such as the G-computation plug-in estimator or the IPTW estimator. Are there specific instances where it is also guaranteed to have superior finite-sample properties?

Competing Risks and Non-pathwise Differentiable Parameters
/2017/11/29/competingrisksandnonpathwisedifferentiableparameters/
Wed, 29 Nov 2017 11:30:00 +0000
This post is part of our Q&A series.
A question from two graduate students in our Fall 2017 offering of “Survival Analysis and Causality” at Berkeley:
Question:
Hi Mark,
Below are [two] questions [we thought might interest you]. Looking forward to your thoughts on these!
Best, S.D. and I.M.
Most competing risk analyses assume that the competing risks are independent of one another. What would be your advice on handling the same style of survival data when the occurrence of one of the competing events is informative of the occurrence of the other?

Leave-p-out Cross-validation
/2017/11/22/leavepoutcrossvalidation/
Wed, 22 Nov 2017 15:40:00 +0000
This post is part of our Q&A series.
A question from two graduate students in our Fall 2017 offering of “Survival Analysis and Causality” at Berkeley:
Question:
Hi Mark,
[We] were wondering what the implications are of selecting leave-one-observation-out versus leave-one-cluster-out when performing cross-validation on a longitudinal data structure. We understand that computational constraints may make leave-one-out cross-validation undesirable. However, are we implicitly biasing our model selection by our choice of cross-validation technique?
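To make the distinction in the question concrete, here is a minimal sketch of cluster-level fold assignment, in which all repeated measurements on one subject are held out together. The function name `cluster_folds` and the toy subject IDs are hypothetical and are not taken from the original post; with as many folds as clusters, the scheme reduces to leave-one-cluster-out.

```python
import random


def cluster_folds(cluster_ids, n_folds, seed=0):
    """Assign a fold to every observation so that no cluster is split across folds."""
    clusters = sorted(set(cluster_ids))
    random.Random(seed).shuffle(clusters)
    # Deal the shuffled clusters into folds round-robin.
    fold_of_cluster = {c: k % n_folds for k, c in enumerate(clusters)}
    return [fold_of_cluster[c] for c in cluster_ids]


# Hypothetical longitudinal data: one subject ID per repeated measurement.
subject = ["a", "a", "a", "b", "b", "c", "c", "c", "d", "e", "e"]

# Five subjects and five folds: each validation set is exactly one subject.
folds = cluster_folds(subject, n_folds=5)
```

Leave-one-observation-out would instead give each row its own fold, so measurements from the same subject would appear in both the training and the validation sample.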

Lab Members and Alumni
/students/
Mon, 23 Oct 2017 00:00:00 +0000
Current graduate students:
Aurelien Florent Bibaut
Wilson (Weixin) Cai
Mary Combs
Nima Hejazi
Cheng Ju
Jonathan M. Levy
Ivana Malenica
Chi Zhang
Postdocs:
2016-2018: Oleg Sofrygin, Postdoctoral Fellow
2015-2018: Caleb Miles, Postdoctoral Fellow
2014-2018: Kara Rudolph, Postdoctoral Fellow
2016-2017: David Benkeser, Postdoctoral Fellow
2014-2017: Wenjing Zheng, Postdoctoral Fellow
2011-2014: Marco Carone, Postdoctoral Fellow
2007-2010: Hui Wang, Postdoctoral Fellow
2007: G. Reevens, Free University of Amsterdam (visiting Ph.D. student)
2004-2007: Michael Rosenblum, Postdoctoral Fellow

Welcome to Mark's Blog
/2017/10/22/welcometomarksblog/
Sun, 22 Oct 2017 00:00:00 +0000
Welcome! This is the research blog of Mark van der Laan.
Over the last few years, communication in science has evolved; indeed, many exciting and inspiring research-related ideas are now first communicated informally, in blog posts and the like, before formal publication in academic journals. Blog posts provide an excellent medium through which interesting ideas can be communicated quickly and concisely. We plan to use this blog to share ideas, tips, and examples from our research, and to establish an open dialogue with researchers around the world.

Censored Data and Causal Inference
/causality/
Fri, 23 Dec 2016 00:00:00 +0000
Censored Data and Causality Projects:
Efficient, Double Robust Estimation in a Weight Loss Study.
(with Daniel Rubin and Nick Jewell)
Study on the Consequences of the Protease Inhibitor Era (SCOPE).
(with Steve Deeks, Jeff Martin, Art Reingold, and Maya Petersen)
Data-Adaptive Causal Inference for Time-Independent Treatment Based on Longitudinal Data.
(with Ira Tager and Romain Neugebauer)
A causal inference approach for constructing transcriptional regulatory networks.
(with Biao Xing)

Computational Biology
/compbio/
Fri, 23 Dec 2016 00:00:00 +0000
Statistical methods for detecting structured regulatory motifs in DNA sequences.
(with Sandrine Dudoit, Sunduz Keles, Michael Eisen [Lawrence Berkeley National Lab], Biao Xing)
Statistical methods for constructing transcriptional regulatory networks.
(with Biao Xing)
Statistical inference with marginal gene-expression data
(with Jennifer Bryan, Katie Pollard, Chiron scientists, Annette Molinaro)
Statistical analysis of (breast) cancer databases with gene expression data
(with Annette Molinaro, Katie Pollard, Alan Hubbard, Jennifer Bryan)
Analysis of databases containing gene expression data on an organism, the complete genome of the organism, and biological knowledge on each of the genes

Contact
/contact/
Fri, 23 Dec 2016 00:00:00 +0000
Mark J. van der Laan
University of California at Berkeley
Division of Biostatistics
School of Public Health
Earl Warren Hall #7360
Berkeley, California 94720-7360
email: laan@stat.berkeley.edu
tel: 510.643.9866
fax: 510.643.5163
You may also contact the Division of Biostatistics.

Multiple Hypothesis Testing
/multtest/
Fri, 23 Dec 2016 00:00:00 +0000
Multiple testing methods are hypothesis testing procedures designed to simultaneously test a family of null hypotheses while controlling an error rate. We have described a general statistical framework for multiple hypothesis testing, in which we define error rate control in terms of the true underlying data generating distribution. We have shown that the correct null distribution for the test statistics is obtained by projecting their true distribution onto the space of mean zero distributions.
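As a rough illustration of a mean-zero null distribution, the sketch below bootstraps a vector of test statistics, centers each coordinate at zero, and uses a quantile of the maximum absolute centered statistic as a common cutoff. It is a simplified sketch under stated assumptions (unstudentized statistics $\sqrt{n}$ times the sample mean, centering only, no variance scaling) and is not claimed to be the group's exact procedure; the function name `centered_bootstrap_cutoff` and the simulated data are hypothetical.

```python
import random
import statistics


def centered_bootstrap_cutoff(data, alpha=0.05, n_boot=500, seed=0):
    """Common cutoff for max_j |T_j| from a bootstrap distribution centered at zero.

    data: list of rows, one row per observation and one column per hypothesis.
    """
    rng = random.Random(seed)
    n, m = len(data), len(data[0])

    # Bootstrap the test statistics T_j = sqrt(n) * (sample mean of column j).
    boot_stats = []
    for _ in range(n_boot):
        sample = [data[rng.randrange(n)] for _ in range(n)]
        boot_stats.append(
            [n ** 0.5 * statistics.fmean(row[j] for row in sample) for j in range(m)]
        )

    # Center every coordinate so the estimated null distribution has mean zero.
    col_means = [statistics.fmean(b[j] for b in boot_stats) for j in range(m)]
    max_abs = sorted(
        max(abs(b[j] - col_means[j]) for j in range(m)) for b in boot_stats
    )

    # Empirical (1 - alpha) quantile of the maximum absolute centered statistic.
    return max_abs[min(n_boot - 1, int((1 - alpha) * n_boot))]


# Simulated example: the first null hypothesis (mean zero) is true, the second is false.
rng = random.Random(1)
data = [[rng.gauss(0.0, 1.0), rng.gauss(2.0, 1.0)] for _ in range(100)]
cutoff = centered_bootstrap_cutoff(data)
observed = [len(data) ** 0.5 * statistics.fmean(row[j] for row in data) for j in range(2)]
rejected = [abs(t) > cutoff for t in observed]
```

Centering matters here: the uncentered bootstrap statistics for the second column cluster around their large observed value, which would inflate the cutoff and cost power.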

News
/news/
Fri, 23 Dec 2016 00:00:00 +0000
Together with Judea Pearl, Jasjeet Sekhon, and Maya Petersen, I am pleased to announce the launch of the Journal of Causal Inference, a new journal that publishes papers on theoretical and applied causal research across the range of academic disciplines that use quantitative tools to study causality. Our first issue is planned for Fall 2011 and our website is now open for submissions.
Journal of Causal Inference
Together with Alan Hubbard, Michael Jordan, and Rasmus Nielsen, I am leading the NIH-funded Biomedical Big Data Training Program at UC Berkeley.

Projects and Grants
/proj/
Fri, 23 Dec 2016 00:00:00 +0000
Below is a list of current projects. Please also see the students’ pages for additional projects. Previous projects can also be found through the following links: censored data and causal inference, computational biology, data-adaptive learning, and multiple hypothesis testing.
Current Projects
Toxic Substances in the Environment: Quantitative Biology: Biostatistics, Bioinformatics, and Computation
NIEHS
P.I. Martyn Smith, PhD
We provide investigators with consultative support in biostatistics, computational biology, and bioinformatics, and we support web-based dissemination of bioinformatics solutions and database access.

Research
/research/
Fri, 23 Dec 2016 00:00:00 +0000
Mark van der Laan’s main research interests may be viewed here.
Statement of purpose
Current statistical practice typically involves application of parametric models, even though everybody agrees that these parametric models are wrong. That is, they agree that one somehow needs to interpret the fitted coefficients in this parametric model when it is known that the parametric model is misspecified. Moreover, they accept these wrong methods, even though these are guaranteed to result in a biased estimate of the target parameter they had in mind when applying these parametric models, and, consequently, biased confidence intervals and p-values.

Research Interests
/vdlresearch/
Fri, 23 Dec 2016 00:00:00 +0000
Mark van der Laan’s main research interests are:
1) Developing optimal statistical methodology and theory for analyzing high dimensional and complex data sets, involving censoring, missingness, and biased sampling, under realistic assumptions resulting in semiparametric models,
2) Causal Inference in longitudinal observational studies and randomized controlled trials with possible informative treatment assignment and informative censoring, and
3) Statistical Methods in Genomics (i.e., Computational Biology, Machine Learning), a field made possible by advances in technology that have enabled accurate, low-cost, genome-wide monitoring of mRNAs, DNAs, proteins and other important biomolecules in cells throughout an organism, over time and space.

Selected Work
/papers/
Fri, 23 Dec 2016 00:00:00 +0000
Papers and publications by Mark and his students can be found on the bepress website. For recent publications, please refer to the bepress website. Important papers can be found in “Readings in Targeted Maximum Likelihood Estimation” on the bepress website.
Recent editorial in Amstat News by Mark van der Laan & Sherri Rose: “Statistics Ready for a Revolution”
Read the recent editorial in STATS.org by Mark van der Laan: “Why We Need a Statistical Revolution”.

Software
/software/
Fri, 23 Dec 2016 00:00:00 +0000
The van der Laan group contributes state-of-the-art software for Targeted Learning over a wide range of platforms, primarily using the R language and environment for statistical computing but also occasionally in programming languages including Python, Julia, SAS, C++, and Java.
The tlverse software ecosystem is a centralized effort to overhaul the state and availability of Targeted Learning software in R. For a more general (but slightly dated) set of open-source software packages, consider checking out the UC Berkeley Biostatistics Software Community.

Statement of Purpose
/sop/
Fri, 23 Dec 2016 00:00:00 +0000
Current statistical practice typically involves application of parametric models, even though everybody agrees that these parametric models are wrong. That is, they agree that one somehow needs to interpret the fitted coefficients in this parametric model when it is known that the parametric model is misspecified. Moreover, they accept these wrong methods, even though these are guaranteed to result in a biased estimate of the target parameter they had in mind when applying these parametric models, and, consequently, biased confidence intervals and p-values.

Teaching
/teaching/
Fri, 23 Dec 2016 00:00:00 +0000
Recent Courses
PBHLTH C240A / STAT C245A: Introduction to Modern Biostatistical Theory and Practice
Most recently taught in Spring 2017
PBHLTH C240B / STAT C245B: Biostatistical Methods: Survival Analysis and Causality
Most recently taught in Fall 2017 [Syllabus (PDF)]
PBHLTH 243A: Special Topics in Biostatistics: Multivariate Statistical Methods in Genomics
PBHLTH 243D: Special Topics in Biostatistics: Adaptive Designs
PBHLTH C246A / STAT C249A: Censored Longitudinal Data and Causality

Unified Data Adaptive Learning
/crossval/
Fri, 23 Dec 2016 00:00:00 +0000
We have developed a unified loss-based methodology for data-adaptive estimation/learning of any parameter, including regression and density estimation as special cases, based on a sample of i.i.d. observations. The parameter of interest is defined as the minimizer of an expectation of a loss function of the experimental unit, a candidate parameter, and possibly a nuisance parameter. By allowing the loss function to depend on a nuisance parameter, such a loss function can be constructed for any parameter (finite or infinite dimensional).
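In symbols, the definition above can be sketched as follows. The notation is assumed here for illustration and is not taken from the original page: $O$ is the experimental unit, $\psi$ a candidate parameter value in a parameter space $\Psi$, $\eta_0$ the nuisance parameter, and $P_0$ the data-generating distribution.

$$\psi_0 = \arg\min_{\psi \in \Psi} E_{P_0}\, L(O, \psi \mid \eta_0)$$

Two familiar special cases need no nuisance parameter. With $O = (W, Y)$ and the squared-error loss $L(O, \psi) = (Y - \psi(W))^2$, the minimizer is the regression function $\psi_0(W) = E_{P_0}(Y \mid W)$. With the negative log loss $L(O, \psi) = -\log \psi(O)$ over candidate densities $\psi$, the minimizer is the density of $P_0$.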

About
/about/
Wed, 09 Mar 2016 00:00:00 +0000
Mark van der Laan, Ph.D., is a Professor of Biostatistics and Statistics at UC Berkeley. His research interests include statistical methods in genomics (i.e., computational biology), survival analysis, censored data, targeted maximum likelihood estimation in semiparametric models, causal inference, data-adaptive loss-based super learning, and multiple testing. Further details on Mark’s research interests are available here.
His research group developed loss-based super learning in semiparametric models, based on cross-validation, as a generic optimal tool for estimation of infinite dimensional parameters, such as nonparametric density estimation and prediction based on censored and uncensored data.
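The role of cross-validation in loss-based learning can be illustrated with a minimal sketch of a cross-validation selector (sometimes called a discrete super learner): each candidate is fit on training folds, its squared-error loss is averaged over held-out folds, and the candidate with the smallest cross-validated risk is chosen. The two toy candidates, the function names, and the simulated data are assumptions made for illustration; this is not the group's software, and a full super learner would also consider weighted combinations of candidates.

```python
import random
import statistics


def fit_mean(xs, ys):
    """Candidate 1: ignore x and predict the training mean of y."""
    mu = statistics.fmean(ys)
    return lambda x: mu


def fit_line(xs, ys):
    """Candidate 2: least-squares line in x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)


def cv_select(xs, ys, candidates, n_folds=5, seed=0):
    """Return (name of best candidate, cross-validated risks) under squared-error loss."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    risks = {}
    for name, fit in candidates.items():
        losses = []
        for fold in folds:
            held_out = set(fold)
            train = [i for i in idx if i not in held_out]
            # Fit on the training sample only, then score on the held-out fold.
            predict = fit([xs[i] for i in train], [ys[i] for i in train])
            losses.extend((ys[i] - predict(xs[i])) ** 2 for i in fold)
        risks[name] = statistics.fmean(losses)
    return min(risks, key=risks.get), risks


# Simulated data with a linear signal, so the line should beat the mean.
rng = random.Random(42)
xs = [i / 10 for i in range(60)]
ys = [2.0 * x + rng.gauss(0.0, 0.5) for x in xs]
best, risks = cv_select(xs, ys, {"mean": fit_mean, "line": fit_line})
```

Swapping in a different loss function changes the parameter being targeted without changing the selection logic.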