Multiple Hypothesis Testing

Projects on Multiple Hypothesis Testing

Multiple testing methods are hypothesis testing procedures designed to simultaneously test a family of null hypotheses while controlling an error rate. We have described a general statistical framework for multiple hypothesis testing, in which we define error rate control in terms of the true underlying data generating distribution. We have shown that the correct null distribution for the test statistics is obtained by projecting their true distribution onto the space of mean-zero distributions. For common choices of test statistics (based on an asymptotically linear parameter estimator), this distribution is asymptotically multivariate normal with mean zero and covariance equal to that of the vector influence curve for the parameter estimator.
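In symbols, the last statement can be sketched for the simplest case of unstandardized difference statistics. The notation $\psi_n$ (estimator), $\psi$ (true parameter vector), $IC$ (vector influence curve), $O_i$ (observations), $\Sigma$, and $Q_0$ (null distribution) is introduced here for illustration only and may differ from the notation in the papers listed below:

```latex
\begin{align*}
\psi_n - \psi &= \frac{1}{n}\sum_{i=1}^{n} IC(O_i) + o_P(n^{-1/2}), \qquad E[IC(O)] = 0,\\
\sqrt{n}\,(\psi_n - \psi) &\xrightarrow{d} N(0, \Sigma), \qquad \Sigma = E\left[IC(O)\,IC(O)^\top\right],\\
Q_0 &= N(0, \Sigma).
\end{align*}
```

The first line is the definition of asymptotic linearity, the second follows from the multivariate central limit theorem, and the third is the resulting mean-zero null distribution for the statistics $\sqrt{n}\,(\psi_n - \psi_0)$, where $\psi_0$ denotes the hypothesized value.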

This test statistic null distribution can be estimated by applying the non-parametric or parametric bootstrap to correctly centered test statistics. We have proven that this bootstrap-estimated null distribution provides asymptotic strong control of most type I error rates. We have shown that obtaining a test statistic null distribution from a data null distribution provides the correct test statistic null distribution only if the covariance of the vector influence curve is the same under the data null distribution as under the true data distribution. This condition is the formal analogue of the subset pivotality condition (Westfall and Young, 1993). We have also shown that our multiple testing methodology controlling type I error is equivalent to constructing an error-specific confidence region for the true parameter values and checking whether it contains the hypothesized value.
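The bootstrap idea above can be illustrated with a minimal Python sketch. It is not the implementation from the papers listed below; it assumes the simplest setting (tests of coordinate means, unstandardized difference statistics, the non-parametric bootstrap, and a single-step common cutoff for two-sided control of the family-wise error rate). The function name `bootstrap_maxT` and all of its parameters are hypothetical names chosen for this example.

```python
import numpy as np


def bootstrap_maxT(X, psi0=0.0, alpha=0.05, B=1000, seed=0):
    """Single-step max-|T| test of H0(j): mean of column j equals psi0.

    X is an (n, m) array: n observations of m variables.
    Returns (reject, cutoff, (lower, upper)).
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    psi_n = X.mean(axis=0)

    # Observed difference statistics, centered at the hypothesized value.
    T_obs = np.sqrt(n) * (psi_n - psi0)

    # Bootstrap null distribution: resample rows with replacement and
    # center each bootstrap estimate at the OBSERVED estimate psi_n
    # (not at psi0), so the resampled statistics have mean zero while
    # keeping the covariance structure of the data.
    T_null = np.empty((B, m))
    for b in range(B):
        idx = rng.integers(0, n, size=n)
        T_null[b] = np.sqrt(n) * (X[idx].mean(axis=0) - psi_n)

    # Common cutoff: (1 - alpha) quantile of the maximum absolute statistic.
    cutoff = np.quantile(np.abs(T_null).max(axis=1), 1 - alpha)
    reject = np.abs(T_obs) > cutoff

    # Equivalent simultaneous confidence region: H0(j) is rejected exactly
    # when psi0 falls outside [lower[j], upper[j]].
    half_width = cutoff / np.sqrt(n)
    return reject, cutoff, (psi_n - half_width, psi_n + half_width)


# Simulated data (an assumption for illustration): 50 samples, 20 variables,
# of which the first 3 have a nonzero mean.
demo_rng = np.random.default_rng(1)
X_demo = demo_rng.normal(size=(50, 20))
X_demo[:, :3] += 3.0
reject_demo, cutoff_demo, region_demo = bootstrap_maxT(X_demo, B=500)
print("rejected hypotheses:", np.flatnonzero(reject_demo))
```

Centering the resampled statistics at the observed estimate is the step that distinguishes this from resampling under a data null distribution such as a permutation distribution. The returned interval bounds make the confidence-region equivalence concrete: a hypothesis is rejected exactly when the hypothesized value lies outside its interval.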

In recent years, there has been increased interest in the field of multiple testing due to new technologies, such as gene expression arrays, that produce data for which (i) the dimension is much larger than the sample size, (ii) the variables (e.g., genes) are often correlated, and (iii) some proportion of the null hypotheses is expected to be true. Gene expression studies have motivated us to better understand error control in multiple hypothesis testing, though our results apply to multiple testing in general.

Projects:

Optimal Multiple Testing
(with Daniel Rubin and Sandrine Dudoit)

Genotype/Phenotype Associations.
(with Merrill Birkner, Mélanie Courtine, Sandrine Dudoit, Karine Clément, and Jean-Daniel Zucker)

Publications:

M.J. van der Laan, S. Dudoit, K.S. Pollard (2004), Multiple Testing. Part III. Procedures for Control of the Generalized Family-Wise Error Rate and Proportion of False Positives, U.C. Berkeley Division of Biostatistics Working Paper Series. Working Paper 141 (submitted). (PDF (BEPRESS))

M.J. van der Laan, S. Dudoit, K.S. Pollard (2003), Multiple Testing. Part II. Step-Down Procedures for Control of the Family-Wise Error Rate, U.C. Berkeley Division of Biostatistics Working Paper Series. Working Paper 139 (submitted). (PDF (BEPRESS))

S. Dudoit, M.J. van der Laan, K.S. Pollard (2003), Multiple Testing. Part I. Single-Step Procedures for Control of General Type I Error Rates, U.C. Berkeley Division of Biostatistics Working Paper Series. Working Paper 138 (submitted). (PDF (BEPRESS))

K.S. Pollard, M.J. van der Laan (2003), Resampling-based Multiple Testing: Asymptotic Control of Type I Error and Applications to Gene Expression Data, revised for a special issue of Journal of Statistical Planning and Inference.

K.S. Pollard, M.J. van der Laan (2003), Multiple testing for gene expression data: an investigation of null distributions with consequences for the permutation test, Proceedings of the 2003 International Multi-Conference in Computer Science and Engineering, METMBS '03 Conference, pp. 3-9.

Talks given by Katie Pollard:

Resampling-based methods for identification of significant subsets of genes in expression data, a contributed talk at MCP 2002: The 3rd International Conference on Multiple Comparisons (August 5-7, 2002 in Bethesda, MD).

Multiple testing for high dimensional biological data, UC Berkeley seminar (March 20, 2003).

Multiple testing for gene expression data: an investigation of null distributions with consequences for the permutation test, a contributed talk at 2003 International Conference on Mathematics and Engineering Techniques in Medicine and Biological Sciences (METMBS '03) (June 23-26, 2003 in Las Vegas, NV).

Talks given by Mark van der Laan:

Resampling-based multiple testing with asymptotic control of type I error, Stanford University Workshop in Biostatistics (April 24, 2003) and Free University Amsterdam, Bioinformatics Colloquium.