Estimating the number and effect sizes of non-null hypotheses
- URL: http://arxiv.org/abs/2002.07297v2
- Date: Fri, 24 Jul 2020 18:26:17 GMT
- Title: Estimating the number and effect sizes of non-null hypotheses
- Authors: Jennifer Brennan, Ramya Korlakai Vinayak and Kevin Jamieson
- Abstract summary: Knowing the distribution of effect sizes allows us to calculate the power (one minus the type II error rate) of any experimental design.
Our estimator can be used to guarantee the number of discoveries that will be made using a given experimental design in a future experiment.
- Score: 14.34147140416535
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problem of estimating the distribution of effect sizes (the mean
of the test statistic under the alternative hypothesis) in a multiple testing
setting. Knowing this distribution allows us to calculate the power (one minus
the type II error rate) of any experimental design. We show that it is possible to estimate this
distribution using an inexpensive pilot experiment, which takes significantly
fewer samples than would be required by an experiment that identified the
discoveries. Our estimator can be used to guarantee the number of discoveries
that will be made using a given experimental design in a future experiment. We
prove that this simple and computationally efficient estimator enjoys a number
of favorable theoretical properties, and demonstrate its effectiveness on data
from a gene knockout experiment on influenza inhibition in Drosophila.
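A minimal sketch of why this matters, not the authors' estimator: once the effect sizes (means of unit-variance z-statistics under the alternative) are known, per-hypothesis power and the expected number of true discoveries follow from a standard normal power calculation. The function name, the effect sizes `mu_hat`, and the assumption that averaging n replicates scales each effect size by sqrt(n) are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

def expected_discoveries(mu, n_replicates, alpha=0.05):
    """Expected true discoveries from one-sided z-tests at level alpha.

    Assumes each test statistic is a unit-variance z-score whose mean
    scales with sqrt(n_replicates), so per-hypothesis power is
    Phi(sqrt(n) * mu - z_{1-alpha}).
    """
    z_crit = norm.ppf(1 - alpha)
    power = norm.cdf(np.sqrt(n_replicates) * np.asarray(mu) - z_crit)
    return power.sum()

# Hypothetical effect sizes recovered from a pilot experiment.
mu_hat = np.array([0.3, 0.5, 0.8, 1.2])
print(expected_discoveries(mu_hat, n_replicates=16))
```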
Related papers
- Variance reduction combining pre-experiment and in-experiment data [0.0]
Online controlled experiments (A/B testing) are essential in data-driven decision-making for many companies.
Existing methods like CUPED and CUPAC use pre-experiment data to reduce variance, but their effectiveness depends on the correlation between the pre-experiment data and the outcome.
We introduce a novel method that combines both pre-experiment and in-experiment data to achieve greater variance reduction than CUPED and CUPAC.
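A hedged sketch of the classic CUPED adjustment this entry builds on (not the proposed combined method): regress the outcome on a pre-experiment covariate and subtract the fitted component. The variance of the adjusted outcome falls by the squared correlation between covariate and outcome, which is exactly the dependence the entry notes. Variable names and data are illustrative.

```python
import numpy as np

def cuped_adjust(y, x):
    # OLS slope theta = cov(X, Y) / var(X); adjusted outcome keeps the
    # same mean but has variance var(Y) * (1 - corr(X, Y)^2).
    theta = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
    return y - theta * (x - x.mean())

rng = np.random.default_rng(0)
x = rng.normal(size=10_000)                    # pre-experiment metric
y = 0.8 * x + rng.normal(size=10_000)          # correlated in-experiment outcome
print(np.var(y), np.var(cuped_adjust(y, x)))   # variance drops by ~corr^2
```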
arXiv Detail & Related papers (2024-10-11T17:45:29Z)
- A Double Machine Learning Approach to Combining Experimental and Observational Data [59.29868677652324]
We propose a double machine learning approach to combine experimental and observational studies.
Our framework tests for violations of external validity and ignorability under milder assumptions.
arXiv Detail & Related papers (2023-07-04T02:53:11Z)
- Detecting Adversarial Data by Probing Multiple Perturbations Using Expected Perturbation Score [62.54911162109439]
Adversarial detection aims to determine whether a given sample is an adversarial one based on the discrepancy between natural and adversarial distributions.
We propose a new statistic called expected perturbation score (EPS), which is essentially the expected score of a sample after various perturbations.
We develop EPS-based maximum mean discrepancy (MMD) as a metric to measure the discrepancy between the test sample and natural samples.
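A minimal sketch of a (biased) Gaussian-kernel MMD^2 estimate between two one-dimensional score samples, standing in for the EPS-based detector described above; the actual EPS statistic and kernel choices follow the paper, not this code. The bandwidth and the synthetic score distributions are assumptions.

```python
import numpy as np

def mmd2(a, b, bandwidth=1.0):
    # Biased estimator: mean of k(a,a) + mean of k(b,b) - 2 * mean of k(a,b).
    def k(u, v):
        return np.exp(-(u[:, None] - v[None, :]) ** 2 / (2 * bandwidth ** 2))
    return k(a, a).mean() + k(b, b).mean() - 2 * k(a, b).mean()

rng = np.random.default_rng(1)
natural = rng.normal(0.0, 1.0, size=500)   # scores of natural samples
suspect = rng.normal(0.6, 1.0, size=500)   # shifted scores, e.g. adversarial
print(mmd2(natural, suspect))              # larger distribution gap -> larger MMD
```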
arXiv Detail & Related papers (2023-05-25T13:14:58Z) - Robust Bayesian Subspace Identification for Small Data Sets [91.3755431537592]
We propose regularized estimators, shrinkage estimators and Bayesian estimation to reduce the effect of variance.
Our experimental results show that the proposed estimators may reduce the estimation risk to as little as 40% of that of traditional subspace methods.
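A generic illustration of the shrinkage idea mentioned above, not the paper's subspace-identification estimator: pulling a noisy unbiased estimate toward zero trades a little bias for a larger variance reduction, lowering mean squared error. The shrinkage weight and simulated data are assumptions.

```python
import numpy as np

def shrink(estimate, lam):
    # Ridge-style shrinkage toward zero; lam controls the bias/variance trade.
    return estimate / (1.0 + lam)

rng = np.random.default_rng(2)
theta = 1.0
raw = theta + rng.normal(0, 1.0, size=10_000)      # noisy unbiased estimates
risk_raw = np.mean((raw - theta) ** 2)             # ~ variance = 1
risk_shrunk = np.mean((shrink(raw, 0.5) - theta) ** 2)
print(risk_raw, risk_shrunk)                       # shrinkage lowers MSE here
```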
arXiv Detail & Related papers (2022-12-29T00:29:04Z)
- Probabilities of Causation: Adequate Size of Experimental and Observational Samples [17.565045120151865]
Tian and Pearl derived sharp bounds for the probability of necessity and sufficiency (PNS), the probability of sufficiency (PS), and the probability of necessity (PN) using experimental and observational data.
The assumption is that one is in possession of a large enough sample to permit an accurate estimation of the experimental and observational distributions.
We present a method for determining the sample size needed for such estimation, when a given confidence interval (CI) is specified.
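To illustrate the kind of calculation involved (the paper's method for PNS/PS/PN bounds is more refined), here is the standard normal-approximation sample-size bound for estimating a probability to within half-width eps at confidence level 1 - alpha, using the worst case p = 0.5. Function name and parameters are illustrative.

```python
import math
from scipy.stats import norm

def sample_size(eps, alpha=0.05, p=0.5):
    # n >= z_{1-alpha/2}^2 * p * (1 - p) / eps^2
    z = norm.ppf(1 - alpha / 2)
    return math.ceil(z ** 2 * p * (1 - p) / eps ** 2)

print(sample_size(eps=0.01))  # ~9604 observations for a +/-0.01 95% CI
```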
arXiv Detail & Related papers (2022-10-10T21:59:49Z)
- Classical models may be a better explanation of the Jiuzhang 1.0 Gaussian Boson Sampler than its targeted squeezed light model [0.0]
We propose an alternative classical hypothesis for the validation of the Jiuzhang 1.0 and Jiuzhang 2.0 experiments.
Our results provide a new hypothesis that should be considered in the validation of future GBS experiments.
arXiv Detail & Related papers (2022-07-20T17:39:44Z)
- Variance Minimization in the Wasserstein Space for Invariant Causal Prediction [72.13445677280792]
In this work, we show that the approach taken in ICP may be reformulated as a series of nonparametric tests that scales linearly in the number of predictors.
Each of these tests relies on the minimization of a novel loss function that is derived from tools in optimal transport theory.
We prove under mild assumptions that our method is able to recover the set of identifiable direct causes, and we demonstrate in our experiments that it is competitive with other benchmark causal discovery algorithms.
arXiv Detail & Related papers (2021-10-13T22:30:47Z)
- What can the millions of random treatments in nonexperimental data reveal about causes? [0.0]
The article introduces one such model and a Bayesian approach to combine the $O(n^2)$ pairwise observations typically available in nonexperimental data.
We demonstrate that the proposed approach recovers causal effects in common NSW samples, as well as in arbitrary subpopulations and an order-of-magnitude larger supersample.
arXiv Detail & Related papers (2021-05-03T20:13:34Z)
- With Little Power Comes Great Responsibility [54.96675741328462]
Underpowered experiments make it more difficult to discern the difference between statistical noise and meaningful model improvements.
Small test sets mean that most attempted comparisons to state-of-the-art models will not be adequately powered.
For machine translation, we find that typical test sets of 2000 sentences have approximately 75% power to detect differences of 1 BLEU point.
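A hedged sketch of simulation-based power estimation in the spirit of this entry: draw synthetic per-sentence metric differences with a given mean gap and count how often a paired test rejects at alpha = 0.05. The mean gap, spread, and test-set size below are illustrative assumptions, not the paper's numbers.

```python
import numpy as np
from scipy.stats import ttest_1samp

def power_estimate(mean_gap, sd, n_sentences, n_sims=2000, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sims):
        # Synthetic per-sentence score differences between two systems.
        diffs = rng.normal(mean_gap, sd, size=n_sentences)
        if ttest_1samp(diffs, 0.0).pvalue < alpha:
            rejections += 1
    return rejections / n_sims

print(power_estimate(mean_gap=1.0, sd=20.0, n_sentences=2000))
```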
arXiv Detail & Related papers (2020-10-13T18:00:02Z)
- Tracking disease outbreaks from sparse data with Bayesian inference [55.82986443159948]
The COVID-19 pandemic provides new motivation for estimating the empirical rate of transmission during an outbreak.
Standard methods struggle to accommodate the partial observability and sparse data common at finer scales.
We propose a Bayesian framework which accommodates partial observability in a principled manner.
arXiv Detail & Related papers (2020-09-12T20:37:33Z)
- Efficient Adaptive Experimental Design for Average Treatment Effect Estimation [18.027128141189355]
We propose an algorithm for efficient experiments with estimators constructed from dependent samples.
To justify our proposed approach, we provide finite and infinite sample analyses.
arXiv Detail & Related papers (2020-02-13T02:04:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.