Rapid and Scalable Bayesian AB Testing
- URL: http://arxiv.org/abs/2307.14628v1
- Date: Thu, 27 Jul 2023 05:08:49 GMT
- Title: Rapid and Scalable Bayesian AB Testing
- Authors: Srivas Chennu, Andrew Maher, Christian Pangerl, Subash Prabanantham,
Jae Hyeon Bae, Jamie Martin and Bud Goswami
- Abstract summary: We propose a solution that applies hierarchical Bayesian estimation to address limitations of current AB testing methodology.
We increase statistical power by exploiting correlations between factors, enabling sequential testing and progressive early stopping.
We also demonstrate how this methodology can be extended to enable the extraction of composite global learnings from past AB tests.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: AB testing aids business operators with their decision making, and is
considered the gold standard method for learning from data to improve digital
user experiences. However, there is usually a gap between the requirements of
practitioners, and the constraints imposed by the statistical hypothesis
testing methodologies commonly used for analysis of AB tests. These include the
lack of statistical power in multivariate designs with many factors,
correlations between these factors, the need for sequential testing for early
stopping, and the inability to pool knowledge from past tests. Here, we propose
a solution that applies hierarchical Bayesian estimation to address the above
limitations. In comparison to current sequential AB testing methodology, we
increase statistical power by exploiting correlations between factors, enabling
sequential testing and progressive early stopping, without incurring excessive
false positive risk. We also demonstrate how this methodology can be extended
to enable the extraction of composite global learnings from past AB tests, to
accelerate future tests. We underpin our work with a solid theoretical
framework that articulates the value of hierarchical estimation. We demonstrate
its utility using both numerical simulations and a large set of real-world AB
tests. Together, these results highlight the practical value of our approach
for statistical inference in the technology industry.
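The abstract does not spell out the hierarchical model, so the following is only a minimal sketch of the general idea of partial pooling, not the authors' method. It assumes a beta-binomial setting with a conversion count per variant, and fits the shared Beta prior by a crude method-of-moments step (an empirical-Bayes shortcut rather than full hierarchical inference). The names `Variant`, `fit_beta_prior` and `posterior_means`, and the example counts, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    name: str
    conversions: int
    visitors: int


def fit_beta_prior(variants):
    """Estimate a shared Beta(a, b) prior from the observed variant rates.

    Crude method of moments: the spread of raw rates also contains sampling
    noise, so this tends to under-shrink. Assumes at least two variants.
    """
    rates = [v.conversions / v.visitors for v in variants]
    mean = sum(rates) / len(rates)
    var = sum((r - mean) ** 2 for r in rates) / (len(rates) - 1)
    max_var = mean * (1 - mean)  # a Beta variance must stay below this bound
    var = min(max(var, 1e-9), 0.99 * max_var)
    strength = max_var / var - 1  # equals a + b, the prior pseudo-count
    return mean * strength, (1 - mean) * strength


def posterior_means(variants, a, b):
    """Posterior mean rate per variant under the shared Beta(a, b) prior."""
    return {
        v.name: (a + v.conversions) / (a + b + v.visitors) for v in variants
    }


# Hypothetical data: variant C has very few visitors.
variants = [
    Variant("A", 12, 100),
    Variant("B", 30, 200),
    Variant("C", 3, 10),
]
a, b = fit_beta_prior(variants)
shrunk = posterior_means(variants, a, b)
for v in variants:
    print(v.name, round(v.conversions / v.visitors, 3), round(shrunk[v.name], 3))
```

Each posterior mean is a weighted average of the variant's raw rate and the shared prior mean, with the weight on the prior equal to (a + b) / (a + b + visitors). Variants with little traffic, such as C here, are therefore pulled hardest toward the pooled estimate, which is the mechanism by which pooling can reduce noise in designs with many arms.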
Related papers
- STAR : Bridging Statistical and Agentic Reasoning for Large Model Performance Prediction [78.0692157478247]
We propose STAR, a framework that bridges data-driven STatistical expectations with knowledge-driven Agentic Reasoning.
We show that STAR consistently outperforms all baselines on both score-based and rank-based metrics.
arXiv Detail & Related papers (2026-02-12T16:30:07Z) - A Sample Efficient Conditional Independence Test in the Presence of Discretization [54.047334792855345]
Applying Conditional Independence (CI) tests directly to discretized data can lead to incorrect conclusions.
Recent advancements have sought to infer the correct CI relationship between the latent variables through binarizing observed data.
Motivated by this, this paper introduces a sample-efficient CI test that does not rely on the binarization process.
arXiv Detail & Related papers (2025-06-10T12:41:26Z) - Data Fusion for Partial Identification of Causal Effects [62.56890808004615]
We propose a novel partial identification framework that enables researchers to answer key questions: Is the causal effect positive or negative? How severe must assumption violations be to overturn this conclusion?
We apply our framework to the Project STAR study, which investigates the effect of classroom size on students' third-grade standardized test performance.
arXiv Detail & Related papers (2025-05-30T07:13:01Z) - Testing Conditional Mean Independence Using Generative Neural Networks [8.323172773256449]
We introduce a novel population CMI measure and a bootstrap model-based testing procedure.
Deep generative neural networks are used to estimate the conditional mean functions involved in the population measure.
arXiv Detail & Related papers (2025-01-28T23:35:24Z) - Statistically Valid Information Bottleneck via Multiple Hypothesis Testing [35.59201763567714]
We introduce a statistically valid solution to the information bottleneck (IB) problem via multiple hypothesis testing (IB-MHT).
IB-MHT ensures that the learned features meet the IB constraints with high probability, regardless of the size of the available dataset.
Results validate the effectiveness of IB-MHT in outperforming conventional methods in terms of statistical robustness and reliability.
arXiv Detail & Related papers (2024-09-11T15:04:32Z) - Active Test-Time Adaptation: Theoretical Analyses and An Algorithm [51.84691955495693]
Test-time adaptation (TTA) addresses distribution shifts for streaming test data in unsupervised settings.
We propose the novel problem setting of active test-time adaptation (ATTA) that integrates active learning within the fully TTA setting.
arXiv Detail & Related papers (2024-04-07T22:31:34Z) - Full Bayesian Significance Testing for Neural Networks [26.54203219329441]
We propose to conduct Full Bayesian Significance Testing for neural networks, called nFBST.
nFBST can test not only global significance but also local and instance-wise significance, which previous testing methods don't focus on.
arXiv Detail & Related papers (2024-01-24T09:59:48Z) - Precise Error Rates for Computationally Efficient Testing [75.63895690909241]
We revisit the question of simple-versus-simple hypothesis testing with an eye towards computational complexity.
An existing test based on linear spectral statistics achieves the best possible tradeoff curve between type I and type II error rates.
arXiv Detail & Related papers (2023-11-01T04:41:16Z) - Deep anytime-valid hypothesis testing [29.273915933729057]
We propose a general framework for constructing powerful, sequential hypothesis tests for nonparametric testing problems.
We develop a principled approach of leveraging the representation capability of machine learning models within the testing-by-betting framework.
Empirical results on synthetic and real-world datasets demonstrate that tests instantiated using our general framework are competitive against specialized baselines.
arXiv Detail & Related papers (2023-10-30T09:46:19Z) - Conditional independence testing under misspecified inductive biases [27.34558936393097]
We study the performance of regression-based CI tests under misspecified inductive biases.
Namely, we propose new approximations or upper bounds for the testing errors of three regression-based tests.
We introduce the Rao-Blackwellized Predictor Test (RBPT), a regression-based CI test robust against misspecified inductive biases.
arXiv Detail & Related papers (2023-07-05T17:53:13Z) - From Static Benchmarks to Adaptive Testing: Psychometrics in AI Evaluation [60.14902811624433]
We discuss a paradigm shift from static evaluation methods to adaptive testing.
This involves estimating the characteristics and value of each test item in the benchmark and dynamically adjusting items in real-time.
We analyze the current approaches, advantages, and underlying reasons for adopting psychometrics in AI evaluation.
arXiv Detail & Related papers (2023-06-18T09:54:33Z) - A Semi-Bayesian Nonparametric Estimator of the Maximum Mean Discrepancy
Measure: Applications in Goodness-of-Fit Testing and Generative Adversarial
Networks [3.623570119514559]
We propose a semi-Bayesian nonparametric (semi-BNP) procedure for the goodness-of-fit (GOF) test.
Our method introduces a novel Bayesian estimator for the maximum mean discrepancy (MMD) measure.
We demonstrate that our proposed test outperforms frequentist MMD-based methods by achieving a lower false rejection and acceptance rate of the null hypothesis.
arXiv Detail & Related papers (2023-03-05T10:36:21Z) - Deep Learning in current Neuroimaging: a multivariate approach with
power and type I error control but arguable generalization ability [0.158310730488265]
A non-parametric framework is proposed that estimates the statistical significance of classifications using deep learning architectures.
A label permutation test is proposed in both studies using cross-validation (CV) and resubstitution with upper bound correction (RUB) as validation methods.
We found in the permutation test that CV and RUB methods offer a false positive rate close to the significance level and an acceptable statistical power.
arXiv Detail & Related papers (2021-03-30T21:15:39Z) - Cross-validation Confidence Intervals for Test Error [83.67415139421448]
This work develops central limit theorems for cross-validation and consistent estimators of its variance under weak stability conditions on the learning algorithm.
Results are the first of their kind for the popular choice of leave-one-out cross-validation.
arXiv Detail & Related papers (2020-07-24T17:40:06Z) - Marginal likelihood computation for model selection and hypothesis
testing: an extensive review [66.37504201165159]
This article provides a comprehensive study of the state-of-the-art of the topic.
We highlight limitations, benefits, connections and differences among the different techniques.
Problems and possible solutions with the use of improper priors are also described.
arXiv Detail & Related papers (2020-05-17T18:31:58Z) - Dynamic Causal Effects Evaluation in A/B Testing with a Reinforcement
Learning Framework [68.96770035057716]
A/B testing is a business strategy to compare a new product with an old one in pharmaceutical, technological, and traditional industries.
This paper introduces a reinforcement learning framework for carrying out A/B testing in online experiments.
arXiv Detail & Related papers (2020-02-05T10:25:02Z)