Validation of massively-parallel adaptive testing using dynamic control matching
- URL: http://arxiv.org/abs/2305.01334v1
- Date: Tue, 2 May 2023 11:28:12 GMT
- Title: Validation of massively-parallel adaptive testing using dynamic control matching
- Authors: Schaun Wheeler
- Abstract summary: Modern businesses often run many A/B/n tests at the same time and in parallel, and package many content variations into the same messages.
This paper presents a method for disentangling the causal effects of the various tests under conditions of continuous test adaptation.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A/B testing is a widely-used paradigm within marketing optimization because
it promises identification of causal effects and because it is implemented out
of the box in most messaging delivery software platforms. Modern businesses,
however, often run many A/B/n tests at the same time and in parallel, and
package many content variations into the same messages, not all of which are
part of an explicit test. Whether as the result of many teams testing at the
same time, or as part of a more sophisticated reinforcement learning (RL)
approach that continuously adapts tests and test condition assignment based on
previous results, dynamic parallel testing cannot be evaluated the same way
traditional A/B tests are evaluated. This paper presents a method for
disentangling the causal effects of the various tests under conditions of
continuous test adaptation, using a matched-synthetic control group that adapts
alongside the tests.
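The core idea can be sketched very roughly in code. At each adaptation step, users who received a message carrying some combination of variants are matched to similar held-out users who received none, and the effect of a variant is read off the matched-pair outcome differences; repeating this per test arm, and re-matching every time the tests adapt, is what lets the control comparison "adapt alongside the tests." The snippet below is a minimal illustrative sketch under simplifying assumptions (Euclidean nearest-neighbour matching with replacement on pre-treatment features, binary conversion outcomes, a single variant); the names `match_controls` and `estimate_effect` are hypothetical and this is not the paper's implementation.

```python
# Minimal, illustrative sketch of dynamic control matching (not the paper's code).
# Assumptions: users are described by pre-treatment feature vectors, a hold-out
# pool of unmessaged control users is available at every adaptation step, and
# outcomes are binary conversions.
import numpy as np

def match_controls(treated_X, control_X):
    """Nearest-neighbour matching with replacement on Euclidean distance.

    Returns, for each treated user, the index of its matched control user.
    """
    dists = np.linalg.norm(treated_X[:, None, :] - control_X[None, :, :], axis=2)
    return dists.argmin(axis=1)

def estimate_effect(treated_y, control_y, match_idx):
    """Average outcome difference over matched treated/control pairs."""
    return float(np.mean(treated_y - control_y[match_idx]))

# Toy usage: one adaptation step with 5 treated and 8 control users,
# 3 pre-treatment features, binary conversion outcomes.
rng = np.random.default_rng(0)
treated_X, control_X = rng.normal(size=(5, 3)), rng.normal(size=(8, 3))
treated_y = rng.integers(0, 2, size=5)
control_y = rng.integers(0, 2, size=8)
print(estimate_effect(treated_y, control_y, match_controls(treated_X, control_X)))
```

In a running system this matching and comparison would be recomputed each time the RL policy reassigns test conditions, so no single static control group is ever assumed.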
Related papers
- Active Test-Time Adaptation: Theoretical Analyses and An Algorithm [51.84691955495693]
Test-time adaptation (TTA) addresses distribution shifts for streaming test data in unsupervised settings.
We propose the novel problem setting of active test-time adaptation (ATTA) that integrates active learning within the fully TTA setting.
arXiv Detail & Related papers (2024-04-07T22:31:34Z) - Deep anytime-valid hypothesis testing [29.273915933729057]
We propose a general framework for constructing powerful, sequential hypothesis tests for nonparametric testing problems.
We develop a principled approach of leveraging the representation capability of machine learning models within the testing-by-betting framework.
Empirical results on synthetic and real-world datasets demonstrate that tests instantiated using our general framework are competitive against specialized baselines.
arXiv Detail & Related papers (2023-10-30T09:46:19Z) - On Pitfalls of Test-Time Adaptation [82.8392232222119]
Test-Time Adaptation (TTA) has emerged as a promising approach for tackling the robustness challenge under distribution shifts.
We present TTAB, a test-time adaptation benchmark that encompasses ten state-of-the-art algorithms, a diverse array of distribution shifts, and two evaluation protocols.
arXiv Detail & Related papers (2023-06-06T09:35:29Z) - DELTA: degradation-free fully test-time adaptation [59.74287982885375]
We find that two defects are hidden in prevalent adaptation methodologies such as test-time batch normalization (BN) and self-learning.
First, we reveal that the normalization statistics in test-time BN depend entirely on the currently received test samples, resulting in inaccurate estimates.
Second, we show that during test-time adaptation, the parameter update is biased towards some dominant classes.
arXiv Detail & Related papers (2023-01-30T15:54:00Z) - Robust Continual Test-time Adaptation: Instance-aware BN and
Prediction-balanced Memory [58.72445309519892]
We present a new test-time adaptation scheme that is robust against non-i.i.d. test data streams.
Our novelty is mainly two-fold: (a) Instance-Aware Batch Normalization (IABN) that corrects normalization for out-of-distribution samples, and (b) Prediction-balanced Reservoir Sampling (PBRS) that simulates an i.i.d. data stream from a non-i.i.d. stream in a class-balanced manner.
arXiv Detail & Related papers (2022-08-10T03:05:46Z) - Hybrid Intelligent Testing in Simulation-Based Verification [0.0]
Several million tests may be required to achieve coverage goals.
Coverage-Directed Test Selection learns from coverage feedback to bias testing towards the most effective tests.
Novelty-Driven Verification learns to identify and simulate stimuli that differ from previous stimuli.
arXiv Detail & Related papers (2022-05-19T13:22:08Z) - Comparative Study of Machine Learning Test Case Prioritization for
Continuous Integration Testing [3.8073142980733]
We show that different machine learning models perform differently depending on the size of the test history used for model training and on the time budget available for test case execution.
Our results imply that machine learning approaches for test prioritization in continuous integration testing should be carefully configured to achieve optimal performance.
arXiv Detail & Related papers (2022-04-22T19:20:49Z) - DeepOrder: Deep Learning for Test Case Prioritization in Continuous
Integration Testing [6.767885381740952]
This work introduces DeepOrder, a deep learning-based model built on regression machine learning.
DeepOrder ranks test cases based on the historical record of test executions from any number of previous test cycles.
We experimentally show that deep neural networks, as a simple regression model, can be efficiently used for test case prioritization in continuous integration testing.
arXiv Detail & Related papers (2021-10-14T15:10:38Z) - Automated Performance Testing Based on Active Deep Learning [2.179313476241343]
We present an automated test generation method called ACTA for black-box performance testing.
ACTA is based on active learning, which means that it does not require a large set of historical test data to learn about the performance characteristics of the system under test.
We have evaluated ACTA on a benchmark web application, and the experimental results indicate that this method is comparable with random testing.
arXiv Detail & Related papers (2021-04-05T18:19:12Z) - Noisy Adaptive Group Testing using Bayesian Sequential Experimental
Design [63.48989885374238]
When the infection prevalence of a disease is low, Dorfman showed 80 years ago that testing groups of people can prove more efficient than testing people individually (see the short numerical sketch after this list).
Our goal in this paper is to propose new group testing algorithms that can operate in a noisy setting.
arXiv Detail & Related papers (2020-04-26T23:41:33Z) - Dynamic Causal Effects Evaluation in A/B Testing with a Reinforcement
Learning Framework [68.96770035057716]
A/B testing is a business strategy to compare a new product with an old one in pharmaceutical, technological, and traditional industries.
This paper introduces a reinforcement learning framework for carrying out A/B testing in online experiments.
arXiv Detail & Related papers (2020-02-05T10:25:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.