A/B Testing: A Systematic Literature Review
- URL: http://arxiv.org/abs/2308.04929v1
- Date: Wed, 9 Aug 2023 12:55:51 GMT
- Title: A/B Testing: A Systematic Literature Review
- Authors: Federico Quin and Danny Weyns and Matthias Galster and Camila Costa Silva
- Abstract summary: Single classic A/B tests are the dominating type of tests.
The dominating uses of the test results are feature selection, feature rollout, and continued feature development.
The main reported open problems are the enhancement of proposed approaches and their usability.
- Score: 10.222047656342493
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In A/B testing two variants of a piece of software are compared in the field
from an end user's point of view, enabling data-driven decision making. While
A/B testing is widely used in practice, no comprehensive study has been
conducted on the state of the art in A/B testing. This paper reports the
results of a systematic
literature review that analyzed 141 primary studies. The results show that the
main targets of A/B testing are algorithms and visual elements. Single classic
A/B tests are the dominating type of tests. Stakeholders have three main roles
in the design of A/B tests: concept designer, experiment architect, and setup
technician. The primary types of data collected during the execution of A/B
tests are product/system data and user-centric data. The dominating uses of the
test results are feature selection, feature rollout, and continued feature
development. Stakeholders have two main roles during A/B test execution:
experiment coordinator and experiment assessor. The main reported open problems
are the enhancement of proposed approaches and their usability. Interesting
lines for future research include: strengthening the adoption of statistical
methods in A/B testing, improving the process of A/B testing, and enhancing
the automation of A/B testing.
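As a concrete illustration of the statistics behind a single classic A/B test (an addition for this summary, not material from the paper; all counts and thresholds below are hypothetical), here is a minimal sketch of a two-proportion z-test on conversion counts, assuming scipy is available:

```python
# Minimal sketch of a single classic A/B test: compare the conversion
# rates of variants A and B with a two-proportion z-test.
# All counts below are hypothetical.
from math import sqrt
from scipy.stats import norm

def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Return the z statistic and two-sided p-value for H0: p_a == p_b."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)  # pooled rate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    return z, 2 * norm.sf(abs(z))  # two-sided p-value

# Hypothetical experiment: 10,000 users per variant.
z, p = two_proportion_z_test(conv_a=1200, n_a=10_000, conv_b=1290, n_b=10_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

In such a setup, variant B would be rolled out only if p falls below a pre-registered significance level (e.g. 0.05); real deployments add power analysis and corrections for repeated looks at the data.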
Related papers
- Harnessing the Power of Interleaving and Counterfactual Evaluation for Airbnb Search Ranking [14.97060265751423]
Evaluation plays a crucial role in the development of ranking algorithms on search and recommender systems.
The online environment is conducive to applying causal inference techniques.
Businesses face unique challenges when it comes to effective A/B testing.
arXiv Detail & Related papers (2025-08-01T16:28:18Z)
- TestAgent: An Adaptive and Intelligent Expert for Human Assessment [62.060118490577366]
We propose TestAgent, a large language model (LLM)-powered agent designed to enhance adaptive testing through interactive engagement.
TestAgent supports personalized question selection, captures test-takers' responses and anomalies, and provides precise outcomes through dynamic, conversational interactions.
arXiv Detail & Related papers (2025-06-03T16:07:54Z)
- Towards Reliable AI: Adequacy Metrics for Ensuring the Quality of System-level Testing of Autonomous Vehicles [5.634825161148484]
We introduce a set of black-box test adequacy metrics called "Test suite Instance Space Adequacy" (TISA) metrics.
The TISA metrics offer a way to assess both the diversity and coverage of the test suite and the range of bugs detected during testing.
We evaluate the efficacy of the TISA metrics by examining their correlation with the number of bugs detected in system-level simulation testing of AVs.
arXiv Detail & Related papers (2023-11-14T10:16:05Z)
- Using Auxiliary Data to Boost Precision in the Analysis of A/B Tests on an Online Educational Platform: New Data and New Results [1.5293427903448025]
A/B tests allow causal effect estimation without confounding bias and exact statistical inference even in small samples.
Recent methodological advances have shown that power and statistical precision can be substantially boosted by coupling design-based causal estimation to machine-learning models of rich log data from historical users who were not in the experiment.
We show that the gains can be even larger for estimating subgroup effects, hold even when the remnant (the historical users outside the experiment) is unrepresentative of the A/B test sample, and extend to post-stratification population effects estimators; a toy sketch of this remnant-based adjustment follows below.
arXiv Detail & Related papers (2023-06-09T21:54:36Z)
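The coupling of design-based estimation with machine-learning models of historical log data can be pictured with a toy sketch (hypothetical data, features, and model choice; not the paper's implementation): a regressor is fit only on remnant users, and the experiment is then analyzed on residuals, which leaves the randomization-based estimate unbiased while reducing its variance.

```python
# Toy sketch of remnant-based regression adjustment: a model trained on
# historical (non-experiment) users predicts outcomes from log features,
# and the A/B test is analyzed on residuals. Data are simulated.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
beta = np.array([1.0, 0.5, -0.5, 0.2])  # hypothetical feature effects

# Remnant: historical users who were not in the experiment.
X_rem = rng.normal(size=(5_000, 4))
y_rem = X_rem @ beta + rng.normal(size=5_000)
model = GradientBoostingRegressor().fit(X_rem, y_rem)

# Experiment: randomized treatment with a true effect of 0.3.
X_exp = rng.normal(size=(1_000, 4))
t = rng.integers(0, 2, size=1_000)
y_exp = X_exp @ beta + 0.3 * t + rng.normal(size=1_000)

resid = y_exp - model.predict(X_exp)  # residualize with the remnant model
naive = y_exp[t == 1].mean() - y_exp[t == 0].mean()
adjusted = resid[t == 1].mean() - resid[t == 0].mean()
print(f"naive: {naive:.3f}, adjusted: {adjusted:.3f}")
```

Because treatment is randomized and the model never sees experimental data, subtracting the predictions cannot introduce confounding; it only removes outcome variation that the log features explain.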
- Automating Pipelines of A/B Tests with Population Split Using Self-Adaptation and Machine Learning [10.635137352476246]
A/B testing is a common approach used in industry to facilitate innovation through the introduction of new features or the modification of existing software.
Traditionally, A/B tests are conducted sequentially, with each experiment targeting the entire population of the corresponding application, which can slow down experimentation.
To tackle this problem, we introduce a new self-adaptive approach, called AutoPABS, that automates the execution of pipelines of A/B tests with population split; a toy routing sketch follows below.
arXiv Detail & Related papers (2023-06-02T09:54:59Z)
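The population-split idea can be pictured with a small routing sketch (the test names, salts, and hashing scheme are illustrative assumptions, not AutoPABS's actual design): users are first partitioned across concurrent A/B tests, and each test then splits its share of users into variants.

```python
# Toy sketch of population split: route users across two concurrent
# A/B tests instead of running the tests sequentially, then split
# each test's share of users into variants A and B.
import hashlib

TESTS = ["search_ranking_test", "checkout_flow_test"]  # hypothetical tests

def bucket(user_id: str, salt: str, n: int) -> int:
    """Stable hash bucketing so a user always lands in the same bucket."""
    digest = hashlib.md5(f"{salt}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % n

def assign(user_id: str) -> tuple[str, str]:
    test = TESTS[bucket(user_id, "population-split", len(TESTS))]
    variant = "AB"[bucket(user_id, test, 2)]
    return test, variant

for uid in ("user-1", "user-2", "user-3", "user-4"):
    print(uid, *assign(uid))
```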
- Pre-trained Embeddings for Entity Resolution: An Experimental Analysis [Experiment, Analysis & Benchmark] [65.11858854040544]
We perform a thorough experimental analysis of 12 popular language models over 17 established benchmark datasets.
First, we assess their vectorization overhead for converting all input entities into dense embedding vectors.
Second, we investigate their blocking performance, perform a detailed scalability analysis, and compare them with the state-of-the-art deep learning-based blocking method.
Third, we conclude with their relative performance for both supervised and unsupervised matching.
arXiv Detail & Related papers (2023-04-24T08:53:54Z)
- A Comprehensive Survey on Test-Time Adaptation under Distribution Shifts [143.14128737978342]
Test-time adaptation, an emerging paradigm, has the potential to adapt a pre-trained model to unlabeled data during testing, before making predictions.
Recent progress in this paradigm highlights the significant benefits of utilizing unlabeled data for training self-adapted models prior to inference.
arXiv Detail & Related papers (2023-03-27T16:32:21Z)
- Towards Robust Visual Question Answering: Making the Most of Biased Samples via Contrastive Learning [54.61762276179205]
We propose a novel contrastive learning approach, MMBS, for building robust VQA models by Making the Most of Biased Samples.
Specifically, we construct positive samples for contrastive learning by eliminating the information related to spurious correlation from the original training samples.
We validate our contributions by achieving competitive performance on the OOD dataset VQA-CP v2 while preserving robust performance on the ID dataset VQA v2.
arXiv Detail & Related papers (2022-10-10T11:05:21Z)
- Introspective Distillation for Robust Question Answering [70.18644911309468]
Question answering (QA) models are well-known to exploit data bias, e.g., the language prior in visual QA and the position bias in reading comprehension.
Recent debiasing methods achieve good out-of-distribution (OOD) generalizability with a considerable sacrifice of the in-distribution (ID) performance.
We present a novel debiasing method called Introspective Distillation (IntroD) to make the best of both worlds for QA.
arXiv Detail & Related papers (2021-11-01T15:30:15Z)
- TestRank: Bringing Order into Unlabeled Test Instances for Deep Learning Tasks [14.547623982073475]
Deep learning systems are notoriously difficult to test and debug.
To reduce test cost, it is essential to perform test selection and label only the selected "high-quality" bug-revealing test inputs.
We propose a novel test prioritization technique that brings order into the unlabeled test instances according to their bug-revealing capabilities, namely TestRank.
arXiv Detail & Related papers (2021-05-21T03:41:10Z)
- Evaluation Toolkit For Robustness Testing Of Automatic Essay Scoring Systems [64.4896118325552]
We evaluate the current state-of-the-art AES models using a model adversarial evaluation scheme and associated metrics.
We find that AES models are highly overstable: even heavy modifications (as much as 25%) with content unrelated to the topic of the questions do not decrease the scores produced by the models.
arXiv Detail & Related papers (2020-07-14T03:49:43Z)
- Dynamic Causal Effects Evaluation in A/B Testing with a Reinforcement Learning Framework [68.96770035057716]
A/B testing is a business strategy to compare a new product with an old one in pharmaceutical, technological, and traditional industries.
This paper introduces a reinforcement learning framework for carrying out A/B testing in online experiments.
arXiv Detail & Related papers (2020-02-05T10:25:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.