Assign Experiment Variants at Scale in Online Controlled Experiments
- URL: http://arxiv.org/abs/2212.08771v1
- Date: Sat, 17 Dec 2022 00:45:12 GMT
- Title: Assign Experiment Variants at Scale in Online Controlled Experiments
- Authors: Qike Li, Samir Jamkhande, Pavel Kochetkov, Pai Liu
- Abstract summary: Online controlled experiments (A/B tests) have become the gold standard for learning the impact of new product features in technology companies.
Technology companies run A/B tests at scale -- hundreds if not thousands of A/B tests concurrently, each with millions of users.
We present a novel assignment algorithm and statistical tests to validate the randomized assignments.
- Score: 1.9205538784019935
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Online controlled experiments (A/B tests) have become the gold standard for
learning the impact of new product features in technology companies.
Randomization enables the inference of causality from an A/B test. The
randomized assignment maps end users to experiment buckets and balances user
characteristics between the groups. Therefore, experiments can attribute any
outcome differences between the experiment groups to the product feature under
experiment. Technology companies run A/B tests at scale -- hundreds if not
thousands of A/B tests concurrently, each with millions of users. The large
scale poses unique challenges to randomization. First, the randomized
assignment must be fast since the experiment service receives hundreds of
thousands of queries per second. Second, the variant assignments must be
independent between experiments. Third, the assignment must be consistent when
users revisit or an experiment enrolls more users. We present a novel
assignment algorithm and statistical tests to validate the randomized
assignments. Our results demonstrate that this algorithm is not only
computationally fast but also satisfies the statistical requirements:
it is unbiased and independent.
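The abstract does not spell out the assignment algorithm itself, so the sketch below is only an illustration of the common salted-hash bucketing approach that meets the three requirements above (fast, independent across experiments, and consistent for returning users), together with a chi-squared balance check in the spirit of the validation tests the paper mentions. The names `assign_variant`, `check_balance`, and `experiment_salt` are hypothetical and should not be read as the authors' method.

```python
# Illustrative sketch only -- not the paper's algorithm.
import hashlib
from collections import Counter

from scipy.stats import chisquare  # used only for the balance check below


def assign_variant(user_id: str, experiment_salt: str, variants=("control", "treatment")):
    """Deterministically map a user to a variant for one experiment."""
    # Hash the (experiment salt, user id) pair; MD5 is enough here because we
    # need speed and uniformity, not cryptographic strength.
    digest = hashlib.md5(f"{experiment_salt}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % len(variants)  # uniform bucket index
    return variants[bucket]


def check_balance(assignments, alpha=0.05):
    """Chi-squared goodness-of-fit test against an even split across variants."""
    counts = Counter(assignments)
    observed = [counts[v] for v in sorted(counts)]
    stat, p_value = chisquare(observed)  # null hypothesis: equal expected counts
    return p_value > alpha  # True means no evidence of imbalance


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(100_000)]
    groups = [assign_variant(u, experiment_salt="exp-42") for u in users]
    # Consistency: a returning user gets the same variant again.
    assert assign_variant("user-7", "exp-42") == groups[7]
    print("balanced:", check_balance(groups))
```

Because each experiment uses its own salt, the same user hashes to unrelated buckets in concurrently running experiments, which is one way to obtain the cross-experiment independence the abstract requires.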
Related papers
- Towards Explainable Test Case Prioritisation with Learning-to-Rank Models [6.289767078502329]
Test case prioritisation (TCP) is a critical task in regression testing to ensure quality as software evolves.
We present and discuss scenarios that require different explanations and how the particularities of TCP could influence them.
arXiv Detail & Related papers (2024-05-22T16:11:45Z)
- Sequential Kernelized Independence Testing [101.22966794822084]
We design sequential kernelized independence tests inspired by kernelized dependence measures.
We demonstrate the power of our approaches on both simulated and real data.
arXiv Detail & Related papers (2022-12-14T18:08:42Z)
- Fair Effect Attribution in Parallel Online Experiments [57.13281584606437]
A/B tests serve the purpose of reliably identifying the effect of changes introduced in online services.
It is common for online platforms to run a large number of simultaneous experiments by splitting incoming user traffic randomly.
Despite a perfect randomization between different groups, simultaneous experiments can interact with each other and create a negative impact on average population outcomes.
arXiv Detail & Related papers (2022-10-15T17:15:51Z)
- Model-Free Sequential Testing for Conditional Independence via Testing by Betting [8.293345261434943]
The proposed test allows researchers to analyze an incoming i.i.d. data stream with any arbitrary dependency structure.
We allow the processing of data points online as soon as they arrive and stop data acquisition once significant results are detected.
arXiv Detail & Related papers (2022-10-01T20:05:33Z)
- Using Adaptive Experiments to Rapidly Help Students [5.446351709118483]
We evaluate the effect of homework email reminders on students by conducting an adaptive experiment using the Thompson Sampling algorithm (see the sketch after this list).
We raise a range of open questions about the conditions under which adaptive randomized experiments may be more or less useful.
arXiv Detail & Related papers (2022-08-10T00:43:05Z)
- Increasing Students' Engagement to Reminder Emails Through Multi-Armed Bandits [60.4933541247257]
This paper shows a real-world adaptive experiment on how students engage with instructors' weekly email reminders to build their time management habits.
Using Multi-Armed Bandits (MAB) algorithms in adaptive experiments can increase students' chances of obtaining better outcomes.
We highlight problems with these adaptive algorithms, such as the possible exploitation of an arm when there is no significant difference between the arms.
arXiv Detail & Related papers (2022-08-10T00:30:52Z)
- Towards Continuous Compounding Effects and Agile Practices in Educational Experimentation [2.7094829962573304]
This paper defines a framework for categorising different experimental processes.
The next generation of education technology successes will be heralded by embracing the full set of processes.
arXiv Detail & Related papers (2021-11-17T13:10:51Z)
- Challenges in Statistical Analysis of Data Collected by a Bandit Algorithm: An Empirical Exploration in Applications to Adaptively Randomized Experiments [11.464963616709671]
Multi-armed bandit algorithms have been argued for decades to be useful for adaptively randomized experiments.
We applied the bandit algorithm Thompson Sampling (TS) to run adaptive experiments in three university classes.
We show that collecting data with TS can as much as double the False Positive Rate (FPR) and the False Negative Rate (FNR).
arXiv Detail & Related papers (2021-03-22T22:05:18Z)
- Noisy Adaptive Group Testing using Bayesian Sequential Experimental Design [63.48989885374238]
When the infection prevalence of a disease is low, Dorfman showed 80 years ago that testing groups of people can prove more efficient than testing people individually.
Our goal in this paper is to propose new group testing algorithms that can operate in a noisy setting.
arXiv Detail & Related papers (2020-04-26T23:41:33Z)
- Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping [62.78338049381917]
Fine-tuning pretrained contextual word embedding models to supervised downstream tasks has become commonplace in natural language processing.
We experiment with four datasets from the GLUE benchmark, fine-tuning BERT hundreds of times on each while varying only the random seeds.
We find substantial performance increases compared to previously reported results, and we quantify how the performance of the best-found model varies as a function of the number of fine-tuning trials.
arXiv Detail & Related papers (2020-02-15T02:40:10Z)
- Dynamic Causal Effects Evaluation in A/B Testing with a Reinforcement Learning Framework [68.96770035057716]
A/B testing is a business strategy to compare a new product with an old one in pharmaceutical, technological, and traditional industries.
This paper introduces a reinforcement learning framework for carrying out A/B testing in online experiments.
arXiv Detail & Related papers (2020-02-05T10:25:02Z)
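Several of the related papers above run adaptive experiments with Thompson Sampling or other multi-armed bandit algorithms. As a point of reference only (a generic textbook sketch, not the implementation used in any of those papers), a minimal Beta-Bernoulli Thompson Sampling loop looks like this:

```python
# Generic Beta-Bernoulli Thompson Sampling sketch (illustrative assumption).
import random


class ThompsonSampler:
    def __init__(self, n_arms: int):
        # Beta(1, 1) priors, i.e. uniform over each arm's success rate.
        self.successes = [1] * n_arms
        self.failures = [1] * n_arms

    def choose_arm(self) -> int:
        # Draw one sample per arm from its Beta posterior and pick the largest.
        samples = [random.betavariate(s, f)
                   for s, f in zip(self.successes, self.failures)]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm: int, reward: int) -> None:
        # reward is 1 (success, e.g. email opened) or 0 (failure).
        if reward:
            self.successes[arm] += 1
        else:
            self.failures[arm] += 1


if __name__ == "__main__":
    # Simulated experiment with two email variants whose true open rates differ.
    true_rates = [0.10, 0.15]
    sampler = ThompsonSampler(n_arms=2)
    for _ in range(10_000):
        arm = sampler.choose_arm()
        reward = 1 if random.random() < true_rates[arm] else 0
        sampler.update(arm, reward)
    print("posterior means:",
          [s / (s + f) for s, f in zip(sampler.successes, sampler.failures)])
```

The exploitation issue highlighted above is visible here: once one arm's posterior pulls ahead, the other arm is sampled less and less, even if the true difference between arms is small.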
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information shown and is not responsible for any consequences of its use.