Assign Experiment Variants at Scale in Online Controlled Experiments
- URL: http://arxiv.org/abs/2212.08771v1
- Date: Sat, 17 Dec 2022 00:45:12 GMT
- Title: Assign Experiment Variants at Scale in Online Controlled Experiments
- Authors: Qike Li, Samir Jamkhande, Pavel Kochetkov, Pai Liu
- Abstract summary: Online controlled experiments (A/B tests) have become the gold standard for learning the impact of new product features in technology companies.
Technology companies run A/B tests at scale -- hundreds if not thousands of A/B tests concurrently, each with millions of users.
We present a novel assignment algorithm and statistical tests to validate the randomized assignments.
- Score: 1.9205538784019935
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Online controlled experiments (A/B tests) have become the gold standard for
learning the impact of new product features in technology companies.
Randomization enables the inference of causality from an A/B test. The
randomized assignment maps end users to experiment buckets and balances user
characteristics between the groups. Therefore, experiments can attribute any
outcome differences between the experiment groups to the product feature under
experiment. Technology companies run A/B tests at scale -- hundreds if not
thousands of A/B tests concurrently, each with millions of users. The large
scale poses unique challenges to randomization. First, the randomized
assignment must be fast since the experiment service receives hundreds of
thousands of queries per second. Second, the variant assignments must be
independent between experiments. Third, the assignment must be consistent when
users revisit or an experiment enrolls more users. We present a novel
assignment algorithm and statistical tests to validate the randomized
assignments. Our results demonstrate that this algorithm is not only
computationally fast but also satisfies the statistical requirements:
it is unbiased and independent.
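The abstract does not spell out the assignment algorithm itself, so the sketch below is only an illustration of the common salted-hash bucketing approach that meets the three requirements above (fast, independent across experiments, and consistent for returning users), together with a chi-squared balance check in the spirit of the validation tests the paper mentions. The names `assign_variant`, `check_balance`, and `experiment_salt` are hypothetical and should not be read as the authors' method.

```python
# Illustrative sketch only -- not the paper's algorithm.
import hashlib
from collections import Counter

from scipy.stats import chisquare  # used only for the balance check below


def assign_variant(user_id: str, experiment_salt: str, variants=("control", "treatment")):
    """Deterministically map a user to a variant for one experiment."""
    # Hash the (experiment salt, user id) pair; MD5 is enough here because we
    # need speed and uniformity, not cryptographic strength.
    digest = hashlib.md5(f"{experiment_salt}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % len(variants)  # uniform bucket index
    return variants[bucket]


def check_balance(assignments, alpha=0.05):
    """Chi-squared goodness-of-fit test against an even split across variants."""
    counts = Counter(assignments)
    observed = [counts[v] for v in sorted(counts)]
    stat, p_value = chisquare(observed)  # null hypothesis: equal expected counts
    return p_value > alpha  # True means no evidence of imbalance


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(100_000)]
    groups = [assign_variant(u, experiment_salt="exp-42") for u in users]
    # Consistency: a returning user gets the same variant again.
    assert assign_variant("user-7", "exp-42") == groups[7]
    print("balanced:", check_balance(groups))
```

Because each experiment uses its own salt, the same user hashes to unrelated buckets in concurrently running experiments, which is one way to obtain the cross-experiment independence the abstract requires.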
Related papers
- Towards Explainable Test Case Prioritisation with Learning-to-Rank Models [6.289767078502329]
Test case prioritisation (TCP) is a critical task in regression testing to ensure quality as software evolves.
We present and discuss scenarios that require different explanations and how the particularities of TCP could influence them.
arXiv Detail & Related papers (2024-05-22T16:11:45Z)
- Sequential Kernelized Independence Testing [101.22966794822084]
We design sequential kernelized independence tests inspired by kernelized dependence measures.
We demonstrate the power of our approaches on both simulated and real data.
arXiv Detail & Related papers (2022-12-14T18:08:42Z)
- Fair Effect Attribution in Parallel Online Experiments [57.13281584606437]
A/B tests serve the purpose of reliably identifying the effect of changes introduced in online services.
It is common for online platforms to run a large number of simultaneous experiments by splitting incoming user traffic randomly.
Despite a perfect randomization between different groups, simultaneous experiments can interact with each other and create a negative impact on average population outcomes.
arXiv Detail & Related papers (2022-10-15T17:15:51Z)
- Model-Free Sequential Testing for Conditional Independence via Testing by Betting [8.293345261434943]
The proposed test allows researchers to analyze an incoming i.i.d. data stream with any arbitrary dependency structure.
We allow the processing of data points online as soon as they arrive and stop data acquisition once significant results are detected.
arXiv Detail & Related papers (2022-10-01T20:05:33Z)
- Using Adaptive Experiments to Rapidly Help Students [5.446351709118483]
We evaluate the effect of homework email reminders on students by conducting an adaptive experiment using the Thompson Sampling algorithm (see the sketch after this list).
We raise a range of open questions about the conditions under which adaptive randomized experiments may be more or less useful.
arXiv Detail & Related papers (2022-08-10T00:43:05Z)
- Increasing Students' Engagement to Reminder Emails Through Multi-Armed Bandits [60.4933541247257]
This paper shows a real-world adaptive experiment on how students engage with instructors' weekly email reminders to build their time management habits.
Using Multi-Armed Bandits (MAB) algorithms in adaptive experiments can increase students' chances of obtaining better outcomes.
We highlight problems with these adaptive algorithms, such as the possible exploitation of an arm when there is no significant difference between the arms.
arXiv Detail & Related papers (2022-08-10T00:30:52Z)
- Towards Continuous Compounding Effects and Agile Practices in Educational Experimentation [2.7094829962573304]
This paper defines a framework for categorising different experimental processes.
The next generation of education technology successes will be heralded by embracing the full set of processes.
arXiv Detail & Related papers (2021-11-17T13:10:51Z)
- Challenges in Statistical Analysis of Data Collected by a Bandit Algorithm: An Empirical Exploration in Applications to Adaptively Randomized Experiments [11.464963616709671]
Multi-armed bandit algorithms have been argued for decades to be useful for adaptively randomized experiments.
We applied the bandit algorithm Thompson Sampling (TS) to run adaptive experiments in three university classes.
We show that collecting data with TS can as much as double the False Positive Rate (FPR) and the False Negative Rate (FNR).
arXiv Detail & Related papers (2021-03-22T22:05:18Z)
- Noisy Adaptive Group Testing using Bayesian Sequential Experimental Design [63.48989885374238]
When the infection prevalence of a disease is low, Dorfman showed 80 years ago that testing groups of people can prove more efficient than testing people individually.
Our goal in this paper is to propose new group testing algorithms that can operate in a noisy setting.
arXiv Detail & Related papers (2020-04-26T23:41:33Z)
- Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping [62.78338049381917]
Fine-tuning pretrained contextual word embedding models to supervised downstream tasks has become commonplace in natural language processing.
We experiment with four datasets from the GLUE benchmark, fine-tuning BERT hundreds of times on each while varying only the random seeds.
We find substantial performance increases compared to previously reported results, and we quantify how the performance of the best-found model varies as a function of the number of fine-tuning trials.
arXiv Detail & Related papers (2020-02-15T02:40:10Z)
- Dynamic Causal Effects Evaluation in A/B Testing with a Reinforcement Learning Framework [68.96770035057716]
A/B testing is a business strategy to compare a new product with an old one in pharmaceutical, technological, and traditional industries.
This paper introduces a reinforcement learning framework for carrying out A/B testing in online experiments.
arXiv Detail & Related papers (2020-02-05T10:25:02Z)
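Several of the related papers above run adaptive experiments with Thompson Sampling or other multi-armed bandit algorithms. As a point of reference only (a generic textbook sketch, not the implementation used in any of those papers), a minimal Beta-Bernoulli Thompson Sampling loop looks like this:

```python
# Generic Beta-Bernoulli Thompson Sampling sketch (illustrative assumption).
import random


class ThompsonSampler:
    def __init__(self, n_arms: int):
        # Beta(1, 1) priors, i.e. uniform over each arm's success rate.
        self.successes = [1] * n_arms
        self.failures = [1] * n_arms

    def choose_arm(self) -> int:
        # Draw one sample per arm from its Beta posterior and pick the largest.
        samples = [random.betavariate(s, f)
                   for s, f in zip(self.successes, self.failures)]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm: int, reward: int) -> None:
        # reward is 1 (success, e.g. email opened) or 0 (failure).
        if reward:
            self.successes[arm] += 1
        else:
            self.failures[arm] += 1


if __name__ == "__main__":
    # Simulated experiment with two email variants whose true open rates differ.
    true_rates = [0.10, 0.15]
    sampler = ThompsonSampler(n_arms=2)
    for _ in range(10_000):
        arm = sampler.choose_arm()
        reward = 1 if random.random() < true_rates[arm] else 0
        sampler.update(arm, reward)
    print("posterior means:",
          [s / (s + f) for s, f in zip(sampler.successes, sampler.failures)])
```

The exploitation issue highlighted above is visible here: once one arm's posterior pulls ahead, the other arm is sampled less and less, even if the true difference between arms is small.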
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information shown and is not responsible for any consequences of its use.