Algorithms for Adaptive Experiments that Trade-off Statistical Analysis
with Reward: Combining Uniform Random Assignment and Reward Maximization
- URL: http://arxiv.org/abs/2112.08507v1
- Date: Wed, 15 Dec 2021 22:11:58 GMT
- Title: Algorithms for Adaptive Experiments that Trade-off Statistical Analysis
with Reward: Combining Uniform Random Assignment and Reward Maximization
- Authors: Jacob Nogas, Tong Li, Fernando J. Yanez, Arghavan Modiri, Nina Deliu,
Ben Prystawski, Sofia S. Villar, Anna Rafferty, Joseph J. Williams
- Abstract summary: Multi-armed bandit algorithms like Thompson Sampling can be used to conduct adaptive experiments.
We present simulations for 2-arm experiments that explore two algorithms combining the benefits of uniform randomization for statistical analysis with the benefits of reward maximization achieved by Thompson Sampling.
- Score: 50.725191156128645
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multi-armed bandit algorithms like Thompson Sampling can be used to conduct
adaptive experiments, in which maximizing reward means that data is used to
progressively assign more participants to more effective arms. Such assignment
strategies increase the risk of statistical hypothesis tests identifying a
difference between arms when there is not one, and failing to conclude there is
a difference between arms when there truly is one. We present simulations for 2-arm
experiments that explore two algorithms that combine the benefits of uniform
randomization for statistical analysis, with the benefits of reward
maximization achieved by Thompson Sampling (TS). First, Top-Two Thompson
Sampling adds a fixed amount of uniform random allocation (UR) spread evenly
over time. Second, TS PostDiff (Posterior
Probability of Difference) is a novel heuristic algorithm that takes a Bayesian approach to mixing TS
and UR: the probability a participant is assigned using UR allocation is the
posterior probability that the difference between two arms is `small' (below a
certain threshold), allowing for more UR exploration when there is little or no
reward to be gained. We find that the TS PostDiff method performs well across
multiple effect sizes, and thus does not require tuning based on a guess for
the true effect size.
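To make the two allocation rules concrete, the following is a minimal sketch of per-participant assignment in a 2-arm Bernoulli experiment. It assumes Beta(1, 1) priors, a Monte Carlo estimate of the posterior probability that the between-arm difference is below the threshold, and illustrative function names (`assign_ts_fixed_ur`, `assign_ts_postdiff`); these implementation details are not taken from the paper itself.

```python
import numpy as np

rng = np.random.default_rng(0)

def assign_ts(successes, failures):
    """Plain Thompson Sampling: draw one sample per arm from its
    Beta posterior and choose the arm with the larger draw."""
    draws = rng.beta(1 + successes, 1 + failures)  # Beta(1, 1) priors assumed
    return int(np.argmax(draws))

def assign_ts_fixed_ur(successes, failures, ur_fraction=0.2):
    """TS mixed with a fixed amount of uniform random (UR) allocation,
    spread evenly over time: with probability `ur_fraction` assign
    uniformly at random, otherwise use Thompson Sampling."""
    if rng.random() < ur_fraction:
        return int(rng.integers(2))           # UR step
    return assign_ts(successes, failures)     # TS step

def post_diff_prob(successes, failures, threshold=0.05, n_draws=5000):
    """Monte Carlo estimate of the posterior probability that the
    absolute difference between the two arm means is below `threshold`."""
    p = rng.beta(1 + successes, 1 + failures, size=(n_draws, 2))
    return float(np.mean(np.abs(p[:, 0] - p[:, 1]) < threshold))

def assign_ts_postdiff(successes, failures, threshold=0.05):
    """TS PostDiff: assign by UR with probability equal to the posterior
    probability that the arms differ by less than `threshold`,
    otherwise assign by Thompson Sampling."""
    if rng.random() < post_diff_prob(successes, failures, threshold):
        return int(rng.integers(2))           # explore uniformly
    return assign_ts(successes, failures)     # exploit via TS

# Example: one simulated 2-arm experiment run with TS PostDiff.
true_means = np.array([0.5, 0.6])
successes, failures = np.zeros(2), np.zeros(2)
for _ in range(500):
    arm = assign_ts_postdiff(successes, failures, threshold=0.05)
    reward = rng.random() < true_means[arm]
    successes[arm] += reward
    failures[arm] += 1 - reward
```

Because the UR probability in TS PostDiff tracks the posterior mass on a small between-arm difference, allocation stays close to uniform when there is little reward to gain and shifts toward TS as evidence of a larger difference accumulates, which is why the method does not need to be tuned to a guessed effect size.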
Related papers
- Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms in any round based on the observed empirical gaps between the rewards of various arms.
arXiv Detail & Related papers (2023-12-19T13:17:43Z) - Optimal Multi-Distribution Learning [88.3008613028333]
Multi-distribution learning seeks to learn a shared model that minimizes the worst-case risk across $k$ distinct data distributions.
We propose a novel algorithm that yields an $\varepsilon$-optimal randomized hypothesis with a sample complexity on the order of $(d+k)/\varepsilon^2$.
arXiv Detail & Related papers (2023-12-08T16:06:29Z) - Increasing Students' Engagement to Reminder Emails Through Multi-Armed
Bandits [60.4933541247257]
This paper presents a real-world adaptive experiment on how students engage with instructors' weekly email reminders designed to build their time-management habits.
Using Multi-Armed Bandits (MAB) algorithms in adaptive experiments can increase students' chances of obtaining better outcomes.
We highlight problems with these adaptive algorithms, such as possible exploitation of an arm when there is no significant difference between arms.
arXiv Detail & Related papers (2022-08-10T00:30:52Z) - Langevin Monte Carlo for Contextual Bandits [72.00524614312002]
Langevin Monte Carlo Thompson Sampling (LMC-TS) is proposed to directly sample from the posterior distribution in contextual bandits.
We prove that the proposed algorithm achieves the same sublinear regret bound as the best Thompson sampling algorithms for a special case of contextual bandits.
arXiv Detail & Related papers (2022-06-22T17:58:23Z) - Top Two Algorithms Revisited [14.783452541904365]
Top Two algorithms arose as an adaptation of Thompson sampling to best arm identification in multi-armed bandit models.
We provide a general analysis of Top Two methods, which identifies desirable properties of the leader, the challenger, and the (possibly non-parametric) distributions of the arms.
Our proof method demonstrates in particular that the sampling step used to select the leader inherited from Thompson sampling can be replaced by other choices.
arXiv Detail & Related papers (2022-06-13T09:03:24Z) - Optimal Fixed-Budget Best Arm Identification using the Augmented Inverse
Probability Estimator in Two-Armed Gaussian Bandits with Unknown Variances [27.122181278234617]
We consider the fixed-budget best arm identification problem in two-armed Gaussian bandits with unknown variances.
We propose a strategy comprising a sampling rule with randomized sampling (RS) following the estimated target allocation probabilities of arm draws.
We show that the proposed strategy is agnostically optimal when the sample size becomes infinitely large and the gap between the two arms goes to zero.
arXiv Detail & Related papers (2022-01-12T13:38:33Z) - Challenges in Statistical Analysis of Data Collected by a Bandit
Algorithm: An Empirical Exploration in Applications to Adaptively Randomized
Experiments [11.464963616709671]
Multi-armed bandit algorithms have been argued for decades to be useful for adaptively randomized experiments.
We applied the bandit algorithm Thompson Sampling (TS) to run adaptive experiments in three university classes.
We show that collecting data with TS can as much as double the False Positive Rate (FPR) and the False Negative Rate (FNR) (see the simulation sketch after this list).
arXiv Detail & Related papers (2021-03-22T22:05:18Z) - Doubly-Adaptive Thompson Sampling for Multi-Armed and Contextual Bandits [28.504921333436833]
We propose a variant of a Thompson sampling based algorithm that adaptively re-weighs the terms of a doubly robust estimator on the true mean reward of each arm.
The proposed algorithm matches the optimal (minimax) regret rate, and we evaluate it empirically in a semi-synthetic experiment.
We extend this approach to contextual bandits, where there are more sources of bias present apart from the adaptive data collection.
arXiv Detail & Related papers (2021-02-25T22:29:25Z) - Statistical Efficiency of Thompson Sampling for Combinatorial
Semi-Bandits [56.31950477139053]
We investigate the combinatorial multi-armed bandit problem with semi-bandit feedback (CMAB).
We analyze variants of the Combinatorial Thompson Sampling policy (CTS).
This last result gives us an alternative to the Efficient Sampling for Combinatorial Bandit policy (ESCB).
arXiv Detail & Related papers (2020-06-11T17:12:11Z)
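As a rough companion to the "Challenges in Statistical Analysis" entry above, here is one way such error-rate comparisons are commonly set up in simulation: run many 2-arm experiments with equal true arm means, collect data either uniformly at random or with Thompson Sampling, and record how often a standard two-proportion test rejects the null. The horizon, number of replications, test choice, and helper names below are illustrative assumptions rather than details from that paper.

```python
import math
import numpy as np

rng = np.random.default_rng(1)

def run_null_experiment(policy, n=200, p=(0.5, 0.5)):
    """Simulate one 2-arm experiment with equal true means (the null)
    and return per-arm success and trial counts."""
    s = np.zeros(2)   # successes per arm
    t = np.zeros(2)   # trials per arm
    for _ in range(n):
        if policy == "uniform":
            arm = int(rng.integers(2))
        else:  # Thompson Sampling with Beta(1, 1) priors
            arm = int(np.argmax(rng.beta(1 + s, 1 + (t - s))))
        s[arm] += rng.random() < p[arm]
        t[arm] += 1
    return s, t

def two_proportion_pvalue(s, t):
    """Two-sided p-value of a pooled two-proportion z-test."""
    if t[0] == 0 or t[1] == 0:
        return 1.0
    pooled = (s[0] + s[1]) / (t[0] + t[1])
    se = math.sqrt(pooled * (1 - pooled) * (1 / t[0] + 1 / t[1]))
    if se == 0:
        return 1.0
    z = (s[0] / t[0] - s[1] / t[1]) / se
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def false_positive_rate(policy, reps=2000, alpha=0.05):
    """Fraction of null experiments in which the test rejects at level alpha."""
    return float(np.mean([two_proportion_pvalue(*run_null_experiment(policy)) < alpha
                          for _ in range(reps)]))

print("FPR with uniform assignment:", false_positive_rate("uniform"))
print("FPR with Thompson Sampling :", false_positive_rate("ts"))
```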