Algorithms for Adaptive Experiments that Trade-off Statistical Analysis
with Reward: Combining Uniform Random Assignment and Reward Maximization
- URL: http://arxiv.org/abs/2112.08507v1
- Date: Wed, 15 Dec 2021 22:11:58 GMT
- Title: Algorithms for Adaptive Experiments that Trade-off Statistical Analysis
with Reward: Combining Uniform Random Assignment and Reward Maximization
- Authors: Jacob Nogas, Tong Li, Fernando J. Yanez, Arghavan Modiri, Nina Deliu,
Ben Prystawski, Sofia S. Villar, Anna Rafferty, Joseph J. Williams
- Abstract summary: Multi-armed bandit algorithms like Thompson Sampling can be used to conduct adaptive experiments.
We present simulations for 2-arm experiments that explore two algorithms combining the benefits of uniform randomization for statistical analysis with the benefits of reward maximization achieved by Thompson Sampling.
- Score: 50.725191156128645
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multi-armed bandit algorithms like Thompson Sampling can be used to conduct
adaptive experiments, in which maximizing reward means that data is used to
progressively assign more participants to more effective arms. Such assignment
strategies increase the risk of statistical hypothesis tests identifying a
difference between arms when there is not one, and failing to conclude there is
a difference between arms when there truly is one. We present simulations for 2-arm
experiments that explore two algorithms that combine the benefits of uniform
randomization for statistical analysis, with the benefits of reward
maximization achieved by Thompson Sampling (TS). First, Top-Two Thompson
Sampling adds a fixed amount of uniform random allocation (UR) spread evenly
over time. Second, TS PostDiff (Posterior
Probability of Difference) is a novel heuristic algorithm that takes a Bayesian approach to mixing TS
and UR: the probability a participant is assigned using UR allocation is the
posterior probability that the difference between two arms is `small' (below a
certain threshold), allowing for more UR exploration when there is little or no
reward to be gained. We find that the TS PostDiff method performs well across
multiple effect sizes, and thus does not require tuning based on a guess for
the true effect size.
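To make the two allocation rules concrete, the following is a minimal sketch of per-participant assignment in a 2-arm Bernoulli experiment. It assumes Beta(1, 1) priors, a Monte Carlo estimate of the posterior probability that the between-arm difference is below the threshold, and illustrative function names (`assign_ts_fixed_ur`, `assign_ts_postdiff`); these implementation details are not taken from the paper itself.

```python
import numpy as np

rng = np.random.default_rng(0)

def assign_ts(successes, failures):
    """Plain Thompson Sampling: draw one sample per arm from its
    Beta posterior and choose the arm with the larger draw."""
    draws = rng.beta(1 + successes, 1 + failures)  # Beta(1, 1) priors assumed
    return int(np.argmax(draws))

def assign_ts_fixed_ur(successes, failures, ur_fraction=0.2):
    """TS mixed with a fixed amount of uniform random (UR) allocation,
    spread evenly over time: with probability `ur_fraction` assign
    uniformly at random, otherwise use Thompson Sampling."""
    if rng.random() < ur_fraction:
        return int(rng.integers(2))           # UR step
    return assign_ts(successes, failures)     # TS step

def post_diff_prob(successes, failures, threshold=0.05, n_draws=5000):
    """Monte Carlo estimate of the posterior probability that the
    absolute difference between the two arm means is below `threshold`."""
    p = rng.beta(1 + successes, 1 + failures, size=(n_draws, 2))
    return float(np.mean(np.abs(p[:, 0] - p[:, 1]) < threshold))

def assign_ts_postdiff(successes, failures, threshold=0.05):
    """TS PostDiff: assign by UR with probability equal to the posterior
    probability that the arms differ by less than `threshold`,
    otherwise assign by Thompson Sampling."""
    if rng.random() < post_diff_prob(successes, failures, threshold):
        return int(rng.integers(2))           # explore uniformly
    return assign_ts(successes, failures)     # exploit via TS

# Example: one simulated 2-arm experiment run with TS PostDiff.
true_means = np.array([0.5, 0.6])
successes, failures = np.zeros(2), np.zeros(2)
for _ in range(500):
    arm = assign_ts_postdiff(successes, failures, threshold=0.05)
    reward = rng.random() < true_means[arm]
    successes[arm] += reward
    failures[arm] += 1 - reward
```

Because the UR probability in TS PostDiff tracks the posterior mass on a small between-arm difference, allocation stays close to uniform when there is little reward to gain and shifts toward TS as evidence of a larger difference accumulates, which is why the method does not need to be tuned to a guessed effect size.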
Related papers
- Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms in any round based on the observed empirical gaps between the rewards of various arms.
arXiv Detail & Related papers (2023-12-19T13:17:43Z) - Optimal Multi-Distribution Learning [88.3008613028333]
Multi-distribution learning seeks to learn a shared model that minimizes the worst-case risk across $k$ distinct data distributions.
We propose a novel algorithm that yields an $\varepsilon$-optimal randomized hypothesis with a sample complexity on the order of $(d+k)/\varepsilon^2$.
arXiv Detail & Related papers (2023-12-08T16:06:29Z) - Increasing Students' Engagement to Reminder Emails Through Multi-Armed
Bandits [60.4933541247257]
This paper presents a real-world adaptive experiment on how students engage with instructors' weekly email reminders designed to build their time-management habits.
Using Multi-Armed Bandits (MAB) algorithms in adaptive experiments can increase students' chances of obtaining better outcomes.
We highlight problems with these adaptive algorithms, such as possible exploitation of an arm when there is no significant difference between arms.
arXiv Detail & Related papers (2022-08-10T00:30:52Z) - Langevin Monte Carlo for Contextual Bandits [72.00524614312002]
Langevin Monte Carlo Thompson Sampling (LMC-TS) is proposed to directly sample from the posterior distribution in contextual bandits.
We prove that the proposed algorithm achieves the same sublinear regret bound as the best Thompson sampling algorithms for a special case of contextual bandits.
arXiv Detail & Related papers (2022-06-22T17:58:23Z) - Top Two Algorithms Revisited [14.783452541904365]
Top Two algorithms arose as an adaptation of Thompson sampling to best arm identification in multi-armed bandit models.
We provide a general analysis of Top Two methods, which identifies desirable properties of the leader, the challenger, and the (possibly non-parametric) distributions of the arms.
Our proof method demonstrates in particular that the sampling step used to select the leader inherited from Thompson sampling can be replaced by other choices.
arXiv Detail & Related papers (2022-06-13T09:03:24Z) - Optimal Fixed-Budget Best Arm Identification using the Augmented Inverse
Probability Estimator in Two-Armed Gaussian Bandits with Unknown Variances [27.122181278234617]
We consider the fixed-budget best arm identification problem in two-armed Gaussian bandits with unknown variances.
We propose a strategy comprising a sampling rule with randomized sampling (RS) following the estimated target allocation probabilities of arm draws.
We show that the proposed strategy is agnostically optimal when the sample size becomes infinitely large and the gap between the two arms goes to zero.
arXiv Detail & Related papers (2022-01-12T13:38:33Z) - Challenges in Statistical Analysis of Data Collected by a Bandit
Algorithm: An Empirical Exploration in Applications to Adaptively Randomized
Experiments [11.464963616709671]
Multi-armed bandit algorithms have been argued for decades to be useful for adaptively randomized experiments.
We applied the bandit algorithm Thompson Sampling (TS) to run adaptive experiments in three university classes.
We show that collecting data with TS can as much as double the False Positive Rate (FPR) and the False Negative Rate (FNR) (see the simulation sketch after this list).
arXiv Detail & Related papers (2021-03-22T22:05:18Z) - Doubly-Adaptive Thompson Sampling for Multi-Armed and Contextual Bandits [28.504921333436833]
We propose a variant of a Thompson sampling based algorithm that adaptively re-weighs the terms of a doubly robust estimator on the true mean reward of each arm.
The proposed algorithm matches the optimal (minimax) regret rate, and we evaluate it empirically in a semi-synthetic experiment.
We extend this approach to contextual bandits, where there are more sources of bias present apart from the adaptive data collection.
arXiv Detail & Related papers (2021-02-25T22:29:25Z) - Statistical Efficiency of Thompson Sampling for Combinatorial
Semi-Bandits [56.31950477139053]
We investigate the combinatorial multi-armed bandit problem with semi-bandit feedback (CMAB).
We analyze variants of the Combinatorial Thompson Sampling policy (CTS).
This last result gives us an alternative to the Efficient Sampling for Combinatorial Bandit policy (ESCB).
arXiv Detail & Related papers (2020-06-11T17:12:11Z)
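As a rough companion to the "Challenges in Statistical Analysis" entry above, here is one way such error-rate comparisons are commonly set up in simulation: run many 2-arm experiments with equal true arm means, collect data either uniformly at random or with Thompson Sampling, and record how often a standard two-proportion test rejects the null. The horizon, number of replications, test choice, and helper names below are illustrative assumptions rather than details from that paper.

```python
import math
import numpy as np

rng = np.random.default_rng(1)

def run_null_experiment(policy, n=200, p=(0.5, 0.5)):
    """Simulate one 2-arm experiment with equal true means (the null)
    and return per-arm success and trial counts."""
    s = np.zeros(2)   # successes per arm
    t = np.zeros(2)   # trials per arm
    for _ in range(n):
        if policy == "uniform":
            arm = int(rng.integers(2))
        else:  # Thompson Sampling with Beta(1, 1) priors
            arm = int(np.argmax(rng.beta(1 + s, 1 + (t - s))))
        s[arm] += rng.random() < p[arm]
        t[arm] += 1
    return s, t

def two_proportion_pvalue(s, t):
    """Two-sided p-value of a pooled two-proportion z-test."""
    if t[0] == 0 or t[1] == 0:
        return 1.0
    pooled = (s[0] + s[1]) / (t[0] + t[1])
    se = math.sqrt(pooled * (1 - pooled) * (1 / t[0] + 1 / t[1]))
    if se == 0:
        return 1.0
    z = (s[0] / t[0] - s[1] / t[1]) / se
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def false_positive_rate(policy, reps=2000, alpha=0.05):
    """Fraction of null experiments in which the test rejects at level alpha."""
    return float(np.mean([two_proportion_pvalue(*run_null_experiment(policy)) < alpha
                          for _ in range(reps)]))

print("FPR with uniform assignment:", false_positive_rate("uniform"))
print("FPR with Thompson Sampling :", false_positive_rate("ts"))
```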