Statistical Consequences of Dueling Bandits
- URL: http://arxiv.org/abs/2111.00870v1
- Date: Sat, 16 Oct 2021 23:48:43 GMT
- Title: Statistical Consequences of Dueling Bandits
- Authors: Nayan Saxena, Pan Chen, Emmy Liu
- Abstract summary: Multi-Armed-Bandit frameworks have often been used to assess educational interventions.
Recent work has shown that it is more beneficial for a student to provide qualitative feedback through preference elicitation.
We compare traditional uniform sampling to a dueling bandit algorithm and find that dueling bandit algorithms perform well at cumulative regret minimisation, but lead to inflated Type-I error rates and reduced power under certain circumstances.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multi-Armed-Bandit frameworks have often been used by researchers to assess
educational interventions; however, recent work has shown that it is more
beneficial for a student to provide qualitative feedback through preference
elicitation between different alternatives, making a dueling bandits framework
more appropriate. In this paper, we explore the statistical quality of data
under this framework by comparing traditional uniform sampling to a dueling
bandit algorithm and find that dueling bandit algorithms perform well at
cumulative regret minimisation, but lead to inflated Type-I error rates and
reduced power under certain circumstances. Through these results we provide
insight into the challenges and opportunities in using dueling bandit
algorithms to run adaptive experiments.
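To make the statistical comparison concrete, the following Monte Carlo sketch (ours, not the paper's code) simulates a small dueling experiment under the global null hypothesis of no true preferences. A simple "most-lopsided-pair" heuristic stands in for an actual dueling bandit algorithm, and the rejection rate of a naive binomial test on one pair's head-to-head record estimates the Type-I error under each allocation scheme; the number of arms, horizon, and trial count are all illustrative choices.

```python
# Illustrative sketch: Type-I error of a naive test under uniform vs.
# adaptive duel allocation, with all preference probabilities equal to 1/2.
import itertools
import numpy as np
from scipy.stats import binomtest

K, HORIZON, TRIALS, ALPHA = 3, 300, 2000, 0.05
PAIRS = list(itertools.combinations(range(K), 2))
rng = np.random.default_rng(0)

def pick_pair(wins, plays, adaptive):
    """Choose which pair of arms to duel next."""
    if not adaptive:
        return PAIRS[rng.integers(len(PAIRS))]  # uniform allocation
    # Greedy stand-in for a dueling bandit algorithm: duel the pair whose
    # empirical preference looks most lopsided (untried pairs go first).
    lopsidedness = [abs(wins[i, j] / plays[i, j] - 0.5) if plays[i, j] else 1.0
                    for i, j in PAIRS]
    return PAIRS[int(np.argmax(lopsidedness))]

def type_one_error(adaptive):
    """Fraction of null experiments in which the naive test rejects."""
    rejections = 0
    for _ in range(TRIALS):
        wins = np.zeros((K, K))   # wins[i, j]: duels between i and j won by i
        plays = np.zeros((K, K))  # duels played between i and j
        for _ in range(HORIZON):
            i, j = pick_pair(wins, plays, adaptive)
            plays[i, j] += 1
            wins[i, j] += rng.random() < 0.5  # null: no true preference
        n, w = int(plays[0, 1]), int(wins[0, 1])
        if n > 0 and binomtest(w, n, 0.5).pvalue < ALPHA:
            rejections += 1
    return rejections / TRIALS

print("uniform  empirical Type-I error:", type_one_error(False))
print("adaptive empirical Type-I error:", type_one_error(True))
```

Under uniform allocation the empirical rejection rate should sit near the nominal level; any deviation under the adaptive scheme illustrates the distortion the paper studies.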
Related papers
- Neural Dueling Bandits [58.90189511247936]
We use a neural network to estimate the reward function using preference feedback for the previously selected arms.
We then extend our theoretical results to contextual bandit problems with binary feedback, which is in itself a non-trivial contribution.
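As a rough illustration of estimating a reward function from preference feedback, here is a sketch using a standard Bradley-Terry construction: a small network f scores each arm's features, and P(arm i beats arm j) is modelled as sigmoid(f(x_i) - f(x_j)). This is our simplification in PyTorch, not necessarily the paper's exact formulation, and the toy utility used to generate duels is invented for the example.

```python
# Sketch: learn a reward function from pairwise preference feedback.
import torch
import torch.nn as nn

torch.manual_seed(0)
d = 4  # feature dimension of an arm's context (illustrative)

f = nn.Sequential(nn.Linear(d, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(f.parameters(), lr=1e-2)

# Toy duel log: features of the two duelled arms and 1 if the first won.
x_i = torch.randn(256, d)
x_j = torch.randn(256, d)

def true_score(x):
    return x[:, 0] - 0.5 * x[:, 1]  # hidden utility (made up for the demo)

y = (torch.sigmoid(true_score(x_i) - true_score(x_j))
     > torch.rand(256)).float()     # binary preference feedback

for _ in range(200):
    logits = (f(x_i) - f(x_j)).squeeze(-1)  # Bradley-Terry logit
    loss = nn.functional.binary_cross_entropy_with_logits(logits, y)
    opt.zero_grad()
    loss.backward()
    opt.step()

print("final preference log-loss:", float(loss))
```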
arXiv Detail & Related papers (2024-07-24T09:23:22Z)
- Multi-Player Approaches for Dueling Bandits [58.442742345319225]
We show that the direct use of a Follow Your Leader black-box approach matches the lower bound for this setting.
We also analyze a message-passing fully distributed approach with a novel Condorcet-winner recommendation protocol.
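For reference, the Condorcet winner that the recommendation protocol targets is an arm that beats every other arm with probability above one half. A tiny sketch of the definition (the preference matrix below is made up):

```python
# Sketch: find the Condorcet winner of a pairwise preference matrix, where
# P[i, j] is the probability that arm i beats arm j in a duel.
import numpy as np

def condorcet_winner(P: np.ndarray):
    K = P.shape[0]
    for i in range(K):
        if all(P[i, j] > 0.5 for j in range(K) if j != i):
            return i
    return None  # no Condorcet winner exists for this preference matrix

P = np.array([[0.5, 0.7, 0.6],
              [0.3, 0.5, 0.8],
              [0.4, 0.2, 0.5]])
print(condorcet_winner(P))  # -> 0, since arm 0 beats arms 1 and 2
```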
arXiv Detail & Related papers (2024-05-25T10:25:48Z)
- Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms in any round based on the observed empirical gaps between the rewards of the various arms.
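To illustrate the gap-based rejection idea, here is our toy version; the paper's actual rejection rule and schedule come from its large-deviation analysis, and the fixed 0.25 gap threshold and pull counts below are hypothetical choices.

```python
# Sketch: round-robin pulls over active arms, rejecting any arm whose
# empirical mean falls far below the current leader's.
import numpy as np

rng = np.random.default_rng(1)
means = np.array([0.8, 0.6, 0.5, 0.3])  # hidden Bernoulli arm means (toy)
budget, K = 400, len(means)

active = set(range(K))
pulls, rewards = np.zeros(K), np.zeros(K)
for t in range(budget):
    arm = min(active, key=lambda a: pulls[a])  # round-robin over active arms
    pulls[arm] += 1
    rewards[arm] += rng.random() < means[arm]
    if pulls[arm] >= 10:  # enough data to compare (hypothetical cutoff)
        emp = rewards / np.maximum(pulls, 1)
        leader = max(active, key=lambda a: emp[a])
        for a in list(active):
            # Hypothetical rule: reject arms with a large empirical gap.
            if len(active) > 1 and emp[leader] - emp[a] > 0.25:
                active.discard(a)

emp = rewards / np.maximum(pulls, 1)
print("recommended arm:", max(active, key=lambda a: emp[a]))
```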
arXiv Detail & Related papers (2023-12-19T13:17:43Z)
- Identifying Copeland Winners in Dueling Bandits with Indifferences [12.96903983663382]
We consider the task of identifying the Copeland winner(s) in a dueling bandits problem with ternary feedback.
We provide a lower bound on the sample complexity for any learning algorithm finding the Copeland winner(s) with a fixed error probability.
We propose POCOWISTA, an algorithm with a sample complexity that almost matches this lower bound, and which shows excellent empirical performance.
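For context, the Copeland winner generalises the Condorcet winner: it is any arm that beats the largest number of other arms, which is natural under ternary feedback because indifference (P = 1/2) counts as a win for neither arm. A small sketch of the definition with a made-up preference matrix:

```python
# Sketch: Copeland winners of a pairwise preference matrix, where P[i, j]
# is the probability that arm i beats arm j and 0.5 encodes indifference.
import numpy as np

def copeland_winners(P: np.ndarray):
    """Arms beating the maximal number of other arms (ties beat neither)."""
    K = P.shape[0]
    scores = [sum(P[i, j] > 0.5 for j in range(K) if j != i) for i in range(K)]
    return [i for i in range(K) if scores[i] == max(scores)]

# Arm 1 is indifferent to arm 2 (P = 0.5); arms 0 and 2 each beat one arm.
P = np.array([[0.5, 0.6, 0.4],
              [0.4, 0.5, 0.5],
              [0.6, 0.5, 0.5]])
print(copeland_winners(P))  # -> [0, 2]
```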
arXiv Detail & Related papers (2023-10-01T17:59:27Z)
- Versatile Dueling Bandits: Best-of-both-World Analyses for Online Learning from Preferences [28.79598714109439]
We study the $K$-armed dueling bandit problem for both stochastic and adversarial environments.
We first propose a novel reduction from any (general) dueling bandits to multi-armed bandits.
Our algorithm is also the first to achieve an optimal $O\left(\sum_{i=1}^{K} \frac{\log T}{\Delta_i}\right)$ regret bound against the Condorcet-winner benchmark.
arXiv Detail & Related papers (2022-02-14T13:37:23Z)
- Bias-Robust Bayesian Optimization via Dueling Bandit [57.82422045437126]
We consider Bayesian optimization in settings where observations can be adversarially biased.
We propose a novel approach for dueling bandits based on information-directed sampling (IDS).
Thereby, we obtain the first efficient kernelized algorithm for dueling bandits that comes with cumulative regret guarantees.
arXiv Detail & Related papers (2021-05-25T10:08:41Z)
- An empirical evaluation of active inference in multi-armed bandits [0.0]
The active inference framework is distinguished by its sophisticated strategy for resolving the exploration-exploitation trade-off.
We derive an efficient and scalable approximate active inference algorithm and compare it to two state-of-the-art bandit algorithms.
arXiv Detail & Related papers (2021-01-21T16:20:06Z)
- Lifelong Learning in Multi-Armed Bandits [22.301793734117805]
We study the problem in the multi-armed bandit framework with the objective to minimize the total regret incurred over a series of tasks.
While most bandit algorithms are designed to have a low worst-case regret, we examine here the average regret over bandit instances drawn from some prior distribution which may change over time.
arXiv Detail & Related papers (2020-12-28T15:13:31Z)
- Reward-Biased Maximum Likelihood Estimation for Linear Stochastic Bandits [16.042075861624056]
We develop novel index policies that we prove achieve order-optimality, and show that they achieve empirical performance competitive with the state-of-the-art benchmark methods.
The new policies achieve this with low time per pull for linear bandits, resulting in both favorable regret and computational efficiency.
arXiv Detail & Related papers (2020-10-08T16:17:53Z)
- Instance-Dependent Complexity of Contextual Bandits and Reinforcement Learning: A Disagreement-Based Perspective [104.67295710363679]
In the classical multi-armed bandit problem, instance-dependent algorithms attain improved performance on "easy" problems with a gap between the best and second-best arm.
We introduce a family of complexity measures that are both sufficient and necessary to obtain instance-dependent regret bounds.
We then introduce new oracle-efficient algorithms which adapt to the gap whenever possible, while also attaining the minimax rate in the worst case.
arXiv Detail & Related papers (2020-10-07T01:33:06Z)
- Unifying Clustered and Non-stationary Bandits [50.12992652938055]
Non-stationary bandits and online clustering of bandits lift the restrictive assumptions in contextual bandits.
We propose a test of homogeneity, which seamlessly addresses change detection for non-stationary bandits and cluster identification for online clustering of bandits.
Rigorous regret analysis and extensive empirical evaluations demonstrate the value of our proposed solution.
arXiv Detail & Related papers (2020-09-05T04:58:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of the information and is not responsible for any consequences of its use.