Multi-Player Approaches for Dueling Bandits
- URL: http://arxiv.org/abs/2405.16168v1
- Date: Sat, 25 May 2024 10:25:48 GMT
- Title: Multi-Player Approaches for Dueling Bandits
- Authors: Or Raveh, Junya Honda, Masashi Sugiyama
- Abstract summary: We show that the direct use of a Follow Your Leader black-box approach matches the lower bound for this setting.
We also analyze a message-passing fully distributed approach with a novel Condorcet-winner recommendation protocol.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Various approaches have emerged for multi-armed bandits in distributed systems. The multiplayer dueling bandit problem, which arises in scenarios where only preference-based information such as human feedback is available, introduces the challenge of controlling collaborative exploration of non-informative arm pairs, yet it has received little attention. To fill this gap, we demonstrate that the direct use of a Follow Your Leader black-box approach matches the lower bound for this setting when utilizing known dueling bandit algorithms as a foundation. Additionally, we analyze a message-passing fully distributed approach with a novel Condorcet-winner recommendation protocol, resulting in expedited exploration in many cases. Our experimental comparisons reveal that our multiplayer algorithms surpass single-player benchmark algorithms, underscoring their efficacy in addressing the nuanced challenges of the multiplayer dueling bandit setting.
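The abstract's recommendation protocol is built around the Condorcet winner: the arm that beats every other arm in a pairwise duel with probability greater than 1/2. A minimal sketch of that concept, assuming a known preference matrix `P` (purely illustrative; the paper's setting observes only noisy duel outcomes, not `P` itself):

```python
import random

def condorcet_winner(pref):
    """Return the index of the Condorcet winner: the arm whose win
    probability exceeds 1/2 against every other arm, or None if no
    such arm exists (e.g. cyclic preferences)."""
    k = len(pref)
    for i in range(k):
        if all(pref[i][j] > 0.5 for j in range(k) if j != i):
            return i
    return None

def duel(pref, i, j, rng=random.Random(0)):
    """Sample one noisy comparison: True if arm i beats arm j.
    This is the only feedback a dueling bandit learner observes."""
    return rng.random() < pref[i][j]

# Toy preference matrix: pref[i][j] = P(arm i beats arm j).
# Arm 0 beats both other arms with probability > 1/2.
P = [[0.5, 0.7, 0.6],
     [0.3, 0.5, 0.55],
     [0.4, 0.45, 0.5]]
print(condorcet_winner(P))  # → 0
```

In the multiplayer setting of the paper, each player only sees `duel`-style outcomes, and the learning problem is to coordinate players so they jointly identify this winner without wasting duels on non-informative arm pairs.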
Related papers
- QuACK: A Multipurpose Queuing Algorithm for Cooperative $k$-Armed Bandits [5.530212768657544]
We study the cooperative $k$-armed bandit problem, where a network of $m$ agents collaborate to find the optimal action.
We provide a black-box reduction that allows us to extend any single-agent bandit algorithm to the multi-agent setting.
arXiv Detail & Related papers (2024-10-31T12:20:36Z) - Multi-agent Multi-armed Bandits with Stochastic Sharable Arm Capacities [69.34646544774161]
We formulate a new variant of the multi-player multi-armed bandit (MAB) model, which captures the arrival of requests to each arm and the policy for allocating requests to players.
The challenge is how to design a distributed learning algorithm such that players select arms according to the optimal arm pulling profile.
We design an iterative distributed algorithm, which guarantees that players can arrive at a consensus on the optimal arm pulling profile in only M rounds.
arXiv Detail & Related papers (2024-08-20T13:57:00Z) - Neural Dueling Bandits [58.90189511247936]
We use a neural network to estimate the reward function using preference feedback for the previously selected arms.
We then extend our theoretical results to contextual bandit problems with binary feedback, which is in itself a non-trivial contribution.
arXiv Detail & Related papers (2024-07-24T09:23:22Z) - Identifying Copeland Winners in Dueling Bandits with Indifferences [12.96903983663382]
We consider the task of identifying the Copeland winner(s) in a dueling bandits problem with ternary feedback.
We provide a lower bound on the sample complexity for any learning algorithm finding the Copeland winner(s) with a fixed error probability.
We propose POCOWISTA, an algorithm with a sample complexity that almost matches this lower bound, and which shows excellent empirical performance.
arXiv Detail & Related papers (2023-10-01T17:59:27Z) - An Instance-Dependent Analysis for the Cooperative Multi-Player
Multi-Armed Bandit [93.97385339354318]
We study the problem of information sharing and cooperation in Multi-Player Multi-Armed bandits.
First, we show that a simple modification to a successive elimination strategy can be used to allow the players to estimate their suboptimality gaps.
Second, we leverage the first result to design a communication protocol that successfully uses the small reward of collisions to coordinate among players.
arXiv Detail & Related papers (2021-11-08T23:38:47Z) - Bias-Robust Bayesian Optimization via Dueling Bandit [57.82422045437126]
We consider Bayesian optimization in settings where observations can be adversarially biased.
We propose a novel approach for dueling bandits based on information-directed sampling (IDS).
Thereby, we obtain the first efficient kernelized algorithm for dueling bandits that comes with cumulative regret guarantees.
arXiv Detail & Related papers (2021-05-25T10:08:41Z) - Efficient Pure Exploration for Combinatorial Bandits with Semi-Bandit
Feedback [51.21673420940346]
Combinatorial bandits generalize multi-armed bandits, where the agent chooses sets of arms and observes a noisy reward for each arm contained in the chosen set.
We focus on the pure-exploration problem of identifying the best arm with fixed confidence, as well as a more general setting, where the structure of the answer set differs from the one of the action set.
Based on a projection-free online learning algorithm for finite polytopes, it is the first computationally efficient algorithm which is asymptotically optimal and has competitive empirical performance.
arXiv Detail & Related papers (2021-01-21T10:35:09Z) - Adaptive Algorithms for Multi-armed Bandit with Composite and Anonymous
Feedback [32.62857394584907]
We study the multi-armed bandit (MAB) problem with composite and anonymous feedback.
We propose adaptive algorithms for both the adversarial and non-adversarial cases.
arXiv Detail & Related papers (2020-12-13T12:25:41Z) - The Combinatorial Multi-Bandit Problem and its Application to Energy
Management [2.236663830879273]
We study a Combinatorial Multi-Bandit Problem motivated by applications in energy systems management.
For our energy management application we propose a range of algorithms that combine exploration principles for multi-armed bandits with mathematical programming.
arXiv Detail & Related papers (2020-10-30T13:42:54Z) - Multitask Bandit Learning Through Heterogeneous Feedback Aggregation [35.923544685900055]
We formulate the problem as the $\epsilon$-multi-player multi-armed bandit problem, in which a set of players concurrently interact with a set of arms.
We develop an upper confidence bound-based algorithm, RobustAgg$(\epsilon)$, that adaptively aggregates rewards collected by different players.
arXiv Detail & Related papers (2020-10-29T07:13:28Z)
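The instance-dependent cooperative bandit entry above builds on modifying a successive elimination strategy so players can estimate their suboptimality gaps. A single-player sketch of plain successive elimination, assuming Bernoulli arms and a standard Hoeffding-style confidence radius (the multi-player modification and communication protocol from the paper are not reproduced here):

```python
import math
import random

def successive_elimination(means, horizon=20000, delta=0.001, seed=0):
    """Pull every surviving arm once per round, then eliminate arms
    whose upper confidence bound falls below the empirically best
    arm's lower confidence bound. `means` are the hidden Bernoulli
    parameters; returns the index of the surviving arm."""
    rng = random.Random(seed)
    k = len(means)
    active = list(range(k))
    sums = [0.0] * k
    pulls = [0] * k
    t = 0
    while len(active) > 1 and t < horizon:
        for a in active:
            sums[a] += 1.0 if rng.random() < means[a] else 0.0
            pulls[a] += 1
            t += 1
        # Hoeffding confidence radius, union-bounded over arms and rounds.
        def rad(a):
            return math.sqrt(math.log(4 * k * pulls[a] ** 2 / delta)
                             / (2 * pulls[a]))
        best = max(active, key=lambda a: sums[a] / pulls[a])
        lcb = sums[best] / pulls[best] - rad(best)
        active = [a for a in active if sums[a] / pulls[a] + rad(a) >= lcb]
    return max(active, key=lambda a: sums[a] / pulls[a])

# With a clear gap structure, the best arm (index 2) survives.
print(successive_elimination([0.2, 0.5, 0.8]))
```

The paper's insight is that the elimination statistics computed along the way double as gap estimates, which the players can then exchange to coordinate exploration.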
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.