Related papers: Multitask Bandit Learning Through Heterogeneous Feedback Aggregation

Multitask Bandit Learning Through Heterogeneous Feedback Aggregation

URL: http://arxiv.org/abs/2010.15390v2
Date: Tue, 20 Jul 2021 00:27:13 GMT
Title: Multitask Bandit Learning Through Heterogeneous Feedback Aggregation
Authors: Zhi Wang, Chicheng Zhang, Manish Kumar Singh, Laurel D. Riek, Kamalika Chaudhuri
Abstract summary: We formulate the problem as the $epsilon$-multi-player multi-armed bandit problem, in which a set of players concurrently interact with a set of arms. We develop an upper confidence bound-based algorithm, RobustAgg$(epsilon)$, that adaptively aggregates rewards collected by different players.
Score: 35.923544685900055
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In many real-world applications, multiple agents seek to learn how to perform highly related yet slightly different tasks in an online bandit learning protocol. We formulate this problem as the $\epsilon$-multi-player multi-armed bandit problem, in which a set of players concurrently interact with a set of arms, and for each arm, the reward distributions for all players are similar but not necessarily identical. We develop an upper confidence bound-based algorithm, RobustAgg$(\epsilon)$, that adaptively aggregates rewards collected by different players. In the setting where an upper bound on the pairwise similarities of reward distributions between players is known, we achieve instance-dependent regret guarantees that depend on the amenability of information sharing across players. We complement these upper bounds with nearly matching lower bounds. In the setting where pairwise similarities are unknown, we provide a lower bound, as well as an algorithm that trades off minimax regret guarantees for adaptivity to unknown similarity structure.

Related papers

QuACK: A Multipurpose Queuing Algorithm for Cooperative $k$-Armed Bandits [5.530212768657544]
We study the cooperative $k$-armed bandit problem, where a network of $m$ agents collaborate to find the optimal action. We provide a black-box reduction that allows us to extend any single-agent bandit algorithm to the multi-agent setting.
arXiv Detail & Related papers (2024-10-31T12:20:36Z)
Multi-agent Multi-armed Bandits with Stochastic Sharable Arm Capacities [69.34646544774161]
We formulate a new variant of multi-player multi-armed bandit (MAB) model, which captures arrival of requests to each arm and the policy of allocating requests to players. The challenge is how to design a distributed learning algorithm such that players select arms according to the optimal arm pulling profile. We design an iterative distributed algorithm, which guarantees that players can arrive at a consensus on the optimal arm pulling profile in only M rounds.
arXiv Detail & Related papers (2024-08-20T13:57:00Z)
Honor Among Bandits: No-Regret Learning for Online Fair Division [20.38824614301761]
We consider the problem of online fair division of indivisible goods to players when there are a finite number of types of goods and player values are drawn from distributions with unknown means.<n>Our main result is the design of an explore-then-commit algorithm that achieves $tildeO(T2/3)$ regret while maintaining either fairness constraint.
arXiv Detail & Related papers (2024-07-01T20:44:52Z)
Multi-Player Approaches for Dueling Bandits [58.442742345319225]
We show that the direct use of a Follow Your Leader black-box approach matches the lower bound for this setting. We also analyze a message-passing fully distributed approach with a novel Condorcet-winner recommendation protocol.
arXiv Detail & Related papers (2024-05-25T10:25:48Z)
The Pareto Frontier of Instance-Dependent Guarantees in Multi-Player Multi-Armed Bandits with no Communication [10.446001329147112]
We study the multi-player multi-armed bandit problem. In this problem, $m$ players cooperate to maximize their total reward from $K > m$ arms. We ask whether it is possible to obtain optimal instance-dependent regret $tildeO (1/Delta)$ where $Delta$ is the gap between the $m$-th and $m+1$-st best arms.
arXiv Detail & Related papers (2022-02-19T18:19:36Z)
An Instance-Dependent Analysis for the Cooperative Multi-Player Multi-Armed Bandit [93.97385339354318]
We study the problem of information sharing and cooperation in Multi-Player Multi-Armed bandits. First, we show that a simple modification to a successive elimination strategy can be used to allow the players to estimate their suboptimality gaps. Second, we leverage the first result to design a communication protocol that successfully uses the small reward of collisions to coordinate among players.
arXiv Detail & Related papers (2021-11-08T23:38:47Z)
Efficient Pure Exploration for Combinatorial Bandits with Semi-Bandit Feedback [51.21673420940346]
Combinatorial bandits generalize multi-armed bandits, where the agent chooses sets of arms and observes a noisy reward for each arm contained in the chosen set. We focus on the pure-exploration problem of identifying the best arm with fixed confidence, as well as a more general setting, where the structure of the answer set differs from the one of the action set. Based on a projection-free online learning algorithm for finite polytopes, it is the first computationally efficient algorithm which is convexally optimal and has competitive empirical performance.
arXiv Detail & Related papers (2021-01-21T10:35:09Z)
Cooperative and Stochastic Multi-Player Multi-Armed Bandit: Optimal Regret With Neither Communication Nor Collisions [4.974932889340056]
We consider the cooperative multi-player version of the multi-armed bandit problem. We show that these properties are achievable for any number of players and arms.
arXiv Detail & Related papers (2020-11-08T03:14:19Z)
Contextual Blocking Bandits [35.235375147227124]
We study a novel variant of the multi-armed bandit problem, where at each time step, the player observes an independently sampled context that determines the arms' mean rewards. Playing an arm blocks it (across all contexts) for a fixed and known number of future time steps. We propose a UCB-based variant of the full-information algorithm that guarantees a $mathcalO(log T)$-regret w.r.t. an $alpha$regret strategy in $T time steps, matching the $Omega(log(T)$ lower bound
arXiv Detail & Related papers (2020-03-06T20:34:42Z)
Tight Lower Bounds for Combinatorial Multi-Armed Bandits [72.56064196252498]
The Combinatorial Multi-Armed Bandit problem is a sequential decision-making problem in which an agent selects a set of arms on each round. We show that the recently proposed Gini-weighted smoothness parameter determines the lower bounds for monotone reward functions.
arXiv Detail & Related papers (2020-02-13T08:53:43Z)
Selfish Robustness and Equilibria in Multi-Player Bandits [25.67398941667429]
In a game, several players simultaneously pull arms and encounter a collision - with 0 reward - if some of them pull the same arm at the same time. While the cooperative case where players maximize the collective reward has been mostly considered, to malicious players is a crucial and challenging concern. We shall consider instead the more natural class of selfish players whose incentives are to maximize their individual rewards, potentially at the expense of the social welfare.
arXiv Detail & Related papers (2020-02-04T09:50:28Z)

This list is automatically generated from the titles and abstracts of the papers in this site.