Meet Me at the Arm: The Cooperative Multi-Armed Bandits Problem with Shareable Arms
- URL: http://arxiv.org/abs/2506.10127v1
- Date: Wed, 11 Jun 2025 19:14:32 GMT
- Title: Meet Me at the Arm: The Cooperative Multi-Armed Bandits Problem with Shareable Arms
- Authors: Xinyi Hu, Aldo Pacchiano
- Abstract summary: We study the decentralized multi-player multi-armed bandits (MMAB) problem under a no-sensing setting. If the number of players pulling an arm exceeds its capacity, all players involved receive zero reward. We propose A-CAPELLA (Algorithm for Capacity-Aware Parallel Elimination for Learning and Allocation), a decentralized algorithm that achieves logarithmic regret.
- Score: 24.5966337811692
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the decentralized multi-player multi-armed bandits (MMAB) problem under a no-sensing setting, where each player receives only their own reward and obtains no information about collisions. Each arm has an unknown capacity, and if the number of players pulling an arm exceeds its capacity, all players involved receive zero reward. This setting generalizes the classical unit-capacity model and introduces new challenges in coordination and capacity discovery under severe feedback limitations. We propose A-CAPELLA (Algorithm for Capacity-Aware Parallel Elimination for Learning and Allocation), a decentralized algorithm that achieves logarithmic regret in this generalized regime. Our main contribution is a collaborative hypothesis testing protocol that enables synchronized successive elimination and capacity estimation through carefully structured collision patterns. This represents a provably efficient learning result in decentralized no-sensing MMAB with unknown arm capacities.
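To make the feedback model concrete, here is a minimal, hypothetical Python sketch of the environment described in the abstract (not the authors' A-CAPELLA algorithm): each arm has an unknown capacity, overloading an arm yields zero reward for everyone pulling it, and each player observes only their own reward. Bernoulli per-player rewards and the specific arm parameters below are assumptions for illustration.

```python
import random

random.seed(0)

def play_round(choices, capacities, means):
    """Simulate one round of the capacity-limited, no-sensing reward model.

    choices[p] is the arm pulled by player p; capacities[k] and means[k] are
    the (unknown to the players) capacity and mean reward of arm k.
    """
    loads = {}
    for arm in choices:
        loads[arm] = loads.get(arm, 0) + 1
    rewards = []
    for arm in choices:
        if loads[arm] > capacities[arm]:
            rewards.append(0.0)  # capacity exceeded: every player on this arm gets zero
        else:
            rewards.append(float(random.random() < means[arm]))  # assumed Bernoulli draw
    # No-sensing feedback: each player sees only their own reward, with no
    # explicit signal about collisions or about other players' actions.
    return rewards

# Illustration: 3 players, 2 arms with assumed capacities [1, 2] and means [0.9, 0.6].
# Players 1 and 2 share arm 1 within its capacity; player 0 is alone on arm 0.
print(play_round([0, 1, 1], capacities=[1, 2], means=[0.9, 0.6]))
```

A zero reward here is ambiguous on its own: it may come from an unlucky draw or from an overloaded arm, which is why the paper's collaborative hypothesis testing protocol relies on carefully structured collision patterns rather than on direct collision observations.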
Related papers
- Reinforcement learning with combinatorial actions for coupled restless bandits [62.89013331120493]
We propose SEQUOIA, an RL algorithm that directly optimizes for long-term reward over the feasible action space. We empirically validate SEQUOIA on four novel restless bandit problems with constraints: multiple interventions, path constraints, bipartite matching, and capacity constraints.
arXiv Detail & Related papers (2025-03-01T21:25:21Z)
- Continuous K-Max Bandits [54.21533414838677]
We study the $K$-Max multi-armed bandits problem with continuous outcome distributions and weak value-index feedback. This setting captures critical applications in recommendation systems, distributed computing, server scheduling, etc. Our key contribution is the computationally efficient algorithm DCK-UCB, which combines adaptive discretization with bias-corrected confidence bounds.
arXiv Detail & Related papers (2025-02-19T06:37:37Z)
- Multi-agent Multi-armed Bandits with Stochastic Sharable Arm Capacities [69.34646544774161]
We formulate a new variant of the multi-player multi-armed bandit (MAB) model, which captures the arrival of requests to each arm and the policy for allocating requests to players.
The challenge is how to design a distributed learning algorithm such that players select arms according to the optimal arm pulling profile.
We design an iterative distributed algorithm, which guarantees that players can arrive at a consensus on the optimal arm pulling profile in only M rounds.
arXiv Detail & Related papers (2024-08-20T13:57:00Z)
- Multi-Player Approaches for Dueling Bandits [58.442742345319225]
We show that the direct use of a Follow Your Leader black-box approach matches the lower bound for this setting. We also analyze a message-passing fully distributed approach with a novel Condorcet-winner recommendation protocol.
arXiv Detail & Related papers (2024-05-25T10:25:48Z)
- Clustered Linear Contextual Bandits with Knapsacks [9.668078830796999]
We study clustered contextual bandits where rewards and resource consumption are the outcomes of cluster-specific linear models.
Pulling an arm in a time period yields a reward and consumes some amount of each of multiple resources.
We show that it suffices to perform clustering only once to a randomly selected subset of the arms.
arXiv Detail & Related papers (2023-08-21T13:47:13Z)
- Hardness of Independent Learning and Sparse Equilibrium Computation in Markov Games [70.19141208203227]
We consider the problem of decentralized multi-agent reinforcement learning in Markov games.
We show that no algorithm attains no-regret in general-sum games when executed independently by all players.
We show that our lower bounds hold even for the seemingly easier setting in which all agents are controlled by a centralized algorithm.
arXiv Detail & Related papers (2023-03-22T03:28:12Z)
- Decentralized Stochastic Multi-Player Multi-Armed Walking Bandits [6.732901486505047]
The multi-player multi-armed bandit is an increasingly relevant decision-making problem, motivated by applications to cognitive radio systems.
This paper proposes a multi-player multi-armed walking bandits model, aiming to address the aforementioned modeling issues.
arXiv Detail & Related papers (2022-12-12T23:26:02Z)
- Multiple-Play Stochastic Bandits with Shareable Finite-Capacity Arms [32.313813562222066]
We generalize the multiple-play multi-armed bandits (MP-MAB) problem to a shareable arm setting.
Each shareable arm has a finite reward capacity and a 'per-load' reward distribution.
We propose an online learning algorithm to address the problem and prove its regret upper bound.
arXiv Detail & Related papers (2022-06-17T13:47:27Z)
- Multi-Player Multi-Armed Bandits with Finite Shareable Resources Arms: Learning Algorithms & Applications [32.313813562222066]
We study how decentralized players cooperatively play the same multi-armed bandit so as to maximize their total cumulative rewards.
Existing MMAB models mostly assume that when more than one player pulls the same arm, they either collide and obtain zero reward, or do not collide and gain independent rewards.
We propose an MMAB with shareable resources as an extension to the collision and non-collision settings.
arXiv Detail & Related papers (2022-04-28T13:46:59Z)
- An Instance-Dependent Analysis for the Cooperative Multi-Player Multi-Armed Bandit [93.97385339354318]
We study the problem of information sharing and cooperation in multi-player multi-armed bandits.
First, we show that a simple modification to a successive elimination strategy can be used to allow the players to estimate their suboptimality gaps.
Second, we leverage the first result to design a communication protocol that successfully uses the small reward of collisions to coordinate among players.
arXiv Detail & Related papers (2021-11-08T23:38:47Z)
- Selfish Robustness and Equilibria in Multi-Player Bandits [25.67398941667429]
In a game, several players simultaneously pull arms and encounter a collision, with 0 reward, if some of them pull the same arm at the same time.
While the cooperative case where players maximize the collective reward has been mostly considered, robustness to malicious players is a crucial and challenging concern.
We shall consider instead the more natural class of selfish players whose incentives are to maximize their individual rewards, potentially at the expense of the social welfare.
arXiv Detail & Related papers (2020-02-04T09:50:28Z)