Decentralized Multi-Armed Bandit Can Outperform Classic Upper Confidence Bound: A Homogeneous Case over Strongly Connected Graphs
- URL: http://arxiv.org/abs/2111.10933v3
- Date: Sat, 23 Mar 2024 03:41:10 GMT
- Title: Decentralized Multi-Armed Bandit Can Outperform Classic Upper Confidence Bound: A Homogeneous Case over Strongly Connected Graphs
- Authors: Jingxuan Zhu, Ji Liu
- Abstract summary: This paper studies a homogeneous decentralized multi-armed bandit problem, in which a network of multiple agents faces the same set of arms.
A fully decentralized upper confidence bound (UCB) algorithm is proposed for a multi-agent network whose neighbor relations are described by a directed graph.
- Score: 9.84486119211443
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper studies a homogeneous decentralized multi-armed bandit problem, in which a network of multiple agents faces the same set of arms and each agent aims to minimize its own regret. A fully decentralized upper confidence bound (UCB) algorithm is proposed for a multi-agent network whose neighbor relations are described by a directed graph. It is shown that, provided the neighbor graph is strongly connected, the decentralized algorithm guarantees each agent a lower asymptotic logarithmic regret than the classic UCB algorithm. The improved asymptotic regret upper bound scales inversely with the maximal size of a local neighborhood within the network. The roles of graph connectivity, maximum local degree, and network size are analytically elucidated in the expression of regret.
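As a rough illustration of the setting, the sketch below runs UCB agents on a small strongly connected directed graph, letting each agent pool sample sums and counts from its in-neighbors before computing its index. This is a simplified stand-in, not the paper's algorithm; the graph, the pooling rule, and all constants are assumptions made for the example.

```python
import numpy as np

# Hypothetical sketch of decentralized UCB over a directed neighbor graph.
# Each agent pools sample sums/counts from its in-neighbors before computing
# its UCB index; pooling shrinks the confidence width roughly in proportion
# to the local neighborhood size, which is the intuition behind the improved
# regret bound. This is NOT the paper's exact update rule.

rng = np.random.default_rng(0)

K = 3                             # number of arms
MU = np.array([0.5, 0.6, 0.7])    # true Bernoulli means (unknown to agents)
T = 5000                          # horizon

# Directed, strongly connected graph: adj[i] lists the in-neighbors of agent i.
adj = {0: [2], 1: [0], 2: [1]}
N = len(adj)

counts = np.ones((N, K))                               # one initial pull per arm
sums = rng.binomial(1, MU, size=(N, K)).astype(float)  # initial rewards

for t in range(K, T):
    new_sums, new_counts = sums.copy(), counts.copy()
    for i in range(N):
        nbrs = [i] + adj[i]                    # local neighborhood of agent i
        n = counts[nbrs].sum(axis=0)           # pooled pull counts per arm
        mean = sums[nbrs].sum(axis=0) / n      # pooled empirical means
        ucb = mean + np.sqrt(2.0 * np.log(t) / n)
        a = int(np.argmax(ucb))
        new_sums[i, a] += rng.binomial(1, MU[a])
        new_counts[i, a] += 1
    sums, counts = new_sums, new_counts        # synchronous round of updates

gaps = MU.max() - MU
print("per-agent regret:", (counts - 1) @ gaps)  # suboptimal pulls weighted by gaps
```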
Related papers
- A Multiagent Path Search Algorithm for Large-Scale Coalition Structure Generation [61.08720171136229]
Coalition structure generation is a fundamental computational problem in multiagent systems.
We develop SALDAE, a multiagent path finding algorithm for CSG that operates on a graph of coalition structures.
arXiv Detail & Related papers (2025-02-14T15:21:27Z)
- Near-Optimal Online Learning for Multi-Agent Submodular Coordination: Tight Approximation and Communication Efficiency [52.60557300927007]
We present a $\textbf{MA-OSMA}$ algorithm to transform the discrete submodular problem into a continuous optimization problem.
We also introduce a projection-free $\textbf{MA-OSEA}$ algorithm, which effectively utilizes the KL divergence by mixing in a uniform distribution.
Our algorithms significantly improve the $(\frac{1}{1+c})$-approximation provided by the state-of-the-art OSG algorithm.
arXiv Detail & Related papers (2025-02-07T15:57:56Z)
- Byzantine-Resilient Decentralized Multi-Armed Bandits [25.499420566469098]
We develop an algorithm that fuses an information mixing step among agents with a truncation of inconsistent and extreme values.
This framework can be used to model attackers in computer networks, instigators of offensive content into recommender systems, or manipulators of financial markets.
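A hedged sketch of the mix-and-truncate step described above: an agent keeps only those neighbor estimates that fall inside its own confidence interval before averaging. The interval rule and function names here are illustrative assumptions, not the paper's construction.

```python
import numpy as np

# Hedged sketch of the mix-and-truncate idea: an agent discards neighbor
# estimates that fall outside its own confidence interval before averaging.
# The interval rule below is an illustrative placeholder, not the
# construction used in the paper.

def robust_mix(own_mean, own_count, neighbor_means, t):
    """Average the agent's estimate with neighbor values that survive truncation."""
    radius = np.sqrt(2.0 * np.log(max(t, 2)) / max(own_count, 1))
    kept = [m for m in neighbor_means if abs(m - own_mean) <= radius]
    return float(np.mean([own_mean] + kept))

# A Byzantine neighbor reporting an extreme value (5.0) is truncated away.
print(robust_mix(own_mean=0.5, own_count=100, neighbor_means=[0.52, 0.48, 5.0], t=1000))
```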
arXiv Detail & Related papers (2023-10-11T09:09:50Z)
- Distributed Consensus Algorithm for Decision-Making in Multi-agent Multi-armed Bandit [7.708904950194129]
We study a structured multi-agent multi-armed bandit (MAMAB) problem in a dynamic environment.
A graph reflects the information-sharing structure among agents, and the arms' reward distributions are piecewise-stationary with several unknown change points.
The goal is to develop a decision-making policy for the agents that minimizes the regret, which is the expected total loss of not playing the optimal arm at each time step.
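In symbols, one standard formalization of this regret for the piecewise-stationary means $\mu_{t,k}$ above (the paper's exact definition may differ) is:

```latex
% Dynamic (pseudo-)regret against the per-step optimal arm:
R(T) \;=\; \sum_{t=1}^{T} \left( \mu_t^{*} - \mathbb{E}\left[ \mu_{t,a_t} \right] \right),
\qquad \mu_t^{*} \;=\; \max_{k} \mu_{t,k}.
```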
arXiv Detail & Related papers (2023-06-09T16:10:26Z)
- On the Complexity of Multi-Agent Decision Making: From Learning in Games to Partial Monitoring [105.13668993076801]
A central problem in the theory of multi-agent reinforcement learning (MARL) is to understand what structural conditions and algorithmic principles lead to sample-efficient learning guarantees.
We study this question in a general framework for interactive decision making with multiple agents.
We show that characterizing the statistical complexity for multi-agent decision making is equivalent to characterizing the statistical complexity of single-agent decision making.
arXiv Detail & Related papers (2023-05-01T06:46:22Z)
- Optimal Clustering with Bandit Feedback [57.672609011609886]
This paper considers the problem of online clustering with bandit feedback.
It includes a novel stopping rule for sequential testing that circumvents the need to solve any NP-hard weighted clustering problem as a subroutine.
We show through extensive simulations on synthetic and real-world datasets that BOC's performance matches the lower bound asymptotically, and significantly outperforms a non-adaptive baseline algorithm.
arXiv Detail & Related papers (2022-02-09T06:05:05Z)
- Distributed Bandits with Heterogeneous Agents [38.90376765616447]
This paper tackles a multi-agent bandit setting where $M$ agents cooperate to solve the same instance of a $K$-armed bandit problem.
We propose two learning algorithms, ucbo and AAE.
We prove that both algorithms achieve order-optimal regret, which is $O\left(\sum_{i:\tilde{\Delta}_i>0} \log T/\tilde{\Delta}_i\right)$, where $\tilde{\Delta}_i$ is the minimum suboptimality gap between the reward mean of arm $i$ and that of the optimal arm.
arXiv Detail & Related papers (2022-01-23T20:04:15Z)
- Mean-based Best Arm Identification in Stochastic Bandits under Reward Contamination [80.53485617514707]
This paper proposes two algorithms, a gap-based algorithm and one based on successive elimination, for best arm identification in sub-Gaussian bandits.
Specifically, for the gap-based algorithm the sample complexity is optimal up to constant factors, while for successive elimination it is optimal up to logarithmic factors.
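For reference, a minimal sketch of textbook successive elimination (not the paper's contamination-robust variant; the constants and confidence radius are illustrative assumptions):

```python
import numpy as np

# Minimal sketch of successive elimination for best-arm identification.
# Arms whose upper confidence bound falls below the best arm's lower
# confidence bound are dropped; sampling continues on the survivors.

rng = np.random.default_rng(1)
MU = np.array([0.3, 0.5, 0.55, 0.8])   # true means (unknown to the learner)
delta = 0.05                            # target failure probability

active = list(range(len(MU)))
sums = np.zeros(len(MU))
t = 0

while len(active) > 1:
    t += 1
    for a in active:                    # sample every surviving arm once per round
        sums[a] += rng.normal(MU[a], 1.0)
    rad = np.sqrt(2.0 * np.log(4.0 * len(MU) * t * t / delta) / t)
    means = sums[np.array(active)] / t
    best_lcb = means.max() - rad
    active = [a for a, m in zip(active, means) if m + rad >= best_lcb]

print("identified best arm:", active[0])
```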
arXiv Detail & Related papers (2021-11-14T21:49:58Z)
- Bayesian Algorithms for Decentralized Stochastic Bandits [12.350564981588063]
We study a decentralized cooperative multi-agent multi-armed bandit problem with $K$ arms and $N$ agents connected over a network.
In our model, each arm's reward distribution is the same for all agents, and rewards are drawn independently across agents and over time steps.
The goal is to minimize cumulative regret averaged over the entire network.
arXiv Detail & Related papers (2020-10-20T19:14:20Z)
- F2A2: Flexible Fully-decentralized Approximate Actor-critic for Cooperative Multi-agent Reinforcement Learning [110.35516334788687]
Decentralized multi-agent reinforcement learning algorithms are sometimes impractical in complicated applications.
We propose a flexible fully decentralized actor-critic MARL framework, which can handle large-scale general cooperative multi-agent settings.
Our framework achieves scalability and stability in large-scale environments and reduces information transmission.
arXiv Detail & Related papers (2020-04-17T14:56:29Z)
- Distributed Cooperative Decision Making in Multi-agent Multi-armed Bandits [6.437761597996503]
We study a distributed decision-making problem in which multiple agents face the same multi-armed bandit (MAB) problem.
We design a dynamic, consensus-based, distributed estimation algorithm for cooperative estimation of mean rewards at each arm.
We show that both algorithms achieve group performance close to the performance of a central fusion center.
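The estimation step can be sketched as repeated averaging with a row-stochastic weight matrix; a minimal illustration follows (the matrix and the omission of fresh observations are simplifying assumptions, not the paper's full dynamic algorithm).

```python
import numpy as np

# Minimal sketch of consensus-based estimation: agents repeatedly average
# their local mean-reward estimates for one arm using a row-stochastic
# weight matrix W over the communication graph.

W = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5]])     # row-stochastic, strongly connected

x = np.array([0.9, 0.1, 0.5])       # agents' initial estimates of one arm's mean

for _ in range(50):
    x = W @ x                       # one round of mixing with neighbors

print(x)                            # entries converge to a common consensus value
```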
arXiv Detail & Related papers (2020-03-03T03:20:44Z)