Bayesian Algorithms for Decentralized Stochastic Bandits
- URL: http://arxiv.org/abs/2010.10569v2
- Date: Wed, 28 Oct 2020 00:06:05 GMT
- Title: Bayesian Algorithms for Decentralized Stochastic Bandits
- Authors: Anusha Lalitha and Andrea Goldsmith
- Abstract summary: We study a decentralized cooperative multi-agent multi-armed bandit problem with $K$ arms and $N$ agents connected over a network.
In our model, each arm's reward distribution is the same for all agents, and rewards are drawn independently across agents and over time steps.
The goal is to minimize cumulative regret averaged over the entire network.
- Score: 12.350564981588063
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study a decentralized cooperative multi-agent multi-armed bandit problem
with $K$ arms and $N$ agents connected over a network. In our model, each arm's
reward distribution is the same for all agents, and rewards are drawn independently
across agents and over time steps. In each round, agents choose an arm to play
and subsequently send a message to their neighbors. The goal is to minimize
cumulative regret averaged over the entire network. We propose a decentralized
Bayesian multi-armed bandit framework that extends single-agent Bayesian bandit
algorithms to the decentralized setting. Specifically, we study an information
assimilation algorithm that can be combined with existing Bayesian algorithms,
and using this, we propose a decentralized Thompson Sampling algorithm and
decentralized Bayes-UCB algorithm. We analyze the decentralized Thompson
Sampling algorithm under Bernoulli rewards and establish a problem-dependent
upper bound on the cumulative regret. We show that the regret incurred scales
logarithmically over the time horizon with constants that match those of an
optimal centralized agent with access to all observations across the network.
Our analysis also characterizes the cumulative regret in terms of the network
structure. Through extensive numerical studies, we show that our extensions of
Thompson Sampling and Bayes-UCB incur less cumulative regret than
state-of-the-art algorithms inspired by the Upper Confidence Bound algorithm. We
implement our proposed decentralized Thompson Sampling under a gossip protocol
and over time-varying networks, where each communication link has a fixed
probability of failure.
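As an illustration of the general recipe (not the paper's exact information-assimilation algorithm, whose details go beyond this summary), the sketch below runs Beta-Bernoulli Thompson Sampling at each agent and naively folds neighbors' fresh observations into each local posterior, with links failing independently at a fixed rate as in the time-varying-network experiments. The graph, parameter values, and one-hop count-sharing rule are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

K, N, T = 5, 4, 2000                    # arms, agents, horizon
mu = rng.uniform(0.1, 0.9, size=K)      # true Bernoulli means, common to all agents
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}   # illustrative line graph
p_fail = 0.1                            # per-link failure probability per round

a = np.ones((N, K))                     # Beta posterior: 1 + successes, per (agent, arm)
b = np.ones((N, K))                     # Beta posterior: 1 + failures

total_regret = 0.0
for _ in range(T):
    msgs = []                           # (sender, arm, reward) observations this round
    for n in range(N):
        theta = rng.beta(a[n], b[n])    # one Thompson sample per arm
        arm = int(np.argmax(theta))
        r = float(rng.random() < mu[arm])
        a[n, arm] += r
        b[n, arm] += 1.0 - r
        total_regret += mu.max() - mu[arm]
        msgs.append((n, arm, r))
    # Assimilation step (naive): neighbors absorb each fresh observation,
    # unless the link fails this round (time-varying network).
    for sender, arm, r in msgs:
        for m in neighbors[sender]:
            if rng.random() > p_fail:
                a[m, arm] += r
                b[m, arm] += 1.0 - r

print(f"network-average cumulative regret: {total_regret / N:.1f}")
```

Note that naive count sharing can double-count information on cyclic graphs or over multiple hops; handling that is precisely what a careful assimilation procedure must do, and this sketch does not attempt it.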
Related papers
- Continuous K-Max Bandits [54.21533414838677]
We study the $K$-Max multi-armed bandits problem with continuous outcome distributions and weak value-index feedback.
This setting captures critical applications in recommendation systems, distributed computing, server scheduling, etc.
Our key contribution is the computationally efficient algorithm DCK-UCB, which combines adaptive discretization with bias-corrected confidence bounds.
arXiv Detail & Related papers (2025-02-19T06:37:37Z)
- A Multiagent Path Search Algorithm for Large-Scale Coalition Structure Generation [61.08720171136229]
Coalition structure generation is a fundamental computational problem in multiagent systems.
We develop SALDAE, a multiagent path finding algorithm for CSG that operates on a graph of coalition structures.
arXiv Detail & Related papers (2025-02-14T15:21:27Z)
- Multi-Agent Best Arm Identification in Stochastic Linear Bandits [0.7673339435080443]
We study the problem of collaborative best-arm identification in linear bandits under a fixed-budget scenario.
In our learning model, we consider multiple agents connected through a star network or a generic network, interacting with a linear bandit instance in parallel.
We devise the algorithms MaLinBAI-Star and MaLinBAI-Gen for star and generic networks, respectively.
arXiv Detail & Related papers (2024-11-20T20:09:44Z)
- Distributed Consensus Algorithm for Decision-Making in Multi-agent Multi-armed Bandit [7.708904950194129]
We study a structured multi-agent multi-armed bandit (MAMAB) problem in a dynamic environment.
A graph reflects the information-sharing structure among agents, and the arms' reward distributions are piecewise-stationary with several unknown change points.
The goal is to develop a decision-making policy for the agents that minimizes the regret, which is the expected total loss of not playing the optimal arm at each time step.
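In symbols (notation illustrative, since the paper's exact definition is not reproduced here), with $\mu_t^*$ the mean of the best arm at time $t$ and $a_{n,t}$ the arm pulled by agent $n$, this dynamic regret reads
$$R(T) = \mathbb{E}\Big[\sum_{n=1}^{N}\sum_{t=1}^{T}\big(\mu_t^{*} - \mu_{a_{n,t}}(t)\big)\Big].$$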
arXiv Detail & Related papers (2023-06-09T16:10:26Z)
- Byzantine-Robust Online and Offline Distributed Reinforcement Learning [60.970950468309056]
We consider a distributed reinforcement learning setting where multiple agents explore the environment and communicate their experiences through a central server.
An $\alpha$-fraction of the agents are adversarial and can report arbitrary fake information.
We seek to identify a near-optimal policy for the underlying Markov decision process in the presence of these adversarial agents.
arXiv Detail & Related papers (2022-06-01T00:44:53Z)
- Optimal Clustering with Bandit Feedback [57.672609011609886]
This paper considers the problem of online clustering with bandit feedback.
It includes a novel stopping rule for sequential testing that circumvents the need to solve any NP-hard weighted clustering problem as its subroutines.
We show through extensive simulations on synthetic and real-world datasets that BOC's performance matches the lower bound asymptotically, and significantly outperforms a non-adaptive baseline algorithm.
arXiv Detail & Related papers (2022-02-09T06:05:05Z)
- Decentralized Upper Confidence Bound Algorithms for Homogeneous Multi-Agent Multi-Armed Bandits [16.038995442397972]
The problem is solved simultaneously by $N$ agents, assuming they face a common set of $M$ arms whose reward distributions are the same for all agents.
Two fully decentralized upper confidence bound (UCB) algorithms are proposed for undirected graphs.
The proposed UCB1 and KL-UCB algorithms permit each agent in the network to achieve a better logarithmic regret than their single-agent counterparts.
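For Bernoulli rewards, the KL-UCB index solves a one-dimensional problem that is easy to compute by bisection. Below is a minimal sketch of the standard single-agent index (the decentralized variant changes which counts are fed in, which is not reproduced here); names and the constant $c$ are illustrative.

```python
import math

def kl_bernoulli(p: float, q: float) -> float:
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ucb_index(p_hat: float, n_pulls: int, t: int, c: float = 3.0) -> float:
    """Largest q >= p_hat with n_pulls * KL(p_hat, q) <= log t + c log log t."""
    budget = (math.log(t) + c * math.log(max(math.log(t), 1.0))) / n_pulls
    lo, hi = p_hat, 1.0
    for _ in range(50):                 # bisection on the monotone KL constraint
        mid = (lo + hi) / 2
        if kl_bernoulli(p_hat, mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo

# Each round t, an agent plays argmax_k kl_ucb_index(p_hat[k], n[k], t).
```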
arXiv Detail & Related papers (2021-11-22T01:05:52Z)
- Upper Confidence Bounds for Combining Stochastic Bandits [52.10197476419621]
We provide a simple method to combine bandit algorithms.
Our approach is based on a "meta-UCB" procedure that treats each of $N$ individual bandit algorithms as arms in a higher-level $N$-armed bandit problem.
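The construction is easy to state in code. A minimal sketch (illustrative, not the paper's exact procedure) treating each base algorithm as an arm of a plain UCB1 meta-learner:

```python
import math

class MetaUCB:
    """UCB1 over N base bandit algorithms, each treated as a meta-arm (sketch)."""

    def __init__(self, base_algorithms):
        self.base = base_algorithms          # each picks its own arm when selected
        self.counts = [0] * len(base_algorithms)
        self.sums = [0.0] * len(base_algorithms)
        self.t = 0

    def select(self) -> int:
        """Pick which base algorithm acts this round."""
        self.t += 1
        for i, c in enumerate(self.counts):
            if c == 0:                       # play every base algorithm once first
                return i
        return max(
            range(len(self.base)),
            key=lambda i: self.sums[i] / self.counts[i]
            + math.sqrt(2 * math.log(self.t) / self.counts[i]),
        )

    def update(self, i: int, reward: float) -> None:
        """Credit the observed reward to the base algorithm that acted."""
        self.counts[i] += 1
        self.sums[i] += reward
```

In the full setting, the selected base algorithm chooses the actual arm and updates its own state from the reward; plain UCB1 also ignores that a base algorithm's reward stream is not i.i.d., which the paper's meta-procedure is designed to account for.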
arXiv Detail & Related papers (2020-12-24T05:36:29Z)
- Decentralized Multi-Agent Linear Bandits with Safety Constraints [31.67685495996986]
We study decentralized linear bandits, where a network of $N$ agents acts cooperatively to solve a linear bandit optimization problem.
We propose DLUCB: a fully decentralized algorithm that minimizes the cumulative regret over the entire network.
We show that our ideas extend naturally to the emerging, albeit more challenging, setting of safe bandits.
arXiv Detail & Related papers (2020-12-01T07:33:00Z)
- Distributed Bandits: Probabilistic Communication on $d$-regular Graphs [5.33024001730262]
We study the decentralized multi-agent multi-armed bandit problem for agents that communicate probabilistically over a network defined by a $d$-regular graph.
We propose a new Upper Confidence Bound (UCB) based algorithm and analyze how agent-based strategies contribute to minimizing group regret.
arXiv Detail & Related papers (2020-11-16T04:53:54Z)
- Neural Thompson Sampling [94.82847209157494]
We propose a new algorithm, called Neural Thompson Sampling, which adapts deep neural networks for both exploration and exploitation.
At the core of our algorithm is a novel posterior distribution of the reward, where its mean is the neural network approximator, and its variance is built upon the neural tangent features of the corresponding neural network.
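Up to scaling conventions (which follow the paper; this rendering is from the standard presentation and may differ in constants), the sampled score for arm $k$ at round $t$ takes the form
$$\tilde r_{t,k} \sim \mathcal{N}\big(f(x_{t,k};\theta_{t-1}),\, \nu^2 \sigma_{t,k}^2\big), \qquad \sigma_{t,k}^2 = \lambda\, g(x_{t,k};\theta_{t-1})^{\top} Z_{t-1}^{-1}\, g(x_{t,k};\theta_{t-1})/m,$$
where $f$ is the network, $g = \nabla_\theta f$ its gradient (the neural tangent features), $m$ the network width, and $Z_t$ a regularized Gram matrix of these features.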
arXiv Detail & Related papers (2020-10-02T07:44:09Z)