Decentralized Multi-Armed Bandit Can Outperform Classic Upper Confidence Bound: A Homogeneous Case over Strongly Connected Graphs
- URL: http://arxiv.org/abs/2111.10933v3
- Date: Sat, 23 Mar 2024 03:41:10 GMT
- Title: Decentralized Multi-Armed Bandit Can Outperform Classic Upper Confidence Bound: A Homogeneous Case over Strongly Connected Graphs
- Authors: Jingxuan Zhu, Ji Liu
- Abstract summary: This paper studies a homogeneous decentralized multi-armed bandit problem, in which a network of multiple agents faces the same set of arms.
A fully decentralized upper confidence bound (UCB) algorithm is proposed for a multi-agent network whose neighbor relations are described by a directed graph.
- Score: 9.84486119211443
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper studies a homogeneous decentralized multi-armed bandit problem, in which a network of multiple agents faces the same set of arms, and each agent aims to minimize its own regret. A fully decentralized upper confidence bound (UCB) algorithm is proposed for a multi-agent network whose neighbor relations are described by a directed graph. It is shown that, provided the neighbor graph is strongly connected, the decentralized algorithm guarantees that each agent achieves a lower logarithmic asymptotic regret than the classic UCB algorithm. The improved asymptotic regret upper bound is inversely related to the maximal size of a local neighborhood within the network. The roles of graph connectivity, maximum local degree, and network size in the regret expression are analytically elucidated.
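To make the mechanism concrete, here is a minimal sketch, not the paper's exact algorithm: agents on a directed cycle (a strongly connected graph) run UCB on per-arm counts and reward sums pooled from their in-neighbors. The arm means, exploration constant, sharing rule, and all variable names are illustrative assumptions.

```python
# Minimal illustrative sketch, NOT the paper's exact algorithm: each agent runs
# UCB on per-arm statistics pooled from itself and its in-neighbors on a
# directed cycle (a strongly connected graph).
import numpy as np

rng = np.random.default_rng(0)
N, K, T = 5, 3, 2000                     # agents, arms, horizon (illustrative)
mu = np.array([0.2, 0.5, 0.7])           # Bernoulli arm means (illustrative)

# In-neighbors of agent i: itself and its predecessor on a directed cycle.
in_neighbors = [[i, (i - 1) % N] for i in range(N)]

counts = np.zeros((N, K))                # each agent's own pull counts
sums = np.zeros((N, K))                  # each agent's own cumulative rewards
regret = np.zeros(N)

for t in range(1, T + 1):
    for i in range(N):
        # Pool statistics from the local neighborhood (self + in-neighbors).
        n_agg = counts[in_neighbors[i]].sum(axis=0)
        s_agg = sums[in_neighbors[i]].sum(axis=0)
        if np.any(n_agg == 0):
            arm = int(np.argmin(n_agg))  # try any arm unseen in the neighborhood
        else:
            ucb = s_agg / n_agg + np.sqrt(2.0 * np.log(t) / n_agg)
            arm = int(np.argmax(ucb))
        reward = rng.binomial(1, mu[arm])
        counts[i, arm] += 1.0
        sums[i, arm] += reward
        regret[i] += mu.max() - mu[arm]

print("per-agent cumulative regret:", np.round(regret, 1))
```
In this toy setting, pooling neighbor statistics effectively enlarges each agent's sample size for every arm, which is the intuition behind the neighborhood-size factor in the improved regret bound.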
Related papers
- Scalable spectral representations for network multiagent control [53.631272539560435]
Network Markov Decision Processes (MDPs), a popular model for multi-agent control, pose a significant challenge to efficient learning.
We first derive scalable spectral local representations for network MDPs, which induce a network linear subspace for the local $Q$-function of each agent.
We design a scalable algorithmic framework for continuous state-action network MDPs, and provide end-to-end guarantees for the convergence of our algorithm.
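As a very rough illustration of a local $Q$-function living in a linear subspace (not the paper's spectral construction), the sketch below uses a random-feature map as a stand-in for the learned representation; the feature dimension, the map, and all names are placeholders.

```python
# Placeholder sketch: a local Q-function that is linear in a fixed random-feature
# embedding of the agent's neighborhood state and its own action. This stands in
# for the learned spectral representation; it is not the paper's construction.
import numpy as np

d = 16                                    # assumed feature dimension
rng = np.random.default_rng(1)
W = rng.normal(size=(d, 4))               # random projection over [local state, action]
b = rng.uniform(0.0, 2.0 * np.pi, size=d)

def local_features(neighborhood_state, action):
    z = np.concatenate([neighborhood_state, [float(action)]])
    return np.sqrt(2.0 / d) * np.cos(W @ z + b)

w_agent = rng.normal(size=d)              # agent's linear Q-weights in the subspace

def local_q(neighborhood_state, action):
    return float(w_agent @ local_features(neighborhood_state, action))

print(local_q(np.array([0.1, -0.3, 0.7]), action=1))
```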
arXiv Detail & Related papers (2024-10-22T17:45:45Z) - TSC: A Simple Two-Sided Constraint against Over-Smoothing [17.274727377858873]
We introduce a simple Two-Sided Constraint (TSC) for Graph Convolutional Neural Networks (GCNs).
The random masking acts on the representation matrix's columns to regulate the degree of information aggregation from neighbors.
The contrastive constraint, applied to the representation matrix's rows, enhances the discriminability of the nodes.
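A hedged sketch of the two-sided idea follows; it is not the authors' implementation, and the keep probability, temperature, and loss form are assumptions. Column masking limits how much neighbor information is aggregated, while a row-wise repulsion term (standing in for the contrastive constraint) keeps node representations distinguishable.

```python
# Illustrative sketch only: random column masking during neighbor aggregation,
# plus a simple row-wise repulsion penalty on the representation matrix.
import torch
import torch.nn.functional as F

def masked_aggregate(H, adj_norm, keep_prob=0.8):
    """Aggregate neighbor features while randomly masking feature columns."""
    col_mask = (torch.rand(H.size(1), device=H.device) < keep_prob).float()
    return adj_norm @ (H * col_mask)      # only unmasked channels are aggregated

def row_repulsion_loss(H, tau=0.5):
    """Penalize high pairwise similarity between different nodes' rows."""
    Hn = F.normalize(H, dim=1)
    sim = (Hn @ Hn.t()) / tau
    eye = torch.eye(H.size(0), dtype=torch.bool, device=H.device)
    sim = sim.masked_fill(eye, float("-inf"))  # ignore self-similarity
    return torch.logsumexp(sim, dim=1).mean()
```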
arXiv Detail & Related papers (2024-08-06T12:52:03Z) - Distributed Optimization via Kernelized Multi-armed Bandits [6.04275169308491]
We model a distributed optimization problem as a multi-agent kernelized multi-armed bandit problem with a heterogeneous reward setting.
We present a fully decentralized algorithm, Multi-agent IGP-UCB (MA-IGP-UCB), which achieves a sub-linear regret bound for popular classes of kernels.
We also propose an extension, Multi-agent Delayed IGP-UCB (MAD-IGP-UCB) algorithm, which reduces the dependence of the regret bound on the number of agents in the network.
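For context, the single-agent GP-UCB acquisition step that such methods build on can be sketched as follows; this is not the MA-IGP-UCB update itself, and the kernel, noise level, exploration weight, and toy data are assumptions.

```python
# Illustrative single-agent GP-UCB step (the building block such multi-agent
# methods decentralize); not the MA-IGP-UCB update itself.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(2)
X_hist = rng.uniform(-1.0, 1.0, size=(10, 1))           # past queries (toy data)
y_hist = np.sin(3.0 * X_hist[:, 0]) + 0.1 * rng.normal(size=10)

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.3), alpha=1e-2)
gp.fit(X_hist, y_hist)

candidates = np.linspace(-1.0, 1.0, 200).reshape(-1, 1)  # finite set of candidate arms
mean, std = gp.predict(candidates, return_std=True)
beta = 2.0                                               # exploration weight (assumed)
print("next query point:", candidates[np.argmax(mean + beta * std)])
```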
arXiv Detail & Related papers (2023-12-07T21:57:48Z) - Optimal Clustering with Bandit Feedback [57.672609011609886]
This paper considers the problem of online clustering with bandit feedback.
The proposed algorithm includes a novel stopping rule for sequential testing that circumvents the need to solve any NP-hard weighted clustering problem as a subroutine.
We show through extensive simulations on synthetic and real-world datasets that BOC's performance matches the lower bound asymptotically, and significantly outperforms a non-adaptive baseline algorithm.
arXiv Detail & Related papers (2022-02-09T06:05:05Z) - Acceleration in Distributed Optimization Under Similarity [72.54787082152278]
We study distributed (strongly convex) optimization problems over a network of agents, with no centralized nodes.
An $\varepsilon$-solution is achieved in $\tilde{\mathcal{O}}\big(\sqrt{\frac{\beta/\mu}{1-\rho}}\,\log\frac{1}{\varepsilon}\big)$ communication steps.
This rate matches, for the first time (up to poly-log factors), the lower communication-complexity bounds of distributed gossip algorithms applied to the class of problems of interest.
arXiv Detail & Related papers (2021-10-24T04:03:00Z) - Multi-resource allocation for federated settings: A non-homogeneous Markov chain model [2.552459629685159]
In a federated setting, agents coordinate with a central agent or a server to solve an optimization problem in which agents do not share their information with each other.
We describe how the basic additive-increase multiplicative-decrease (AIMD) algorithm can be modified in a straightforward manner to solve a class of optimization problems in federated settings with a single shared resource and no inter-agent communication.
We extend the single-resource algorithm to multiple heterogeneous shared resources that emerge in smart cities, sharing economy, and many other applications.
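A minimal AIMD sketch for a single shared resource is shown below, assuming one-bit "capacity exceeded" feedback from the server and a probabilistic back-off; the capacity, step sizes, and back-off probability are illustrative, and this is not the paper's exact modification.

```python
# Minimal AIMD sketch for one shared resource (illustrative, not the paper's
# exact scheme): agents additively increase their allocations and multiplicatively
# back off when the server broadcasts a one-bit "capacity exceeded" signal.
import numpy as np

rng = np.random.default_rng(3)
N, C, T = 10, 100.0, 200            # agents, shared capacity, iterations (assumed)
alpha, beta = 1.0, 0.7              # additive step, multiplicative back-off (assumed)
x = rng.uniform(0.0, 5.0, size=N)   # initial allocations

for _ in range(T):
    if x.sum() > C:                 # one-bit feedback from the central server
        backoff = rng.random(N) < 0.5          # probabilistic, uncoordinated back-off
        x = np.where(backoff, beta * x, x)
    else:
        x = x + alpha               # additive increase; no inter-agent communication

print("total allocation:", round(float(x.sum()), 1), "capacity:", C)
```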
arXiv Detail & Related papers (2021-04-26T19:10:00Z) - Reinforced Neighborhood Selection Guided Multi-Relational Graph Neural Networks [68.9026534589483]
RioGNN is a novel Reinforced, recursive and flexible neighborhood selection guided multi-relational Graph Neural Network architecture.
RioGNN can learn more discriminative node embedding with enhanced explainability due to the recognition of individual importance of each relation.
arXiv Detail & Related papers (2021-04-16T04:30:06Z) - Bayesian Algorithms for Decentralized Stochastic Bandits [12.350564981588063]
We study a decentralized cooperative multi-agent multi-armed bandit problem with $K$ arms and $N$ agents connected over a network.
In our model, each arm's reward distribution is the same for all agents, and rewards are drawn independently across agents and over time steps.
The goal is to minimize cumulative regret averaged over the entire network.
arXiv Detail & Related papers (2020-10-20T19:14:20Z) - Communication-Efficient Distributed Stochastic AUC Maximization with Deep Neural Networks [50.42141893913188]
We study distributed algorithms for large-scale AUC maximization with a deep neural network as the predictive model.
Our method requires a much smaller number of communication rounds while still enjoying theoretical guarantees.
Our experiments on several datasets demonstrate the effectiveness of the method and corroborate the theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z) - A Decentralized Policy with Logarithmic Regret for a Class of Multi-Agent Multi-Armed Bandit Problems with Option Unavailability Constraints and Stochastic Communication Protocols [0.0]
This paper considers a multi-armed bandit (MAB) problem in which multiple mobile agents receive rewards by sampling from a collection of spatially distributed processes, called bandits.
The goal is to formulate a decentralized policy for each agent, in order to maximize the total cumulative reward over all agents, subject to option availability and inter-agent communication constraints.
To our knowledge, this is the first such decentralized policy for non-fully connected spatial graphs with communication constraints.
arXiv Detail & Related papers (2020-03-29T08:12:49Z) - Quantized Decentralized Stochastic Learning over Directed Graphs [52.94011236627326]
We consider a decentralized learning problem where data points are distributed among computing nodes communicating over a directed graph.
As model size grows, decentralized learning faces a major bottleneck: the communication load incurred when each node transmits messages (model updates) to its neighbors.
We propose a quantized decentralized stochastic learning algorithm over directed graphs, based on the push-sum algorithm from decentralized consensus optimization.
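A simplified push-sum averaging sketch with coarsely quantized messages over a directed ring is given below; the quantizer, step size, and topology are assumptions, and the error-compensation machinery of the actual algorithm is omitted.

```python
# Simplified sketch: push-sum averaging over a directed ring with a uniform
# quantizer applied to transmitted values (error compensation omitted).
import numpy as np

def quantize(v, step=0.05):
    return step * np.round(v / step)       # uniform deterministic quantizer

rng = np.random.default_rng(4)
N = 6
x = rng.normal(size=N)                     # local values (e.g., local model updates)
w = np.ones(N)                             # push-sum weights
true_avg = x.mean()
out_neighbors = [[i, (i + 1) % N] for i in range(N)]   # directed ring + self-loop

for _ in range(100):
    x_new, w_new = np.zeros(N), np.zeros(N)
    for i in range(N):
        deg = len(out_neighbors[i])
        share_x = quantize(x[i] / deg)     # quantized value message
        share_w = w[i] / deg               # weight message (kept exact here)
        for j in out_neighbors[i]:
            x_new[j] += share_x
            w_new[j] += share_w
    x, w = x_new, w_new

print("push-sum ratios:", np.round(x / w, 3), "initial average:", round(true_avg, 3))
```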
arXiv Detail & Related papers (2020-02-23T18:25:39Z)