On-Demand Communication for Asynchronous Multi-Agent Bandits
- URL: http://arxiv.org/abs/2302.07446v2
- Date: Thu, 31 Aug 2023 02:28:41 GMT
- Title: On-Demand Communication for Asynchronous Multi-Agent Bandits
- Authors: Yu-Zhen Janice Chen, Lin Yang, Xuchuang Wang, Xutong Liu, Mohammad
Hajiesmaili, John C.S. Lui, Don Towsley
- Abstract summary: ODC is an on-demand communication protocol that tailors the communication of each pair of agents based on their empirical pull times.
ODC is efficient when the pull times of agents are highly heterogeneous, and its communication complexity depends on the empirical pull times of agents.
- Score: 43.3383282058292
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper studies a cooperative multi-agent multi-armed stochastic bandit
problem where agents operate asynchronously -- agent pull times and rates are
unknown, irregular, and heterogeneous -- and face the same instance of a
K-armed bandit problem. Agents can share reward information to speed up the
learning process at additional communication costs. We propose ODC, an
on-demand communication protocol that tailors the communication of each pair of
agents based on their empirical pull times. ODC is efficient when the pull
times of agents are highly heterogeneous, and its communication complexity
depends on the empirical pull times of agents. ODC is a generic protocol that
can be integrated into most cooperative bandit algorithms without degrading
their performance. We then incorporate ODC into the natural extensions of UCB
and AAE algorithms and propose two communication-efficient cooperative
algorithms. Our analysis shows that both algorithms are near-optimal in regret.
Related papers
- Federated Contextual Cascading Bandits with Asynchronous Communication
and Heterogeneous Users [95.77678166036561]
We propose a UCB-type algorithm with delicate communication protocols.
We give sub-linear regret bounds on par with those achieved in the synchronous framework.
Empirical evaluation on synthetic and real-world datasets validates our algorithm's superior performance in terms of regrets and communication costs.
arXiv Detail & Related papers (2024-02-26T05:31:14Z) - Pure Exploration in Asynchronous Federated Bandits [57.02106627533004]
We study the federated pure exploration problem of multi-armed bandits and linear bandits, where $M$ agents cooperatively identify the best arm via communicating with the central server.
We propose the first asynchronous multi-armed bandit and linear bandit algorithms for pure exploration with fixed confidence.
arXiv Detail & Related papers (2023-10-17T06:04:00Z) - ProAgent: Building Proactive Cooperative Agents with Large Language
Models [89.53040828210945]
ProAgent is a novel framework that harnesses large language models to create proactive agents.
ProAgent can analyze the present state, and infer the intentions of teammates from observations.
ProAgent exhibits a high degree of modularity and interpretability, making it easily integrated into various coordination scenarios.
arXiv Detail & Related papers (2023-08-22T10:36:56Z) - On Regret-optimal Cooperative Nonstochastic Multi-armed Bandits [7.23389716633927]
We show that a collaborative multi-agent emphfollow-the-regularized-leader (FTRL) algorithm has an individual regret upper bound that matches the lower bound up to a constant factor.
We also show that an FTRL algorithm with a suitable regularizer is regret optimal with respect to the scaling with the edge-delay parameter.
arXiv Detail & Related papers (2022-11-30T16:46:41Z) - Private and Byzantine-Proof Cooperative Decision-Making [15.609414012418043]
The cooperative bandit problem is a multi-agent decision problem involving a group of agents that interact simultaneously with a multi-armed bandit.
In this paper, we investigate the bandit problem under two settings - (a) when the agents wish to make their communication private with respect to the action sequence, and (b) when the agents can be byzantine.
We provide upper-confidence bound algorithms that obtain optimal regret while being (a) differentially-private and (b) private.
Our decentralized algorithms require no information about the network of connectivity between agents, making them scalable to large dynamic systems.
arXiv Detail & Related papers (2022-05-27T18:03:54Z) - Depthwise Convolution for Multi-Agent Communication with Enhanced
Mean-Field Approximation [9.854975702211165]
We propose a new method based on local communication learning to tackle the multi-agent RL (MARL) challenge.
First, we design a new communication protocol that exploits the ability of depthwise convolution to efficiently extract local relations.
Second, we introduce the mean-field approximation into our method to reduce the scale of agent interactions.
arXiv Detail & Related papers (2022-03-06T07:42:43Z) - Distributed Adaptive Learning Under Communication Constraints [54.22472738551687]
This work examines adaptive distributed learning strategies designed to operate under communication constraints.
We consider a network of agents that must solve an online optimization problem from continual observation of streaming data.
arXiv Detail & Related papers (2021-12-03T19:23:48Z) - Multi-agent Policy Optimization with Approximatively Synchronous
Advantage Estimation [55.96893934962757]
In multi-agent system, polices of different agents need to be evaluated jointly.
In current methods, value functions or advantage functions use counter-factual joint actions which are evaluated asynchronously.
In this work, we propose the approximatively synchronous advantage estimation.
arXiv Detail & Related papers (2020-12-07T07:29:19Z) - Kernel Methods for Cooperative Multi-Agent Contextual Bandits [15.609414012418043]
Cooperative multi-agent decision making involves a group of agents cooperatively solving learning problems while communicating over a network with delays.
We consider the kernelised contextual bandit problem, where the reward obtained by an agent is an arbitrary linear function of the contexts' images in the related kernel reproducing Hilbert space (RKHS)
We propose textscCoop- KernelUCB, an algorithm that provides near-optimal bounds on the per-agent regret.
arXiv Detail & Related papers (2020-08-14T07:37:44Z) - R-MADDPG for Partially Observable Environments and Limited Communication [42.771013165298186]
This paper introduces a deep recurrent multiagent actor-critic framework (R-MADDPG) for handling multiagent coordination under partial observable set-tings and limited communication.
We demonstrate that the resulting framework learns time dependencies for sharing missing observations, handling resource limitations, and developing different communication patterns among agents.
arXiv Detail & Related papers (2020-02-16T21:25:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.