Distributed Cooperative Multi-Agent Reinforcement Learning with Directed
Coordination Graph
- URL: http://arxiv.org/abs/2201.04962v1
- Date: Mon, 10 Jan 2022 04:14:46 GMT
- Title: Distributed Cooperative Multi-Agent Reinforcement Learning with Directed
Coordination Graph
- Authors: Gangshan Jing, He Bai, Jemin George, Aranya Chakrabortty and Piyush K. Sharma
- Abstract summary: Existing distributed cooperative multi-agent reinforcement learning (MARL) frameworks assume undirected coordination graphs and communication graphs.
We propose a distributed RL algorithm where the local policy evaluations are based on local value functions.
- Score: 18.04270684579841
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing distributed cooperative multi-agent reinforcement learning (MARL)
frameworks usually assume undirected coordination graphs and communication
graphs while estimating a global reward via consensus algorithms for policy
evaluation. Such a framework may induce expensive communication costs and
exhibit poor scalability due to the requirement of global consensus. In this
work, we study MARL with directed coordination graphs, and propose a distributed RL
algorithm where the local policy evaluations are based on local value
functions. The local value function of each agent is obtained by local
communication with its neighbors through a directed learning-induced
communication graph, without using any consensus algorithm. A zeroth-order
optimization (ZOO) approach based on parameter perturbation is employed to
achieve gradient estimation. By comparing with existing ZOO-based RL
algorithms, we show that our proposed distributed RL algorithm guarantees high
scalability. A distributed resource allocation example is shown to illustrate
the effectiveness of our algorithm.
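The abstract does not spell out the estimator itself; as a rough illustration of parameter-perturbation-based gradient estimation, the Python sketch below shows a standard two-point zeroth-order estimate of an agent's local value function. The `local_value` callable, the perturbation radius `mu`, and `num_samples` are hypothetical placeholders, not the paper's actual interface.

```python
import numpy as np

def zoo_gradient_estimate(theta_i, local_value, mu=1e-2, num_samples=20, rng=None):
    """Two-point zeroth-order (parameter-perturbation) gradient estimate.

    theta_i     : current policy parameters of agent i (1-D array)
    local_value : callable returning the agent's local value J_i(theta),
                  assumed to be computable via local communication with
                  neighbors on the directed graph (hypothetical interface)
    mu          : perturbation radius
    num_samples : number of random perturbation directions
    """
    rng = rng or np.random.default_rng()
    d = theta_i.size
    grad = np.zeros(d)
    for _ in range(num_samples):
        u = rng.standard_normal(d)                # random perturbation direction
        j_plus = local_value(theta_i + mu * u)    # perturbed local evaluation
        j_minus = local_value(theta_i - mu * u)
        grad += (j_plus - j_minus) / (2.0 * mu) * u
    return grad / num_samples

# Hypothetical usage: gradient ascent on agent i's local policy parameters.
# theta_i = theta_i + step_size * zoo_gradient_estimate(theta_i, local_value)
```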
Related papers
- Decentralized Federated Learning with Gradient Tracking over Time-Varying Directed Networks [42.92231921732718]
We propose a consensus-based algorithm called DSGTm-TV.
It incorporates gradient tracking and heavy-ball momentum to optimize a global objective function.
Under DSGTm-TV, agents will update local model parameters and gradient estimates using information exchange with neighboring agents.
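For intuition only, here is a generic sketch of one synchronous round of gradient tracking with heavy-ball momentum over a directed network; the mixing matrix `W`, the `local_grad` callable, and the step sizes are assumptions, and this is not the exact DSGTm-TV recursion.

```python
import numpy as np

def gradient_tracking_momentum_round(x, x_prev, y, local_grad, W, alpha=0.05, beta=0.9):
    """One synchronous round of gradient tracking with heavy-ball momentum.

    x, x_prev  : (n, d) current and previous agent models
    y          : (n, d) gradient trackers (initialized to the local gradients)
    local_grad : callable mapping an (n, d) stack of models to the stacked
                 local gradients, one row per agent (hypothetical interface)
    W          : (n, n) mixing matrix; W[i, j] > 0 only if agent i receives
                 information from agent j on the (time-varying) directed graph
    """
    # Mix with in-neighbors, descend along the tracked gradient, add momentum.
    x_next = W @ x - alpha * y + beta * (x - x_prev)
    # Gradient tracking: the tracker mixes with neighbors and is corrected by
    # the change in the freshly evaluated local gradients.
    y_next = W @ y + local_grad(x_next) - local_grad(x)
    return x_next, y_next
```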
arXiv Detail & Related papers (2024-09-25T06:23:16Z)
- Intelligent Hybrid Resource Allocation in MEC-assisted RAN Slicing Network [72.2456220035229]
We aim to maximize the SSR for heterogeneous service demands in the cooperative MEC-assisted RAN slicing system.
We propose a recurrent graph reinforcement learning (RGRL) algorithm to intelligently learn the optimal hybrid RA policy.
arXiv Detail & Related papers (2024-05-02T01:36:13Z)
- Fundamental Limits of Communication Efficiency for Model Aggregation in Distributed Learning: A Rate-Distortion Approach [54.311495894129585]
We study the limit of communication cost of model aggregation in distributed learning from a rate-distortion perspective.
It is found that the communication gain by exploiting the correlation between worker nodes is significant for SignSGD.
arXiv Detail & Related papers (2022-06-28T13:10:40Z)
- Escaping Saddle Points with Bias-Variance Reduced Local Perturbed SGD for Communication Efficient Nonconvex Distributed Learning [58.79085525115987]
Local methods are one of the promising approaches to reduce communication time.
We show that the communication complexity is better than that of non-local methods when the heterogeneity of the local datasets is smaller than the smoothness of the local loss.
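As background for why local methods save communication, the sketch below shows plain local SGD with periodic averaging (not the bias-variance-reduced, perturbed variant studied in the paper); `workers_data`, `grad_fn`, and all hyperparameters are illustrative assumptions.

```python
import numpy as np

def local_sgd(workers_data, grad_fn, dim, rounds=10, local_steps=20, lr=0.01, rng=None):
    """Plain local SGD with periodic averaging: each worker takes several local
    steps between communication rounds, so communication happens only once per
    `local_steps` gradient updates.

    workers_data : list of per-worker datasets (hypothetical format)
    grad_fn      : callable (params, data, rng) -> stochastic gradient
    """
    rng = rng or np.random.default_rng()
    params = np.zeros(dim)
    for _ in range(rounds):
        local_params = []
        for data in workers_data:
            w = params.copy()
            for _ in range(local_steps):           # local updates, no communication
                w -= lr * grad_fn(w, data, rng)
            local_params.append(w)
        params = np.mean(local_params, axis=0)     # one communication round: averaging
    return params
```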
arXiv Detail & Related papers (2022-02-12T15:12:17Z)
- Dimension-Free Rates for Natural Policy Gradient in Multi-Agent Reinforcement Learning [22.310861786709538]
We propose a scalable algorithm for cooperative multi-agent reinforcement learning.
We show that our algorithm converges to the globally optimal policy with a dimension-free statistical and computational complexity.
arXiv Detail & Related papers (2021-09-23T23:38:15Z)
- Asynchronous Distributed Reinforcement Learning for LQR Control via Zeroth-Order Block Coordinate Descent [7.6860514640178]
We propose a novel zeroth-order optimization algorithm for distributed reinforcement learning.
It allows each agent to estimate its local gradient by cost evaluation independently, without use of any consensus protocol.
arXiv Detail & Related papers (2021-07-26T18:11:07Z)
- Distributed Q-Learning with State Tracking for Multi-agent Networked Control [61.63442612938345]
This paper studies distributed Q-learning for Linear Quadratic Regulator (LQR) in a multi-agent network.
We devise a state tracking (ST) based Q-learning algorithm to design optimal controllers for agents.
arXiv Detail & Related papers (2020-12-22T22:03:49Z)
- Implicit Distributional Reinforcement Learning [61.166030238490634]
We propose an implicit distributional actor-critic (IDAC) built on two deep generator networks (DGNs) and a semi-implicit actor (SIA) powered by a flexible policy distribution.
We observe IDAC outperforms state-of-the-art algorithms on representative OpenAI Gym environments.
arXiv Detail & Related papers (2020-07-13T02:52:18Z)
- Cooperative Multi-Agent Reinforcement Learning with Partial Observations [16.895704973433382]
We propose a distributed zeroth-order policy optimization method for Multi-Agent Reinforcement Learning (MARL).
It allows the agents to compute the local policy gradients needed to update their local policy functions using local estimates of the global accumulated rewards.
We show that the proposed distributed zeroth-order policy optimization method with constant stepsize converges to the neighborhood of a policy that is a stationary point of the global objective function.
arXiv Detail & Related papers (2020-06-18T19:36:22Z)
- Decentralized MCTS via Learned Teammate Models [89.24858306636816]
We present a trainable online decentralized planning algorithm based on decentralized Monte Carlo Tree Search.
We show that deep learning and convolutional neural networks can be employed to produce accurate policy approximators.
arXiv Detail & Related papers (2020-03-19T13:10:20Z)
- COKE: Communication-Censored Decentralized Kernel Learning [30.795725108364724]
Multiple interconnected agents aim to learn an optimal decision function defined over a reproducing kernel Hilbert space by jointly minimizing a global objective function.
As a non-parametric approach, kernel learning faces a major challenge in distributed implementation.
We develop a communication-censored kernel learning (COKE) algorithm that reduces the communication load of DKLA by preventing an agent from transmitting at every iteration unless its local updates are deemed informative.
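To illustrate the communication-censoring idea, the following sketch shows a simple decaying-threshold transmit test; the threshold schedule and the `broadcast` primitive are assumptions, not the exact COKE criterion.

```python
import numpy as np

def should_transmit(current_update, last_transmitted, round_idx, tau0=1.0, decay=0.95):
    """Communication-censoring test (illustrative): an agent broadcasts its new
    local iterate only when it differs from the last transmitted one by more
    than a threshold that decays over the rounds."""
    threshold = tau0 * (decay ** round_idx)
    return np.linalg.norm(current_update - last_transmitted) >= threshold

# Hypothetical usage inside an agent's update loop:
# if should_transmit(theta_new, theta_last_sent, k):
#     broadcast(theta_new)          # hypothetical communication primitive
#     theta_last_sent = theta_new
```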
arXiv Detail & Related papers (2020-01-28T01:05:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.