F2A2: Flexible Fully-decentralized Approximate Actor-critic for
Cooperative Multi-agent Reinforcement Learning
- URL: http://arxiv.org/abs/2004.11145v2
- Date: Fri, 7 Jul 2023 05:16:29 GMT
- Title: F2A2: Flexible Fully-decentralized Approximate Actor-critic for
Cooperative Multi-agent Reinforcement Learning
- Authors: Wenhao Li and Bo Jin and Xiangfeng Wang and Junchi Yan and Hongyuan
Zha
- Abstract summary: Traditional centralized multi-agent reinforcement learning algorithms are sometimes impractical in complicated applications.
We propose a flexible, fully decentralized actor-critic MARL framework that can handle large-scale, general cooperative multi-agent settings.
Our framework achieves scalability and stability in large-scale environments and reduces information transmission.
- Score: 110.35516334788687
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Traditional centralized multi-agent reinforcement learning (MARL) algorithms
are sometimes impractical in complicated applications due to non-interactivity
between agents, the curse of dimensionality, and high computational complexity.
Hence, several decentralized MARL algorithms have been proposed. However,
existing decentralized methods only handle the fully cooperative setting, in
which massive amounts of information must be transmitted during training. The
block coordinate gradient descent scheme they use for successive, independent
actor and critic steps simplifies the computation but introduces serious bias.
In this paper, we propose a flexible, fully decentralized actor-critic MARL
framework that can accommodate most actor-critic methods and handle
large-scale, general cooperative multi-agent settings. A primal-dual hybrid
gradient descent type algorithm framework is designed to learn individual
agents separately for decentralization. From the perspective of each agent,
policy improvement and value evaluation are jointly optimized, which stabilizes
multi-agent policy learning. Furthermore, our framework achieves scalability
and stability in large-scale environments and reduces information transmission
through a parameter-sharing mechanism and a novel modeling-other-agents method
based on theory of mind and online supervised learning. Extensive experiments
in the cooperative Multi-agent Particle Environment and StarCraft II show that
our decentralized MARL instantiation algorithms perform competitively against
conventional centralized and decentralized methods.
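
As an illustration of the joint policy-improvement-and-value-evaluation idea, here is a minimal per-agent actor-critic sketch in which both objectives are optimized in a single gradient step instead of alternating block-coordinate actor and critic steps. The network sizes, optimizer usage, and loss weighting are assumptions for illustration; the primal-dual hybrid gradient machinery, parameter sharing, and the theory-of-mind opponent model are omitted, so this is not the authors' implementation.

```python
# Minimal sketch: one decentralized agent jointly updating its policy (actor)
# and value estimate (critic) from purely local observations. Illustrative only.
import torch
import torch.nn as nn

class DecentralizedActorCritic(nn.Module):
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.actor = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh(),
                                   nn.Linear(hidden, n_actions))
        self.critic = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh(),
                                    nn.Linear(hidden, 1))

    def update(self, obs, action, reward, next_obs, optimizer, gamma=0.99):
        # Value evaluation: one-step TD target from the agent's own observation
        # (episode termination handling is omitted for brevity).
        with torch.no_grad():
            target = reward + gamma * self.critic(next_obs).squeeze(-1)
        value = self.critic(obs).squeeze(-1)
        critic_loss = (target - value).pow(2).mean()

        # Policy improvement: advantage-weighted log-likelihood on local data.
        log_probs = torch.log_softmax(self.actor(obs), dim=-1)
        log_prob = log_probs.gather(-1, action.unsqueeze(-1)).squeeze(-1)
        actor_loss = -((target - value).detach() * log_prob).mean()

        # Both objectives are optimized jointly in one step, rather than in
        # successive independent actor and critic steps.
        optimizer.zero_grad()
        (actor_loss + critic_loss).backward()
        optimizer.step()
        return actor_loss.item(), critic_loss.item()
```

Under parameter sharing, homogeneous agents would reuse a single module of this kind while still acting on their own local observations.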
Related papers
- SACHA: Soft Actor-Critic with Heuristic-Based Attention for Partially
Observable Multi-Agent Path Finding [3.4260993997836753]
We propose a novel multi-agent actor-critic method called Soft Actor-Critic with Heuristic-Based Attention (SACHA).
SACHA learns a neural network for each agent to selectively pay attention to the shortest path guidance from multiple agents within its field of view.
We demonstrate decent improvements over several state-of-the-art learning-based MAPF methods with respect to success rate and solution quality.
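
The attention over shortest-path guidance can be pictured as a standard attention layer in which each agent's observation forms the query and its visible neighbours' heuristic features form the keys and values. The sketch below is a hypothetical illustration of that idea; the layer names, dimensions, and masking are assumptions, not SACHA's actual architecture.

```python
# Hypothetical sketch of per-agent attention over neighbours' shortest-path
# (heuristic) features within the field of view. Not SACHA's actual network.
import torch
import torch.nn as nn

class HeuristicAttention(nn.Module):
    def __init__(self, obs_dim: int, heur_dim: int, d: int = 64):
        super().__init__()
        self.q = nn.Linear(obs_dim, d)    # query from the agent's own observation
        self.k = nn.Linear(heur_dim, d)   # keys from neighbours' heuristic features
        self.v = nn.Linear(heur_dim, d)   # values from neighbours' heuristic features

    def forward(self, own_obs, neighbor_heur, visible):
        # own_obs: (batch, obs_dim); neighbor_heur: (batch, n, heur_dim);
        # visible: (batch, n) boolean mask, assumed to mark at least one neighbour.
        q = self.q(own_obs).unsqueeze(1)                       # (batch, 1, d)
        k, v = self.k(neighbor_heur), self.v(neighbor_heur)    # (batch, n, d)
        scores = (q @ k.transpose(1, 2)) / k.size(-1) ** 0.5   # (batch, 1, n)
        scores = scores.masked_fill(~visible.unsqueeze(1), float("-inf"))
        attn = torch.softmax(scores, dim=-1)
        return (attn @ v).squeeze(1)   # attended guidance feature for this agent
```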
arXiv Detail & Related papers (2023-07-05T23:36:33Z)
- MADiff: Offline Multi-agent Learning with Diffusion Models [79.18130544233794]
Diffusion models (DMs) have recently achieved huge success in various scenarios, including offline reinforcement learning.
We propose MADiff, a novel generative multi-agent learning framework to tackle this problem.
Our experiments show the superior performance of MADiff compared to baseline algorithms in a wide range of multi-agent learning tasks.
arXiv Detail & Related papers (2023-05-27T02:14:09Z)
- Macro-Action-Based Multi-Agent/Robot Deep Reinforcement Learning under
Partial Observability [4.111899441919164]
State-of-the-art multi-agent reinforcement learning (MARL) methods have provided promising solutions to a variety of complex problems.
We first propose a group of value-based RL approaches for MacDec-POMDPs.
We formulate a set of macro-action-based policy gradient algorithms under the three training paradigms.
arXiv Detail & Related papers (2022-09-20T21:13:51Z)
- Towards Global Optimality in Cooperative MARL with the Transformation
And Distillation Framework [26.612749327414335]
Decentralized execution is one core demand in cooperative multi-agent reinforcement learning (MARL).
In this paper, we theoretically analyze two common classes of algorithms with decentralized policies -- multi-agent policy gradient methods and value-decomposition methods.
We show that TAD-PPO can theoretically perform optimal policy learning in finite multi-agent MDPs, and it shows significant outperformance on a large set of cooperative multi-agent tasks.
arXiv Detail & Related papers (2022-07-12T06:59:13Z)
- Locality Matters: A Scalable Value Decomposition Approach for
Cooperative Multi-Agent Reinforcement Learning [52.7873574425376]
Cooperative multi-agent reinforcement learning (MARL) faces significant scalability issues due to state and action spaces that are exponentially large in the number of agents.
We propose a novel, value-based multi-agent algorithm called LOMAQ, which incorporates local rewards in the Centralized Training Decentralized Execution paradigm.
arXiv Detail & Related papers (2021-09-22T10:08:15Z)
- Mean-Field Multi-Agent Reinforcement Learning: A Decentralized Network
Approach [6.802025156985356]
This paper proposes a framework called localized training and decentralized execution to study MARL with a network of states.
The key idea is to utilize the homogeneity of agents and regroup them according to their states, leading to the formulation of a networked Markov decision process.
arXiv Detail & Related papers (2021-08-05T16:52:36Z)
- Adaptive Stochastic ADMM for Decentralized Reinforcement Learning in
Edge Industrial IoT [106.83952081124195]
Reinforcement learning (RL) has been widely investigated and shown to be a promising solution for decision-making and optimal control processes.
We propose an adaptive ADMM (asI-ADMM) algorithm and apply it to decentralized RL with edge-computing-empowered IIoT networks.
Experimental results show that our proposed algorithms outperform the state of the art in terms of communication costs and scalability, and adapt well to complex IoT environments.
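
For context, the ADMM building block this entry refers to is usually applied in its consensus form, where agents alternate local proximal updates, an averaging (communication) step, and a dual update. The following is a generic consensus-ADMM round, shown only to illustrate that style of decentralized optimization; it is not the paper's asI-ADMM algorithm, and local_grads is a placeholder for each agent's objective gradient.

```python
# Generic scaled-form consensus ADMM round; illustrative, not the asI-ADMM method.
import numpy as np

def consensus_admm_round(thetas, duals, z, local_grads, rho=1.0, lr=0.1, steps=10):
    """thetas/duals: per-agent parameter and dual arrays; z: current consensus."""
    # 1) local proximal updates -- each agent only touches its own data
    new_thetas = []
    for theta, u, grad in zip(thetas, duals, local_grads):
        x = theta.copy()
        for _ in range(steps):
            x -= lr * (grad(x) + rho * (x - z + u))
        new_thetas.append(x)
    # 2) consensus (averaging) step -- the only communication in the round
    new_z = np.mean([x + u for x, u in zip(new_thetas, duals)], axis=0)
    # 3) dual ascent keeps local parameters pulled toward the consensus
    new_duals = [u + x - new_z for x, u in zip(new_thetas, duals)]
    return new_thetas, new_duals, new_z
```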
arXiv Detail & Related papers (2021-06-30T16:49:07Z)
- Lyapunov-Based Reinforcement Learning for Decentralized Multi-Agent
Control [3.3788926259119645]
In decentralized multi-agent control, systems are complex with unknown or highly uncertain dynamics.
Deep reinforcement learning (DRL) is promising for learning the controller/policy from data without knowing the system dynamics.
Existing multi-agent reinforcement learning (MARL) algorithms cannot ensure the closed-loop stability of a multi-agent system.
We propose a new MARL algorithm for decentralized multi-agent control with a stability guarantee.
arXiv Detail & Related papers (2020-09-20T06:11:42Z)
- Monotonic Value Function Factorisation for Deep Multi-Agent
Reinforcement Learning [55.20040781688844]
QMIX is a novel value-based method that can train decentralised policies in a centralised end-to-end fashion.
We propose the StarCraft Multi-Agent Challenge (SMAC) as a new benchmark for deep multi-agent reinforcement learning.
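
QMIX's core mechanism is a state-conditioned mixing network whose weights are forced to be non-negative, so the joint action-value is monotonic in every agent's individual utility and decentralized greedy action selection stays consistent with the centralized objective. Below is a compact sketch of that structure; layer sizes and activations are illustrative, not the released implementation.

```python
# Illustrative sketch of QMIX-style monotonic value mixing (not the authors' code).
# A hypernetwork conditioned on the global state produces non-negative mixing
# weights, so the joint Q-value is monotonic in each agent's individual Q-value.
import torch
import torch.nn as nn

class MonotonicMixer(nn.Module):
    def __init__(self, n_agents: int, state_dim: int, embed_dim: int = 32):
        super().__init__()
        self.n_agents, self.embed_dim = n_agents, embed_dim
        self.hyper_w1 = nn.Linear(state_dim, n_agents * embed_dim)  # layer-1 weights
        self.hyper_b1 = nn.Linear(state_dim, embed_dim)             # layer-1 bias
        self.hyper_w2 = nn.Linear(state_dim, embed_dim)             # layer-2 weights
        self.hyper_b2 = nn.Sequential(nn.Linear(state_dim, embed_dim),
                                      nn.ReLU(), nn.Linear(embed_dim, 1))

    def forward(self, agent_qs, state):
        # agent_qs: (batch, n_agents) per-agent Q-values; state: (batch, state_dim)
        b = agent_qs.size(0)
        w1 = torch.abs(self.hyper_w1(state)).view(b, self.n_agents, self.embed_dim)
        b1 = self.hyper_b1(state).view(b, 1, self.embed_dim)
        hidden = torch.relu(torch.bmm(agent_qs.unsqueeze(1), w1) + b1)  # (b, 1, embed)
        w2 = torch.abs(self.hyper_w2(state)).view(b, self.embed_dim, 1)
        b2 = self.hyper_b2(state).view(b, 1, 1)
        return (torch.bmm(hidden, w2) + b2).squeeze(-1).squeeze(-1)     # joint Q_tot
```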
arXiv Detail & Related papers (2020-03-19T16:51:51Z)
- FACMAC: Factored Multi-Agent Centralised Policy Gradients [103.30380537282517]
We propose FACtored Multi-Agent Centralised policy gradients (FACMAC).
It is a new method for cooperative multi-agent reinforcement learning in both discrete and continuous action spaces.
We evaluate FACMAC on variants of the multi-agent particle environments, a novel multi-agent MuJoCo benchmark, and a challenging set of StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2020-03-14T21:29:09Z)