Decomposed Soft Actor-Critic Method for Cooperative Multi-Agent
Reinforcement Learning
- URL: http://arxiv.org/abs/2104.06655v1
- Date: Wed, 14 Apr 2021 07:02:40 GMT
- Title: Decomposed Soft Actor-Critic Method for Cooperative Multi-Agent
Reinforcement Learning
- Authors: Yuan Pu, Shaochen Wang, Rui Yang, Xin Yao, Bin Li
- Abstract summary: Experimental results demonstrate that mSAC significantly outperforms the policy-based approach COMA.
In addition, mSAC achieves strong results on large-action-space tasks such as 2c_vs_64zg and MMM2.
- Score: 10.64928897082273
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep reinforcement learning methods have shown great performance on many
challenging cooperative multi-agent tasks. Two main promising research
directions are multi-agent value function decomposition and multi-agent policy
gradients. In this paper, we propose a new decomposed multi-agent soft
actor-critic (mSAC) method, which effectively combines multi-agent value
function decomposition with the soft policy iteration framework. It assembles
novel and existing techniques, including a decomposed Q-value network
architecture, decentralized probabilistic policies, and an optional
counterfactual advantage function. Theoretically, mSAC supports efficient
off-policy learning and partially addresses the credit assignment problem in
both discrete and continuous action spaces. On the StarCraft II
micromanagement cooperative multi-agent benchmark, we empirically investigate
the performance of mSAC against its variants and analyze the effects of the
different components. Experimental results demonstrate that mSAC significantly
outperforms the policy-based approach COMA and achieves results competitive
with the state-of-the-art value-based approach Qmix on most tasks in terms of
asymptotic performance. In addition, mSAC achieves strong results on
large-action-space tasks such as 2c_vs_64zg and MMM2.
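To make the moving parts concrete, here is a minimal PyTorch sketch of the decomposed-critic-plus-soft-policy idea the abstract describes, for discrete actions. The additive (VDN-style) mixer, the network sizes, and all identifiers are illustrative assumptions rather than mSAC's actual architecture; the paper also allows a monotonic QMIX-style mixer and an optional counterfactual advantage term.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

N_AGENTS, OBS_DIM, N_ACTIONS, ALPHA = 3, 16, 5, 0.05  # illustrative sizes

def mlp(out_dim):
    return nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(), nn.Linear(64, out_dim))

q_nets = [mlp(N_ACTIONS) for _ in range(N_AGENTS)]    # per-agent utilities Q_i
policies = [mlp(N_ACTIONS) for _ in range(N_AGENTS)]  # decentralized stochastic policies

obs = torch.randn(8, N_AGENTS, OBS_DIM)  # a batch of joint observations

policy_losses, q_under_pi = [], []
for i in range(N_AGENTS):
    logits = policies[i](obs[:, i])
    probs, log_probs = F.softmax(logits, -1), F.log_softmax(logits, -1)
    q_i = q_nets[i](obs[:, i])  # Q_i(o_i, a) for every discrete action a
    # Soft policy improvement: maximize E_pi[Q_i - alpha * log pi] per agent.
    policy_losses.append((probs * (ALPHA * log_probs - q_i.detach())).sum(-1).mean())
    q_under_pi.append((probs.detach() * q_i).sum(-1))  # E_pi[Q_i], fed to the mixer

# Decomposed joint value: a plain sum (VDN-style) keeps the sketch short;
# a monotonic QMIX-style mixer would be a drop-in replacement.
q_tot = torch.stack(q_under_pi, -1).sum(-1)  # would be regressed to a soft TD target
policy_loss = torch.stack(policy_losses).mean()
```

The per-agent critics would in turn be regressed to a soft TD target that includes the entropy bonus; that update is omitted for brevity.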
Related papers
- UCB-driven Utility Function Search for Multi-objective Reinforcement Learning [75.11267478778295]
In Multi-objective Reinforcement Learning (MORL), agents are tasked with optimising decision-making behaviours over multiple objectives.
We focus on the case of linear utility functions parameterised by weight vectors w.
We introduce a method based on Upper Confidence Bound to efficiently search for the most promising weight vectors during different stages of the learning process.
arXiv Detail & Related papers (2024-05-01T09:34:42Z)
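Taking the summary at face value, the search can be read as a bandit over candidate weight vectors. A toy sketch under that reading, using the classic UCB1 rule over a finite pool; the pool, the utility probe, and all names are hypothetical and not the paper's actual procedure:

```python
import numpy as np

rng = np.random.default_rng(0)
candidates = rng.dirichlet(np.ones(3), size=10)  # pool of candidate weight vectors w
counts = np.zeros(len(candidates))
mean_utility = np.zeros(len(candidates))

def ucb_pick(t, c=1.0):
    """UCB1: prefer weight vectors that look promising or are under-explored."""
    untried = np.flatnonzero(counts == 0)
    if untried.size:
        return int(untried[0])
    return int(np.argmax(mean_utility + c * np.sqrt(np.log(t) / counts)))

for t in range(1, 101):
    k = ucb_pick(t)
    w = candidates[k]
    # Stand-in for training/evaluating a policy under the utility u(r) = w . r:
    observed = float(w @ rng.random(3))
    counts[k] += 1
    mean_utility[k] += (observed - mean_utility[k]) / counts[k]
```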
- Efficient Multi-Task Reinforcement Learning via Task-Specific Action Correction [10.388605128396678]
Task-Specific Action Correction (TSAC) is designed for simultaneous learning of multiple tasks.
The action correction policy (ACP) incorporates goal-oriented sparse rewards, enabling an agent to adopt a long-term perspective.
These additional rewards transform the original problem into a multi-objective MTRL problem.
arXiv Detail & Related papers (2024-04-09T02:11:35Z)
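The summary leaves the mechanism implicit; as a loose illustration only, the sketch below pairs the original scalar reward with a hypothetical goal-oriented sparse bonus, yielding the multi-objective reward the summary mentions. The goal test and tolerance are invented:

```python
import numpy as np

def acp_bonus(state, goal, tol=0.1):
    """Invented goal-oriented sparse bonus: non-zero only near the goal."""
    return 1.0 if np.linalg.norm(state - goal) < tol else 0.0

def vector_reward(r_env, state, goal):
    """Pair the original scalar reward with the sparse bonus, turning the
    task into a two-objective problem that an MORL-style update can weight."""
    return np.array([r_env, acp_bonus(state, goal)])

r = vector_reward(0.3, np.zeros(2), np.zeros(2))  # -> array([0.3, 1.0])
```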
- Effective Multi-Agent Deep Reinforcement Learning Control with Relative Entropy Regularization [6.441951360534903]
Multi-Agent Continuous Dynamic Policy Gradient (MACDPP) was proposed to tackle the issues of limited capability and low sample efficiency in various scenarios controlled by multiple agents.
It alleviates the inconsistency of multiple agents' policy updates by introducing relative entropy regularization into the Centralized Training with Decentralized Execution (CTDE) framework with an Actor-Critic (AC) structure.
arXiv Detail & Related papers (2023-09-26T07:38:19Z)
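A minimal sketch of one way to realize the relative-entropy idea: a KL penalty between an agent's updated and previous Gaussian policies, added to a plain policy-gradient surrogate. The surrogate form and coefficient are assumptions, not MACDPP's exact objective:

```python
import torch
import torch.distributions as D

def actor_loss(mu, std, actions, advantages, mu_old, std_old, beta=0.01):
    """Policy-gradient surrogate plus a relative-entropy (KL) penalty that
    keeps each agent's updated policy close to its previous one, damping
    the inconsistency of simultaneous multi-agent updates."""
    pi = D.Normal(mu, std)
    pi_old = D.Normal(mu_old.detach(), std_old.detach())
    pg = -(pi.log_prob(actions).sum(-1) * advantages).mean()
    kl = D.kl_divergence(pi, pi_old).sum(-1).mean()
    return pg + beta * kl

mu = torch.zeros(8, 2, requires_grad=True)  # hypothetical batch of policy means
loss = actor_loss(mu, torch.ones(8, 2), torch.randn(8, 2),
                  torch.randn(8), torch.zeros(8, 2), torch.ones(8, 2))
loss.backward()
```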
- Policy Diagnosis via Measuring Role Diversity in Cooperative Multi-agent RL [107.58821842920393]
We quantify the agents' behavior differences and relate them to policy performance via Role Diversity.
We find that the error bound in MARL can be decomposed into three parts that have a strong relation to role diversity.
The decomposed factors can significantly impact policy optimization in three popular directions.
arXiv Detail & Related papers (2022-06-01T04:58:52Z)
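The paper defines role diversity formally; as a rough stand-in, one can score how differently agents act by the average pairwise total-variation distance between their action distributions on a shared batch of states. The sketch below is that stand-in, not the paper's metric:

```python
import numpy as np

def role_diversity(action_probs):
    """Average pairwise total-variation distance between agents' action
    distributions on a shared batch of states.
    action_probs: (n_agents, n_states, n_actions)."""
    n = len(action_probs)
    pairs = [0.5 * np.abs(action_probs[i] - action_probs[j]).sum(-1).mean()
             for i in range(n) for j in range(i + 1, n)]
    return float(np.mean(pairs))

probs = np.random.default_rng(0).dirichlet(np.ones(4), size=(3, 100))  # 3 agents, 100 states
print(role_diversity(probs))  # higher value = more distinct behavioural roles
```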
- Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning [89.31889875864599]
We propose an efficient model-based reinforcement learning algorithm for learning in multi-agent systems.
Our main theoretical contributions are the first general regret bounds for model-based reinforcement learning for mean-field control (MFC).
We also provide a practical parametrization of the core optimization problem.
arXiv Detail & Related papers (2021-07-08T18:01:02Z)
- Semi-On-Policy Training for Sample Efficient Multi-Agent Policy Gradients [51.749831824106046]
We introduce semi-on-policy (SOP) training as an effective and computationally efficient way to address the sample inefficiency of on-policy policy gradient methods.
We show that our methods perform as well as or better than state-of-the-art value-based methods on a variety of SMAC tasks.
arXiv Detail & Related papers (2021-04-27T19:37:01Z)
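The summary does not spell out the mechanism; one plausible reading of "semi-on-policy" is training on the newest rollout plus a small window of slightly stale ones. A toy buffer under that assumption (names invented):

```python
from collections import deque
import random

class SemiOnPolicyBuffer:
    """Keep only the K most recent rollouts; each update trains on the newest
    rollout plus a few slightly stale ones, trading strict on-policyness
    for sample efficiency."""
    def __init__(self, max_rollouts=4):
        self.rollouts = deque(maxlen=max_rollouts)

    def add(self, rollout):
        self.rollouts.append(rollout)

    def training_batch(self, n_stale=2):
        older = list(self.rollouts)[:-1]
        stale = random.sample(older, k=min(n_stale, len(older)))
        return [self.rollouts[-1], *stale]
```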
- Softmax with Regularization: Better Value Estimation in Multi-Agent Reinforcement Learning [72.28520951105207]
Overestimation in $Q$-learning is an important problem that has been extensively studied in single-agent reinforcement learning.
We propose a novel regularization-based update scheme that penalizes large joint action-values deviating from a baseline.
We show that our method provides a consistent performance improvement on a set of challenging StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2021-03-22T14:18:39Z)
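A hedged sketch of the kind of update the summary describes: a softmax operator over the joint action-values acts as a soft baseline, and values that stray from it are penalized alongside the usual TD regression. The exact baseline and coefficient in the paper may differ:

```python
import torch

def softmax_baseline(q_all, tau=1.0):
    """Softmax operator over joint action-values: a smooth stand-in for max
    that is less prone to overestimation."""
    w = torch.softmax(q_all / tau, dim=-1)
    return (w * q_all).sum(-1)

def regularized_td_loss(q_taken, td_target, q_all, lam=0.1):
    """Usual TD regression plus a penalty on values straying from the baseline."""
    baseline = softmax_baseline(q_all).detach()
    return ((q_taken - td_target) ** 2).mean() + lam * ((q_taken - baseline) ** 2).mean()
```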
- Modeling the Interaction between Agents in Cooperative Multi-Agent Reinforcement Learning [2.9360071145551068]
We propose a novel cooperative MARL algorithm named interactive actor-critic (IAC).
IAC models the interaction of agents from the perspectives of the policy and the value function.
We extend value decomposition methods to continuous control tasks and evaluate IAC on benchmark tasks including classic control and multi-agent particle environments.
arXiv Detail & Related papers (2021-02-10T01:58:28Z)
- Off-Policy Multi-Agent Decomposed Policy Gradients [30.389041305278045]
We investigate causes that hinder the performance of multi-agent policy gradient (MAPG) algorithms and present a multi-agent decomposed policy gradient method (DOP).
DOP supports efficient off-policy learning and addresses the issues of centralized-decentralized mismatch and credit assignment.
In addition, empirical evaluations on the StarCraft II micromanagement benchmark and multi-agent particle environments demonstrate that DOP significantly outperforms both state-of-the-art value-based and policy-based multi-agent reinforcement learning algorithms.
arXiv Detail & Related papers (2020-07-24T02:21:55Z)
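DOP's credit assignment rests on a linearly decomposed critic, Q_tot(s, a) = sum_i k_i(s) Q_i(o_i, a_i) + b(s) with non-negative k_i. A minimal mixer in that spirit (layer sizes and the non-negativity trick are illustrative):

```python
import torch
import torch.nn as nn

class LinearMixer(nn.Module):
    """Q_tot(s, a) = sum_i k_i(s) * Q_i(o_i, a_i) + b(s), with k_i >= 0.
    The linear form lets each agent's policy gradient be weighted by its own
    coefficient, which is what decomposed policy gradients exploit."""
    def __init__(self, state_dim, n_agents):
        super().__init__()
        self.k = nn.Linear(state_dim, n_agents)
        self.b = nn.Linear(state_dim, 1)

    def forward(self, state, agent_qs):        # agent_qs: (batch, n_agents)
        coeffs = torch.abs(self.k(state))      # keep mixing weights non-negative
        return (coeffs * agent_qs).sum(-1, keepdim=True) + self.b(state)
```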
- FACMAC: Factored Multi-Agent Centralised Policy Gradients [103.30380537282517]
We propose FACtored Multi-Agent Centralised policy gradients (FACMAC), a new method for cooperative multi-agent reinforcement learning in both discrete and continuous action spaces.
We evaluate FACMAC on variants of the multi-agent particle environments, a novel multi-agent MuJoCo benchmark, and a challenging set of StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2020-03-14T21:29:09Z)
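FACMAC's centralised policy gradient evaluates all agents' current actions through one factored joint critic, so every actor receives gradient through the mixed value (and the mixer need not be monotonic). A hedged sketch, assuming per-agent deterministic policies and critics plus a mixer such as the LinearMixer above; all module signatures are assumptions:

```python
import torch

def centralised_actor_loss(policies, critics, mixer, state, obs):
    """Centralised policy gradient: actions from *all* current policies are
    pushed through the factored joint critic, so each actor's gradient
    accounts for the other agents' simultaneous updates."""
    qs = []
    for i, (pi, q) in enumerate(zip(policies, critics)):
        action = pi(obs[:, i])                                  # deterministic action
        qs.append(q(torch.cat([obs[:, i], action], -1)).squeeze(-1))
    q_tot = mixer(state, torch.stack(qs, -1))                   # factored joint value
    return -q_tot.mean()                                        # ascend the mixed Q
```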