Divergence-Regularized Multi-Agent Actor-Critic
- URL: http://arxiv.org/abs/2110.00304v1
- Date: Fri, 1 Oct 2021 10:27:42 GMT
- Title: Divergence-Regularized Multi-Agent Actor-Critic
- Authors: Kefan Su and Zongqing Lu
- Abstract summary: We propose a novel off-policy cooperative MARL framework, divergence-regularized multi-agent actor-critic (DMAC).
DMAC is a flexible framework and can be combined with many existing MARL algorithms.
We empirically show that DMAC substantially improves the performance of existing MARL algorithms.
- Score: 17.995905582226467
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Entropy regularization is a popular method in reinforcement learning (RL).
Although it has many advantages, it alters the RL objective and makes the
converged policy deviate from the optimal policy of the original Markov
Decision Process. Though divergence regularization has been proposed to address
this problem, it cannot be trivially applied to cooperative multi-agent
reinforcement learning (MARL). In this paper, we investigate divergence
regularization in cooperative MARL and propose a novel off-policy cooperative
MARL framework, divergence-regularized multi-agent actor-critic (DMAC).
Mathematically, we derive the update rule of DMAC, which is naturally
off-policy, guarantees monotonic policy improvement, and is not biased by the
regularization. DMAC is a flexible framework and can be combined with many
existing MARL algorithms. We evaluate DMAC in a didactic stochastic game and
StarCraft Multi-Agent Challenge and empirically show that DMAC substantially
improves the performance of existing MARL algorithms.
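To make the distinction concrete, the standard single-agent forms of the two regularized objectives can be written as follows; this is a sketch of the general idea rather than DMAC's exact multi-agent objective (the coefficient $\alpha$ and the reference policy $\bar\pi$ are our notation).

$$J_{\text{ent}}(\pi)=\mathbb{E}_{\pi}\Big[\sum_{t}\gamma^{t}\big(r_{t}+\alpha\,\mathcal{H}\big(\pi(\cdot\mid s_{t})\big)\big)\Big],\qquad J_{\text{div}}(\pi)=\mathbb{E}_{\pi}\Big[\sum_{t}\gamma^{t}\big(r_{t}-\alpha\,D_{\mathrm{KL}}\big(\pi(\cdot\mid s_{t})\,\|\,\bar{\pi}(\cdot\mid s_{t})\big)\big)\Big].$$

The entropy bonus never vanishes, so the optimum of $J_{\text{ent}}$ is shifted away from the optimal policy of the original MDP; the divergence penalty, in contrast, is zero whenever $\pi$ matches the reference policy $\bar\pi$ (typically the previous policy), which is why divergence regularization can avoid biasing the converged policy.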
Related papers
- AlberDICE: Addressing Out-Of-Distribution Joint Actions in Offline Multi-Agent RL via Alternating Stationary Distribution Correction Estimation [65.4532392602682]
One of the main challenges in offline Reinforcement Learning (RL) is the distribution shift that arises from the learned policy deviating from the data collection policy.
This is often addressed by avoiding out-of-distribution (OOD) actions during policy improvement as their presence can lead to substantial performance degradation.
We introduce AlberDICE, an offline MARL algorithm that performs centralized training of individual agents based on stationary distribution optimization.
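As a reminder of what "stationary distribution correction estimation" (DICE) refers to, the quantity being estimated is the ratio between the policy's discounted occupancy measure and the dataset distribution; the identity below is the generic single-policy formulation, not AlberDICE's multi-agent objective.

$$w_{\pi}(s,a)=\frac{d^{\pi}(s,a)}{d^{\mathcal{D}}(s,a)},\qquad J(\pi)=\frac{1}{1-\gamma}\,\mathbb{E}_{(s,a)\sim d^{\mathcal{D}}}\big[w_{\pi}(s,a)\,r(s,a)\big],$$

where $d^{\pi}$ is the normalized discounted state-action occupancy of $\pi$ and $d^{\mathcal{D}}$ is the distribution of the offline dataset. Optimizing over ratios defined on the dataset keeps the learned policy close to in-distribution joint actions.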
arXiv Detail & Related papers (2023-11-03T18:56:48Z)
- Robust Multi-Agent Reinforcement Learning via Adversarial Regularization: Theoretical Foundation and Stable Algorithms [79.61176746380718]
Multi-Agent Reinforcement Learning (MARL) has shown promising results across several domains.
MARL policies often lack robustness and are sensitive to small changes in their environment.
We show that we can gain robustness by controlling a policy's Lipschitz constant.
We propose a new robust MARL framework, ERNIE, that promotes the Lipschitz continuity of the policies.
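A minimal PyTorch sketch of how adversarial regularization of a policy's smoothness can be implemented is shown below; the function name, hyperparameters, and use of a KL distance are illustrative assumptions, not ERNIE's actual code.

```python
import torch
import torch.nn.functional as F

def adversarial_smoothness_penalty(policy_net, obs, epsilon=0.05, step_size=0.01, n_steps=1):
    """Illustrative adversarial regularizer: penalize how much the action
    distribution changes under a small, adversarially chosen observation
    perturbation (a proxy for controlling the policy's Lipschitz constant)."""
    with torch.no_grad():
        clean_logits = policy_net(obs)
    delta = torch.zeros_like(obs, requires_grad=True)
    for _ in range(n_steps):
        # KL divergence between the clean and perturbed action distributions
        perturbed_logits = policy_net(obs + delta)
        div = F.kl_div(F.log_softmax(perturbed_logits, dim=-1),
                       F.softmax(clean_logits, dim=-1),
                       reduction="batchmean")
        grad, = torch.autograd.grad(div, delta)
        # FGSM-style ascent step on the perturbation, projected to an L-infinity ball
        delta = (delta + step_size * grad.sign()).clamp(-epsilon, epsilon)
        delta = delta.detach().requires_grad_(True)
    perturbed_logits = policy_net(obs + delta)
    return F.kl_div(F.log_softmax(perturbed_logits, dim=-1),
                    F.softmax(clean_logits, dim=-1),
                    reduction="batchmean")
```

The returned penalty can be added to the usual actor loss with a small weight, encouraging the policy to behave similarly on nearby observations.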
arXiv Detail & Related papers (2023-10-16T20:14:06Z)
- Heterogeneous Multi-Agent Reinforcement Learning via Mirror Descent Policy Optimization [1.5501208213584152]
This paper presents an extension of the Mirror Descent method to overcome challenges in cooperative Multi-Agent Reinforcement Learning (MARL) settings.
The proposed Heterogeneous-Agent Mirror Descent Policy Optimization (HAMDPO) algorithm utilizes the multi-agent advantage decomposition lemma.
We evaluate HAMDPO on Multi-Agent MuJoCo and StarCraftII tasks, demonstrating its superiority over state-of-the-art algorithms.
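For reference, the generic single-agent mirror descent policy update that this line of work builds on takes the following form; HAMDPO applies updates of this flavour agent by agent via the advantage decomposition lemma, and the exact multi-agent objective is given in the paper.

$$\pi_{k+1}=\arg\max_{\pi}\;\mathbb{E}_{s\sim d^{\pi_{k}},\,a\sim\pi}\big[A^{\pi_{k}}(s,a)\big]-\frac{1}{\eta}\,\mathbb{E}_{s\sim d^{\pi_{k}}}\big[D_{\mathrm{KL}}\big(\pi(\cdot\mid s)\,\|\,\pi_{k}(\cdot\mid s)\big)\big].$$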
arXiv Detail & Related papers (2023-08-13T10:18:10Z)
- Faster Last-iterate Convergence of Policy Optimization in Zero-Sum Markov Games [63.60117916422867]
This paper focuses on the most basic setting of competitive multi-agent RL, namely two-player zero-sum Markov games.
We propose a single-loop policy optimization method with symmetric updates from both agents, where the policy is updated via the entropy-regularized optimistic multiplicative weights update (OMWU) method.
Our convergence results improve upon the best known complexities, and lead to a better understanding of policy optimization in competitive Markov games.
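The building block here is the multiplicative weights update, which in its plain (non-optimistic, unregularized) form reads as below; the paper's method augments it with entropy regularization and an optimistic prediction step, and the exact recursion is given in the paper.

$$\pi_{t+1}(a\mid s)\;\propto\;\pi_{t}(a\mid s)\,\exp\!\big(\eta\,Q_{t}(s,a)\big),$$

where $\eta$ is the step size and $Q_{t}$ is the current estimate of the (regularized) action value for the player being updated.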
arXiv Detail & Related papers (2022-10-03T16:05:43Z)
- Off-Policy Correction For Multi-Agent Reinforcement Learning [9.599347559588216]
Multi-agent reinforcement learning (MARL) provides a framework for problems involving multiple interacting agents.
Despite apparent similarity to the single-agent case, multi-agent problems are often harder to train and analyze theoretically.
We propose MA-Trace, a new on-policy actor-critic algorithm, which extends V-Trace to the MARL setting.
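For context, the V-Trace target that MA-Trace extends is defined (following IMPALA) via importance-weighted temporal-difference corrections; how the importance weights are formed from the joint policies in the multi-agent case is specified in the paper.

$$v_{s}=V(x_{s})+\sum_{t=s}^{s+n-1}\gamma^{\,t-s}\Big(\prod_{i=s}^{t-1}c_{i}\Big)\,\delta_{t}V,\qquad \delta_{t}V=\rho_{t}\big(r_{t}+\gamma V(x_{t+1})-V(x_{t})\big),$$

with truncated importance weights $\rho_{t}=\min\!\big(\bar{\rho},\,\tfrac{\pi(a_{t}\mid x_{t})}{\mu(a_{t}\mid x_{t})}\big)$ and $c_{i}=\min\!\big(\bar{c},\,\tfrac{\pi(a_{i}\mid x_{i})}{\mu(a_{i}\mid x_{i})}\big)$, where $\mu$ is the behaviour policy.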
arXiv Detail & Related papers (2021-11-22T14:23:13Z)
- Trust Region Policy Optimisation in Multi-Agent Reinforcement Learning [25.027143431992755]
Trust region methods rigorously enable reinforcement learning (RL) agents to learn monotonically improving policies, leading to superior performance on a variety of tasks.
Unfortunately, when it comes to multi-agent reinforcement learning (MARL), the property of monotonic improvement may not simply apply.
In this paper, we extend the theory of trust region learning to MARL. Central to our findings are the multi-agent advantage decomposition lemma and the sequential policy update scheme.
Based on these, we develop Heterogeneous-Agent Trust Region Policy Optimisation (HATRPO) and Heterogeneous-Agent Proximal Policy Optimisation (HAPPO) algorithms.
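The multi-agent advantage decomposition lemma referred to here states, up to notation, that the joint advantage of a set of agents can be written as a sum of per-agent advantages conditioned on the actions of the preceding agents:

$$A^{i_{1:n}}_{\boldsymbol{\pi}}\big(s,\mathbf{a}^{i_{1:n}}\big)=\sum_{m=1}^{n}A^{i_{m}}_{\boldsymbol{\pi}}\big(s,\mathbf{a}^{i_{1:m-1}},a^{i_{m}}\big),$$

which is what makes a sequential, agent-by-agent policy update with a monotonic improvement guarantee possible.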
arXiv Detail & Related papers (2021-09-23T09:44:35Z)
- Permutation Invariant Policy Optimization for Mean-Field Multi-Agent Reinforcement Learning: A Principled Approach [128.62787284435007]
We propose the mean-field proximal policy optimization (MF-PPO) algorithm, at the core of which is a permutation-invariant actor-critic neural architecture.
We prove that MF-PPO attains the globally optimal policy at a sublinear rate of convergence.
In particular, we show that the inductive bias introduced by the permutation-invariant neural architecture enables MF-PPO to outperform existing competitors.
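A minimal sketch of what a permutation-invariant critic component can look like is given below (mean pooling over per-agent embeddings); the layer sizes and pooling choice are assumptions for illustration, not MF-PPO's exact architecture.

```python
import torch
import torch.nn as nn

class PermutationInvariantCritic(nn.Module):
    """Illustrative permutation-invariant critic: each agent's observation is
    embedded by a shared network and the embeddings are mean-pooled, so
    reordering the agents cannot change the value estimate."""

    def __init__(self, obs_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.embed = nn.Sequential(nn.Linear(obs_dim, hidden_dim), nn.ReLU())
        self.value_head = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(), nn.Linear(hidden_dim, 1)
        )

    def forward(self, agent_obs: torch.Tensor) -> torch.Tensor:
        # agent_obs: (batch, n_agents, obs_dim)
        pooled = self.embed(agent_obs).mean(dim=1)  # symmetric aggregation over agents
        return self.value_head(pooled)              # (batch, 1)
```

Because the agent embeddings are aggregated by a symmetric operation (the mean), permuting the agents leaves the output unchanged, which is the kind of inductive bias the paper exploits.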
arXiv Detail & Related papers (2021-05-18T04:35:41Z)
- Softmax with Regularization: Better Value Estimation in Multi-Agent Reinforcement Learning [72.28520951105207]
Overestimation in $Q$-learning is an important problem that has been extensively studied in single-agent reinforcement learning.
We propose a novel regularization-based update scheme that penalizes large joint action-values deviating from a baseline.
We show that our method provides a consistent performance improvement on a set of challenging StarCraft II micromanagement tasks.
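As a rough illustration of what penalizing joint action-values that deviate from a baseline can look like (the regularizer and baseline used in the paper differ in their details), consider a TD loss augmented with a softmax-weighted baseline $b(s)$ over joint action-values:

$$\mathcal{L}=\big(y-Q_{tot}(s,\mathbf{a})\big)^{2}+\lambda\,\big(Q_{tot}(s,\mathbf{a})-b(s)\big)^{2},\qquad b(s)=\sum_{\mathbf{a}'}\frac{\exp\!\big(\beta\,Q_{tot}(s,\mathbf{a}')\big)}{\sum_{\mathbf{a}''}\exp\!\big(\beta\,Q_{tot}(s,\mathbf{a}'')\big)}\,Q_{tot}(s,\mathbf{a}').$$

Here $y$ is the usual TD target and $\lambda,\beta$ are hyperparameters; the softmax baseline interpolates between the mean and the maximum of the joint action-values, which is what mitigates overestimation.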
arXiv Detail & Related papers (2021-03-22T14:18:39Z)
- FACMAC: Factored Multi-Agent Centralised Policy Gradients [103.30380537282517]
We propose FACtored Multi-Agent Centralised policy gradients (FACMAC), a new method for cooperative multi-agent reinforcement learning in both discrete and continuous action spaces.
We evaluate FACMAC on variants of the multi-agent particle environments, a novel multi-agent MuJoCo benchmark, and a challenging set of StarCraft II micromanagement tasks.
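A minimal sketch of a factored centralized critic of the kind FACMAC builds on is given below: per-agent utilities are combined by a mixing network conditioned on the global state. The class and layer choices are illustrative assumptions, not FACMAC's exact architecture.

```python
import torch
import torch.nn as nn

class FactoredCentralisedCritic(nn.Module):
    """Illustrative factored centralized critic: shared per-agent utilities
    Q_i(o_i, a_i) are combined by a mixing network conditioned on the global
    state. This is a generic sketch of critic factorisation."""

    def __init__(self, obs_dim: int, act_dim: int, state_dim: int, n_agents: int, hidden: int = 64):
        super().__init__()
        self.agent_utility = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )
        self.mixer = nn.Sequential(
            nn.Linear(n_agents + state_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, obs: torch.Tensor, actions: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        # obs: (B, N, obs_dim), actions: (B, N, act_dim), state: (B, state_dim)
        per_agent_q = self.agent_utility(torch.cat([obs, actions], dim=-1)).squeeze(-1)  # (B, N)
        return self.mixer(torch.cat([per_agent_q, state], dim=-1))                       # (B, 1)
```

A centralized policy gradient can then be taken through such a joint critic with respect to all agents' actions, rather than through independent per-agent critics.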
arXiv Detail & Related papers (2020-03-14T21:29:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.