Learning Implicit Credit Assignment for Cooperative Multi-Agent Reinforcement Learning
- URL: http://arxiv.org/abs/2007.02529v2
- Date: Thu, 22 Oct 2020 14:18:50 GMT
- Title: Learning Implicit Credit Assignment for Cooperative Multi-Agent Reinforcement Learning
- Authors: Meng Zhou, Ziyu Liu, Pengwei Sui, Yixuan Li, Yuk Ying Chung
- Abstract summary: We present a multi-agent actor-critic method that aims to implicitly address the credit assignment problem under fully cooperative settings.
Our key motivation is that credit assignment among agents may not require an explicit formulation as long as the policy gradients from a centralized critic carry sufficient information for the decentralized agents to maximize their joint action value.
Our algorithm, referred to as LICA, is evaluated on several benchmarks including the multi-agent particle environments and a set of challenging StarCraft II micromanagement tasks.
- Score: 31.147638213056872
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a multi-agent actor-critic method that aims to implicitly address
the credit assignment problem under fully cooperative settings. Our key
motivation is that credit assignment among agents may not require an explicit
formulation as long as (1) the policy gradients derived from a centralized
critic carry sufficient information for the decentralized agents to maximize
their joint action value through optimal cooperation and (2) a sustained level
of exploration is enforced throughout training. Under the centralized training
with decentralized execution (CTDE) paradigm, we achieve the former by
formulating the centralized critic as a hypernetwork such that a latent state
representation is integrated into the policy gradients through its
multiplicative association with the stochastic policies; to achieve the latter,
we derive a simple technique called adaptive entropy regularization where
magnitudes of the entropy gradients are dynamically rescaled based on the
current policy stochasticity to encourage consistent levels of exploration. Our
algorithm, referred to as LICA, is evaluated on several benchmarks including
the multi-agent particle environments and a set of challenging StarCraft II
micromanagement tasks, and we show that LICA significantly outperforms previous
methods.
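The two mechanisms described in the abstract can be illustrated with a short, hedged sketch (PyTorch-style, not the authors' released implementation; the names `HyperCritic` and `lica_style_policy_loss`, the layer sizes, and the exact entropy-rescaling rule are assumptions made for illustration). The critic's mixing weights are generated from the global state by hypernetworks, so the latent state enters the policy gradient multiplicatively through the concatenated per-agent action probabilities, and the entropy bonus is rescaled by the detached current entropy so the exploration pressure stays roughly constant as policies become more deterministic.

```python
# Hedged sketch of the two ideas in the abstract; not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HyperCritic(nn.Module):
    """Centralized critic whose mixing weights are generated from the global
    state, so the state representation enters the policy gradient through a
    multiplicative association with the agents' stochastic policies."""

    def __init__(self, state_dim, n_agents, n_actions, hidden_dim=64):
        super().__init__()
        self.in_dim = n_agents * n_actions
        # Hypernetworks: state -> weights/biases of a two-layer mixing network.
        self.w1 = nn.Linear(state_dim, self.in_dim * hidden_dim)
        self.b1 = nn.Linear(state_dim, hidden_dim)
        self.w2 = nn.Linear(state_dim, hidden_dim)
        self.b2 = nn.Linear(state_dim, 1)

    def forward(self, state, joint_probs):
        # joint_probs: (batch, n_agents * n_actions), concatenated per-agent
        # action distributions produced by the decentralized actors.
        b = state.size(0)
        w1 = self.w1(state).view(b, -1, self.in_dim)             # (b, hidden, in)
        h = F.relu(torch.bmm(w1, joint_probs.unsqueeze(-1))      # state x policy
                   + self.b1(state).unsqueeze(-1))               # (b, hidden, 1)
        w2 = self.w2(state).unsqueeze(1)                         # (b, 1, hidden)
        return torch.bmm(w2, h).squeeze(-1) + self.b2(state)     # (b, 1) joint Q

def lica_style_policy_loss(q_joint, probs, alpha=0.03, eps=1e-8):
    """Adaptive entropy regularization (illustrative rescaling rule): the
    entropy gradient is divided by the detached current entropy, so nearly
    deterministic policies receive a proportionally stronger exploration push."""
    entropy = -(probs * torch.log(probs + eps)).sum(dim=-1).mean()
    adaptive_entropy = entropy / entropy.detach().clamp(min=eps)
    # Maximize the joint action value plus the rescaled entropy bonus.
    return -(q_joint.mean() + alpha * adaptive_entropy)
```

In a full training loop the decentralized actors would produce `probs` from local observations, `q_joint` would come from `HyperCritic(state, probs)`, and the critic itself would be trained separately against TD targets; those pieces are omitted from this sketch.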
Related papers
- From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning [62.54484062185869]
We introduce StepAgent, which utilizes step-wise reward to optimize the agent's reinforcement learning process.
We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
arXiv Detail & Related papers (2024-11-06T10:35:11Z)
- ROMA-iQSS: An Objective Alignment Approach via State-Based Value Learning and ROund-Robin Multi-Agent Scheduling [44.276285521929424]
We introduce a decentralized state-based value learning algorithm that enables agents to independently discover optimal states.
Our theoretical analysis shows that our approach leads decentralized agents to an optimal collective policy.
Empirical experiments further demonstrate that our method outperforms existing decentralized state-based and action-based value learning strategies.
arXiv Detail & Related papers (2024-04-05T09:39:47Z)
- Decentralized Policy Optimization [21.59254848913971]
We propose decentralized policy optimization (DPO), a decentralized actor-critic algorithm with monotonic improvement and convergence guarantees.
Empirically, we compare DPO with IPPO in a variety of cooperative multi-agent tasks, covering discrete and continuous action spaces, and fully and partially observable environments.
arXiv Detail & Related papers (2022-11-06T05:38:23Z)
- Communication-Efficient Actor-Critic Methods for Homogeneous Markov Games [6.589813623221242]
Policy sharing is crucial to efficient learning in certain tasks yet lacks theoretical justification.
We develop the first consensus-based decentralized actor-critic method.
We also develop practical algorithms based on our decentralized actor-critic method to reduce the communication cost during training.
arXiv Detail & Related papers (2022-02-18T20:35:00Z)
- Monotonic Improvement Guarantees under Non-stationarity for Decentralized PPO [66.5384483339413]
We present a new monotonic improvement guarantee for optimizing decentralized policies in cooperative Multi-Agent Reinforcement Learning (MARL).
We show that a trust region constraint can be effectively enforced in a principled way by bounding independent ratios based on the number of agents in training.
arXiv Detail & Related papers (2022-01-31T20:39:48Z)
- Iterated Reasoning with Mutual Information in Cooperative and Byzantine Decentralized Teaming [0.0]
We show that reformulating an agent's policy to be conditional on the policies of its teammates inherently maximizes the Mutual Information (MI) lower bound when optimizing under Policy Gradient (PG).
Our approach, InfoPG, outperforms baselines in learning emergent collaborative behaviors and sets the state-of-the-art in decentralized cooperative MARL tasks.
arXiv Detail & Related papers (2022-01-20T22:54:32Z)
- Local Advantage Actor-Critic for Robust Multi-Agent Deep Reinforcement Learning [19.519440854957633]
We propose a new multi-agent policy gradient method called Robust Local Advantage (ROLA) Actor-Critic.
ROLA allows each agent to learn an individual action-value function as a local critic while ameliorating environment non-stationarity.
We show ROLA's robustness and effectiveness over a number of state-of-the-art multi-agent policy gradient algorithms.
arXiv Detail & Related papers (2021-10-16T19:03:34Z)
- A Deep Reinforcement Learning Approach to Marginalized Importance Sampling with the Successor Representation [61.740187363451746]
Marginalized importance sampling (MIS) measures the density ratio between the state-action occupancy of a target policy and that of a sampling distribution.
We bridge the gap between MIS and deep reinforcement learning by observing that the density ratio can be computed from the successor representation of the target policy.
We evaluate the empirical performance of our approach on a variety of challenging Atari and MuJoCo environments.
arXiv Detail & Related papers (2021-06-12T20:21:38Z)
- F2A2: Flexible Fully-decentralized Approximate Actor-critic for Cooperative Multi-agent Reinforcement Learning [110.35516334788687]
Decentralized multi-agent reinforcement learning algorithms are sometimes impractical in complicated applications.
We propose a flexible fully decentralized actor-critic MARL framework that can handle large-scale general cooperative multi-agent settings.
Our framework achieves scalability and stability in large-scale environments and reduces information transmission.
arXiv Detail & Related papers (2020-04-17T14:56:29Z)
- Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning [55.20040781688844]
QMIX is a novel value-based method that can train decentralised policies in a centralised end-to-end fashion.
We propose the StarCraft Multi-Agent Challenge (SMAC) as a new benchmark for deep multi-agent reinforcement learning.
arXiv Detail & Related papers (2020-03-19T16:51:51Z)
- FACMAC: Factored Multi-Agent Centralised Policy Gradients [103.30380537282517]
We propose FACtored Multi-Agent Centralised policy gradients (FACMAC), a new method for cooperative multi-agent reinforcement learning in both discrete and continuous action spaces.
We evaluate FACMAC on variants of the multi-agent particle environments, a novel multi-agent MuJoCo benchmark, and a challenging set of StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2020-03-14T21:29:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.