Credit Assignment with Meta-Policy Gradient for Multi-Agent
Reinforcement Learning
- URL: http://arxiv.org/abs/2102.12957v1
- Date: Wed, 24 Feb 2021 12:03:37 GMT
- Title: Credit Assignment with Meta-Policy Gradient for Multi-Agent
Reinforcement Learning
- Authors: Jianzhun Shao, Hongchang Zhang, Yuhang Jiang, Shuncheng He, Xiangyang
Ji
- Abstract summary: We propose a general meta-learning-based Mixing Network with Meta Policy Gradient (MNMPG) framework to distill the global hierarchy for fine-grained reward decomposition.
Experiments on the StarCraft II micromanagement benchmark demonstrate that our method, with just a simple utility network, outperforms current state-of-the-art MARL algorithms.
- Score: 29.895142928565228
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reward decomposition is a critical problem in the centralized training
with decentralized execution (CTDE) paradigm for multi-agent reinforcement
learning. To take full advantage of global information, i.e., the states of all
agents and the surrounding environment, when decomposing the joint Q value into
individual credits, we propose a general meta-learning-based Mixing Network with
Meta Policy Gradient (MNMPG) framework that distills the global hierarchy for
fine-grained reward decomposition. The excitation signal for learning the global
hierarchy is the difference in episode reward before and after an "exercise
update" of the utility network. Our method applies generally to CTDE methods
that use a monotonic mixing network. Experiments on the StarCraft II
micromanagement benchmark demonstrate that, with just a simple utility network,
our method outperforms current state-of-the-art MARL algorithms on 4 of 5
super-hard scenarios. Performance improves further when combined with a
role-based utility network.
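To make the abstract's training signal concrete, here is a minimal sketch of one MNMPG-style meta step. Everything in it is an assumption for illustration, not the authors' code: the function names, the helper callables `env_rollout` (mean episode reward for a given utility network) and `td_loss_fn` (TD loss through a given mixer), the REINFORCE-style surrogate in step 4, and the learning rates.

```python
import copy
import torch


def mnmpg_meta_step(utility_net, mixing_net, batch, env_rollout, td_loss_fn,
                    inner_lr=5e-4, meta_lr=1e-4):
    """One illustrative MNMPG-style meta step (a sketch, not the paper's code).

    env_rollout(net) -> float      # assumed helper: mean episode reward
    td_loss_fn(u, m, batch) -> Tensor  # assumed helper: TD loss through mixer m
    """
    # 1) Episode reward before the "exercise update".
    reward_before = env_rollout(utility_net)

    # 2) "Exercise update": adapt a copy of the utility network with the
    #    TD loss shaped by the current mixing network's credit assignment.
    exercised = copy.deepcopy(utility_net)
    inner_opt = torch.optim.SGD(exercised.parameters(), lr=inner_lr)
    inner_opt.zero_grad()
    td_loss_fn(exercised, mixing_net, batch).backward()
    inner_opt.step()

    # 3) Excitation signal: how much the exercise improved the episode reward.
    excitation = env_rollout(exercised) - reward_before

    # 4) Meta policy gradient (surrogate): reinforce mixing-network parameters
    #    whose credit assignment led to a reward-improving exercise update.
    meta_opt = torch.optim.Adam(mixing_net.parameters(), lr=meta_lr)
    meta_opt.zero_grad()
    (-excitation * td_loss_fn(utility_net, mixing_net, batch)).backward()
    meta_opt.step()
    utility_net.zero_grad()  # drop stray gradients from the surrogate loss
```

In practice the optimizers would persist across steps rather than being rebuilt each call; they are created inline here only to keep the sketch self-contained.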
Related papers
- USER: Unified Semantic Enhancement with Momentum Contrast for Image-Text
Retrieval [115.28586222748478]
Image-Text Retrieval (ITR) aims to retrieve target instances from one modality that are semantically relevant to a query from the other modality.
Existing approaches typically suffer from two major limitations.
arXiv Detail & Related papers (2023-01-17T12:42:58Z)
- Hypernetworks in Meta-Reinforcement Learning [47.25270748922176]
Multi-task reinforcement learning (RL) and meta-RL aim to improve sample efficiency by generalizing over a distribution of related tasks.
State of the art methods often fail to outperform a degenerate solution that simply learns each task separately.
Hypernetworks are a promising path forward since they replicate the separate policies of the degenerate solution and are applicable to meta-RL.
arXiv Detail & Related papers (2022-10-20T15:34:52Z)
- Locality Matters: A Scalable Value Decomposition Approach for Cooperative Multi-Agent Reinforcement Learning [52.7873574425376]
Cooperative multi-agent reinforcement learning (MARL) faces significant scalability issues due to state and action spaces that are exponentially large in the number of agents.
We propose a novel, value-based multi-agent algorithm called LOMAQ, which incorporates local rewards within the centralized training with decentralized execution (CTDE) paradigm.
arXiv Detail & Related papers (2021-09-22T10:08:15Z)
- Globally Convergent Multilevel Training of Deep Residual Networks [0.0]
We propose a globally convergent multilevel training method for deep residual networks (ResNets).
The devised method operates in hybrid (stochastic-deterministic) settings by adaptively adjusting mini-batch sizes during training.
arXiv Detail & Related papers (2021-07-15T19:08:58Z)
- Energy-Efficient and Federated Meta-Learning via Projected Stochastic Gradient Ascent [79.58680275615752]
We propose an energy-efficient federated meta-learning framework.
We assume each task is owned by a separate agent, so only a limited number of tasks is available to train the meta-model.
arXiv Detail & Related papers (2021-05-31T08:15:44Z)
- MetaGater: Fast Learning of Conditional Channel Gated Networks via Federated Meta-Learning [46.79356071007187]
We propose a holistic approach to jointly train the backbone network and the channel gating module.
We develop a federated meta-learning approach to jointly learn good meta-initializations for both backbone networks and gating modules.
arXiv Detail & Related papers (2020-11-25T04:26:23Z)
- Dif-MAML: Decentralized Multi-Agent Meta-Learning [54.39661018886268]
We propose a cooperative multi-agent meta-learning algorithm, referred to as Diffusion Multi-Agent MAML (Dif-MAML).
We show that the proposed strategy allows a collection of agents to attain agreement at a linear rate and to converge to a stationary point of the aggregate MAML objective.
Simulation results illustrate the theoretical findings and the superior performance relative to the traditional non-cooperative setting.
arXiv Detail & Related papers (2020-10-06T16:51:09Z)
- Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning [55.20040781688844]
QMIX is a novel value-based method that can train decentralised policies in a centralised end-to-end fashion.
We propose the StarCraft Multi-Agent Challenge (SMAC) as a new benchmark for deep multi-agent reinforcement learning.
A minimal sketch of the monotonic mixing network at the heart of QMIX appears after this list.
arXiv Detail & Related papers (2020-03-19T16:51:51Z)
- Meta Reinforcement Learning with Autonomous Inference of Subtask Dependencies [57.27944046925876]
We propose and address a novel few-shot RL problem, where a task is characterized by a subtask graph.
Instead of directly learning a meta-policy, we develop a Meta-learner with Subtask Graph Inference.
Our experimental results on two grid-world domains and StarCraft II environments show that the proposed method accurately infers the latent task parameter.
arXiv Detail & Related papers (2020-01-01T17:34:00Z)
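As referenced in the QMIX entry above, here is a minimal sketch of a monotonic mixing network, the component that MNMPG claims general applicability to. It is an illustrative reconstruction under assumed layer sizes and names, not the authors' code; the state-conditioned weight generators are small hypernetworks (cf. the hypernetworks entry above), and taking their absolute value is what keeps Q_tot monotonically non-decreasing in each agent's Q value.

```python
import torch
import torch.nn as nn


class MonotonicMixer(nn.Module):
    """QMIX-style monotonic mixer (illustrative sketch)."""

    def __init__(self, n_agents: int, state_dim: int, embed_dim: int = 32):
        super().__init__()
        self.n_agents, self.embed_dim = n_agents, embed_dim
        # Hypernetworks: map the global state to the mixer's weights/biases.
        self.w1 = nn.Linear(state_dim, n_agents * embed_dim)
        self.b1 = nn.Linear(state_dim, embed_dim)
        self.w2 = nn.Linear(state_dim, embed_dim)
        self.b2 = nn.Sequential(nn.Linear(state_dim, embed_dim),
                                nn.ReLU(), nn.Linear(embed_dim, 1))

    def forward(self, agent_qs: torch.Tensor, state: torch.Tensor):
        # agent_qs: (batch, n_agents); state: (batch, state_dim)
        # abs() forces non-negative mixing weights, so dQ_tot/dQ_i >= 0.
        w1 = torch.abs(self.w1(state)).view(-1, self.n_agents, self.embed_dim)
        hidden = torch.relu(agent_qs.unsqueeze(1) @ w1
                            + self.b1(state).unsqueeze(1))
        w2 = torch.abs(self.w2(state)).view(-1, self.embed_dim, 1)
        return (hidden @ w2).squeeze(-1) + self.b2(state)  # Q_tot: (batch, 1)


# Example: mix 5 agents' Q values under a 48-dimensional global state.
mixer = MonotonicMixer(n_agents=5, state_dim=48)
q_tot = mixer(torch.rand(8, 5), torch.rand(8, 48))  # -> shape (8, 1)
```

In the MNMPG framing, this mixer is the "meta policy" whose parameters are trained with the excitation signal sketched earlier, while the per-agent utility network supplies the `agent_qs` inputs.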
This list is automatically generated from the titles and abstracts of the papers in this site.