Residual Q-Networks for Value Function Factorizing in Multi-Agent
Reinforcement Learning
- URL: http://arxiv.org/abs/2205.15245v1
- Date: Mon, 30 May 2022 16:56:06 GMT
- Title: Residual Q-Networks for Value Function Factorizing in Multi-Agent
Reinforcement Learning
- Authors: Rafael Pina, Varuna De Silva, Joosep Hook, and Ahmet Kondoz
- Abstract summary: We propose a novel concept of Residual Q-Networks (RQNs) for Multi-Agent Reinforcement Learning (MARL)
The RQN learns to transform the individual Q-value trajectories in a way that preserves the Individual-Global-Max criteria (IGM)
The proposed method converges faster, with increased stability and shows robust performance in a wider family of environments.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-Agent Reinforcement Learning (MARL) is useful in many problems that
require the cooperation and coordination of multiple agents. Learning optimal
policies using reinforcement learning in a multi-agent setting can be very
difficult as the number of agents increases. Recent solutions such as Value
Decomposition Networks (VDN), QMIX, QTRAN and QPLEX adhere to the centralized
training and decentralized execution scheme and perform factorization of the
joint action-value functions. However, these methods still suffer from
increased environmental complexity, and at times fail to converge in a stable
manner. We propose a novel concept of Residual Q-Networks (RQNs) for MARL,
which learns to transform the individual Q-value trajectories in a way that
preserves the Individual-Global-Max criteria (IGM), but is more robust in
factorizing action-value functions. The RQN acts as an auxiliary network that
accelerates convergence and will become obsolete as the agents reach the
training objectives. The performance of the proposed method is compared against
several state-of-the-art techniques such as QPLEX, QMIX, QTRAN and VDN, in a
range of multi-agent cooperative tasks. The results illustrate that the
proposed method, in general, converges faster, with increased stability and
shows robust performance in a wider family of environments. The improvements in
results are more prominent in environments with severe punishments for
non-cooperative behaviours and especially in the absence of complete state
information during training time.
Related papers
- Enhancing Heterogeneous Multi-Agent Cooperation in Decentralized MARL via GNN-driven Intrinsic Rewards [1.179778723980276]
Multi-agent Reinforcement Learning (MARL) is emerging as a key framework for sequential decision-making and control tasks.
The deployment of these systems in real-world scenarios often requires decentralized training, a diverse set of agents, and learning from infrequent environmental reward signals.
We propose the CoHet algorithm, which utilizes a novel Graph Neural Network (GNN) based intrinsic motivation to facilitate the learning of heterogeneous agent policies.
arXiv Detail & Related papers (2024-08-12T21:38:40Z) - Inverse Factorized Q-Learning for Cooperative Multi-agent Imitation
Learning [13.060023718506917]
imitation learning (IL) is a problem of learning to mimic expert behaviors from demonstrations in cooperative multi-agent systems.
We introduce a novel multi-agent IL algorithm designed to address these challenges.
Our approach enables the centralized learning by leveraging mixing networks to aggregate decentralized Q functions.
arXiv Detail & Related papers (2023-10-10T17:11:20Z) - The challenge of redundancy on multi-agent value factorisation [12.63182277116319]
In the field of cooperative multi-agent reinforcement learning (MARL), the standard paradigm is the use of centralised training and decentralised execution.
We propose leveraging layerwise relevance propagation (LRP) to instead separate the learning of the joint value function and generation of local reward signals.
We find that although the performance of both baselines VDN and Qmix degrades with the number of redundant agents, RDN is unaffected.
arXiv Detail & Related papers (2023-03-28T20:41:12Z) - Locality Matters: A Scalable Value Decomposition Approach for
Cooperative Multi-Agent Reinforcement Learning [52.7873574425376]
Cooperative multi-agent reinforcement learning (MARL) faces significant scalability issues due to state and action spaces that are exponentially large in the number of agents.
We propose a novel, value-based multi-agent algorithm called LOMAQ, which incorporates local rewards in the Training Decentralized Execution paradigm.
arXiv Detail & Related papers (2021-09-22T10:08:15Z) - ROMAX: Certifiably Robust Deep Multiagent Reinforcement Learning via
Convex Relaxation [32.091346776897744]
Cyber-physical attacks can challenge the robustness of multiagent reinforcement learning.
We propose a minimax MARL approach to infer the worst-case policy update of other agents.
arXiv Detail & Related papers (2021-09-14T16:18:35Z) - Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning [89.31889875864599]
We propose an efficient model-based reinforcement learning algorithm for learning in multi-agent systems.
Our main theoretical contributions are the first general regret bounds for model-based reinforcement learning for MFC.
We provide a practical parametrization of the core optimization problem.
arXiv Detail & Related papers (2021-07-08T18:01:02Z) - Softmax with Regularization: Better Value Estimation in Multi-Agent
Reinforcement Learning [72.28520951105207]
Overestimation in $Q$-learning is an important problem that has been extensively studied in single-agent reinforcement learning.
We propose a novel regularization-based update scheme that penalizes large joint action-values deviating from a baseline.
We show that our method provides a consistent performance improvement on a set of challenging StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2021-03-22T14:18:39Z) - QTRAN++: Improved Value Transformation for Cooperative Multi-Agent
Reinforcement Learning [70.382101956278]
QTRAN is a reinforcement learning algorithm capable of learning the largest class of joint-action value functions.
Despite its strong theoretical guarantee, it has shown poor empirical performance in complex environments.
We propose a substantially improved version, coined QTRAN++.
arXiv Detail & Related papers (2020-06-22T05:08:36Z) - Monotonic Value Function Factorisation for Deep Multi-Agent
Reinforcement Learning [55.20040781688844]
QMIX is a novel value-based method that can train decentralised policies in a centralised end-to-end fashion.
We propose the StarCraft Multi-Agent Challenge (SMAC) as a new benchmark for deep multi-agent reinforcement learning.
arXiv Detail & Related papers (2020-03-19T16:51:51Z) - FACMAC: Factored Multi-Agent Centralised Policy Gradients [103.30380537282517]
We propose FACtored Multi-Agent Centralised policy gradients (FACMAC)
It is a new method for cooperative multi-agent reinforcement learning in both discrete and continuous action spaces.
We evaluate FACMAC on variants of the multi-agent particle environments, a novel multi-agent MuJoCo benchmark, and a challenging set of StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2020-03-14T21:29:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.