Related papers: Residual Q-Networks for Value Function Factorizing in Multi-Agent Reinforcement Learning


URL: http://arxiv.org/abs/2205.15245v1
Date: Mon, 30 May 2022 16:56:06 GMT
Title: Residual Q-Networks for Value Function Factorizing in Multi-Agent Reinforcement Learning
Authors: Rafael Pina, Varuna De Silva, Joosep Hook, and Ahmet Kondoz
Abstract summary: We propose a novel concept of Residual Q-Networks (RQNs) for Multi-Agent Reinforcement Learning (MARL). The RQN learns to transform the individual Q-value trajectories in a way that preserves the Individual-Global-Max (IGM) criterion. The proposed method converges faster, with increased stability, and shows robust performance in a wider family of environments.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Multi-Agent Reinforcement Learning (MARL) is useful in many problems that require the cooperation and coordination of multiple agents. Learning optimal policies using reinforcement learning in a multi-agent setting can be very difficult as the number of agents increases. Recent solutions such as Value Decomposition Networks (VDN), QMIX, QTRAN and QPLEX adhere to the centralized training and decentralized execution scheme and perform factorization of the joint action-value functions. However, these methods still suffer from increased environmental complexity, and at times fail to converge in a stable manner. We propose a novel concept of Residual Q-Networks (RQNs) for MARL, which learn to transform the individual Q-value trajectories in a way that preserves the Individual-Global-Max (IGM) criterion, but are more robust in factorizing action-value functions. The RQN acts as an auxiliary network that accelerates convergence and will become obsolete as the agents reach the training objectives. The performance of the proposed method is compared against several state-of-the-art techniques such as QPLEX, QMIX, QTRAN and VDN, in a range of multi-agent cooperative tasks. The results illustrate that the proposed method, in general, converges faster, with increased stability, and shows robust performance in a wider family of environments. The improvements in results are more prominent in environments with severe punishments for non-cooperative behaviours and especially in the absence of complete state information during training time.
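The IGM criterion mentioned above can be illustrated with a small sketch. This is a hypothetical Python example, not the RQN method from the paper: it uses a VDN-style additive factorization, Q_tot(a) = sum_i Q_i(a_i), and checks whether the joint action obtained from each agent's own greedy choice matches the brute-force argmax of Q_tot. The function names, the two-agent utilities, and the miscoordination penalty in `punished_q_tot` are assumptions chosen purely for illustration.

```python
from itertools import product


def vdn_q_tot(per_agent_q, joint_action):
    """VDN-style additive factorization: Q_tot(a) = sum_i Q_i(a_i)."""
    return sum(q[a] for q, a in zip(per_agent_q, joint_action))


def punished_q_tot(per_agent_q, joint_action):
    """Hypothetical non-additive Q_tot with a large penalty on joint action (1, 2)."""
    penalty = 5.0 if joint_action == (1, 2) else 0.0
    return vdn_q_tot(per_agent_q, joint_action) - penalty


def greedy_joint_action(per_agent_q):
    """Decentralized execution: each agent picks the argmax of its own Q_i."""
    return tuple(max(range(len(q)), key=q.__getitem__) for q in per_agent_q)


def centralized_argmax(per_agent_q, q_tot):
    """Brute-force argmax of Q_tot over the full joint action space."""
    joint_actions = product(*(range(len(q)) for q in per_agent_q))
    return max(joint_actions, key=lambda a: q_tot(per_agent_q, a))


def satisfies_igm(per_agent_q, q_tot):
    """IGM holds when the joint argmax equals the tuple of individual argmaxes."""
    return centralized_argmax(per_agent_q, q_tot) == greedy_joint_action(per_agent_q)


# Hypothetical utilities: two agents, three actions each.
q_agents = [[0.1, 0.9, 0.3], [0.5, 0.2, 0.7]]

print(greedy_joint_action(q_agents))           # (1, 2)
print(satisfies_igm(q_agents, vdn_q_tot))      # True
print(satisfies_igm(q_agents, punished_q_tot))  # False
```

The penalized case hints at why environments with severe punishments for non-cooperative behaviour are hard for factorized methods: if the true joint value is not a monotonic combination of the individual utilities, greedy per-agent choices can disagree with the joint optimum. How RQNs address this is described in the paper itself and is not reproduced here.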

Related papers

MO-MIX: Multi-Objective Multi-Agent Cooperative Decision-Making With Deep Reinforcement Learning [68.91090643731987]
Deep reinforcement learning (RL) has been applied extensively to solve complex decision-making problems. Existing approaches are limited to separate fields and can only handle multi-agent decision-making with a single objective. We propose MO-MIX to solve the multi-objective multi-agent reinforcement learning (MOMARL) problem.
arXiv Detail & Related papers (2026-02-28T16:25:22Z)
Multi-Action Self-Improvement for Neural Combinatorial Optimization [0.979731979071071]
Self-improvement models iteratively refine their policies by generating and imitating high-quality solutions. These approaches fail to exploit the structure of problems involving the coordination of multiple agents. We extend self-improvement to operate over joint multi-agent actions.
arXiv Detail & Related papers (2025-10-14T08:26:27Z)
QLLM: Do We Really Need a Mixing Network for Credit Assignment in Multi-Agent Reinforcement Learning? [4.429189958406034]
Credit assignment has remained a fundamental challenge in multi-agent reinforcement learning (MARL). We propose a novel algorithm, QLLM, which facilitates the automatic construction of credit assignment functions using large language models (LLMs). Extensive experiments conducted on several standard MARL benchmarks demonstrate that the proposed method consistently outperforms existing state-of-the-art baselines.
arXiv Detail & Related papers (2025-04-17T14:07:11Z)
SPECTra: Scalable Multi-Agent Reinforcement Learning with Permutation-Free Networks [3.7687375904925484]
In cooperative multi-agent reinforcement learning (MARL), the permutation problem where the state space grows exponentially with the number of agents reduces sample efficiency. We propose a novel agent network and a non-linear mixing network that ensure permutation-equivariance and scalability. Our approach achieves superior learning performance compared to existing methods.
arXiv Detail & Related papers (2025-03-14T04:26:51Z)
Enhancing Heterogeneous Multi-Agent Cooperation in Decentralized MARL via GNN-driven Intrinsic Rewards [1.179778723980276]
Multi-agent Reinforcement Learning (MARL) is emerging as a key framework for sequential decision-making and control tasks. The deployment of these systems in real-world scenarios often requires decentralized training, a diverse set of agents, and learning from infrequent environmental reward signals. We propose the CoHet algorithm, which utilizes a novel Graph Neural Network (GNN) based intrinsic motivation to facilitate the learning of heterogeneous agent policies.
arXiv Detail & Related papers (2024-08-12T21:38:40Z)
Inverse Factorized Q-Learning for Cooperative Multi-agent Imitation Learning [13.060023718506917]
Imitation learning (IL) is the problem of learning to mimic expert behaviors from demonstrations in cooperative multi-agent systems. We introduce a novel multi-agent IL algorithm designed to address these challenges. Our approach enables centralized learning by leveraging mixing networks to aggregate decentralized Q functions.
arXiv Detail & Related papers (2023-10-10T17:11:20Z)
The challenge of redundancy on multi-agent value factorisation [12.63182277116319]
In the field of cooperative multi-agent reinforcement learning (MARL), the standard paradigm is the use of centralised training and decentralised execution. We propose leveraging layerwise relevance propagation (LRP) to instead separate the learning of the joint value function and generation of local reward signals. We find that although the performance of both baselines VDN and QMIX degrades with the number of redundant agents, RDN is unaffected.
arXiv Detail & Related papers (2023-03-28T20:41:12Z)
Locality Matters: A Scalable Value Decomposition Approach for Cooperative Multi-Agent Reinforcement Learning [52.7873574425376]
Cooperative multi-agent reinforcement learning (MARL) faces significant scalability issues due to state and action spaces that are exponentially large in the number of agents. We propose a novel, value-based multi-agent algorithm called LOMAQ, which incorporates local rewards in the Centralized Training Decentralized Execution paradigm.
arXiv Detail & Related papers (2021-09-22T10:08:15Z)
ROMAX: Certifiably Robust Deep Multiagent Reinforcement Learning via Convex Relaxation [32.091346776897744]
Cyber-physical attacks can challenge the robustness of multiagent reinforcement learning. We propose a minimax MARL approach to infer the worst-case policy update of other agents.
arXiv Detail & Related papers (2021-09-14T16:18:35Z)
Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning [89.31889875864599]
We propose an efficient model-based reinforcement learning algorithm for learning in multi-agent systems. Our main theoretical contributions are the first general regret bounds for model-based reinforcement learning for mean-field control (MFC). We provide a practical parametrization of the core optimization problem.
arXiv Detail & Related papers (2021-07-08T18:01:02Z)
Softmax with Regularization: Better Value Estimation in Multi-Agent Reinforcement Learning [72.28520951105207]
Overestimation in $Q$-learning is an important problem that has been extensively studied in single-agent reinforcement learning. We propose a novel regularization-based update scheme that penalizes large joint action-values deviating from a baseline. We show that our method provides a consistent performance improvement on a set of challenging StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2021-03-22T14:18:39Z)
QTRAN++: Improved Value Transformation for Cooperative Multi-Agent Reinforcement Learning [70.382101956278]
QTRAN is a reinforcement learning algorithm capable of learning the largest class of joint-action value functions. Despite its strong theoretical guarantee, it has shown poor empirical performance in complex environments. We propose a substantially improved version, coined QTRAN++.
arXiv Detail & Related papers (2020-06-22T05:08:36Z)
Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning [55.20040781688844]
QMIX is a novel value-based method that can train decentralised policies in a centralised end-to-end fashion. We propose the StarCraft Multi-Agent Challenge (SMAC) as a new benchmark for deep multi-agent reinforcement learning.
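QMIX's monotonic factorisation can be sketched as follows, assuming a single-hidden-layer mixer with an ELU non-linearity, as in the QMIX paper. In QMIX the mixing weights are produced by hypernetworks conditioned on the global state; here they are fixed, hypothetical numbers (`W1`, `B1`, `W2`, `B2`), and taking their absolute value is what keeps Q_tot non-decreasing in every Q_i. As a consequence, the joint action built from each agent's greedy choice also maximises Q_tot, which is the IGM property these factorisation methods rely on.

```python
import math
from itertools import product


def elu(x):
    """ELU activation: strictly increasing, so it preserves monotonicity."""
    return x if x > 0 else math.exp(x) - 1.0


def monotonic_mix(agent_qs, w1, b1, w2, b2):
    """QMIX-style mixer; abs() on the weights makes Q_tot monotonic in each Q_i.

    Biases are left unconstrained, since they do not affect monotonicity.
    """
    hidden = [
        elu(sum(abs(w) * q for w, q in zip(row, agent_qs)) + b)
        for row, b in zip(w1, b1)
    ]
    return sum(abs(w) * h for w, h in zip(w2, hidden)) + b2


# Hypothetical fixed mixing parameters (QMIX would generate these from the state).
W1 = [[0.5, -1.2], [-0.3, 0.8]]
B1 = [0.1, -0.2]
W2 = [1.0, -0.7]
B2 = 0.05

# Hypothetical per-agent utilities: two agents, three actions each.
q_agents = [[0.1, 0.9, 0.3], [0.5, 0.2, 0.7]]


def q_tot(joint_action):
    """Mix the chosen per-agent Q-values into a joint value."""
    qs = [q[a] for q, a in zip(q_agents, joint_action)]
    return monotonic_mix(qs, W1, B1, W2, B2)


greedy = tuple(max(range(len(q)), key=q.__getitem__) for q in q_agents)
best = max(product(range(3), range(3)), key=q_tot)
print(greedy, best)  # both (1, 2)
```

Raw negative weights such as -1.2 still contribute positively after `abs()`, so raising any single agent's utility can never lower Q_tot. The price of this constraint is that the mixer cannot represent non-monotonic joint values, a limitation that methods such as QTRAN and QPLEX aim to relax.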
arXiv Detail & Related papers (2020-03-19T16:51:51Z)
FACMAC: Factored Multi-Agent Centralised Policy Gradients [103.30380537282517]
We propose FACtored Multi-Agent Centralised policy gradients (FACMAC). It is a new method for cooperative multi-agent reinforcement learning in both discrete and continuous action spaces. We evaluate FACMAC on variants of the multi-agent particle environments, a novel multi-agent MuJoCo benchmark, and a challenging set of StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2020-03-14T21:29:09Z)

This list is automatically generated from the titles and abstracts of the papers on this site.