Explicit Credit Assignment through Local Rewards and Dependence Graphs in Multi-Agent Reinforcement Learning
- URL: http://arxiv.org/abs/2601.21523v1
- Date: Thu, 29 Jan 2026 10:38:19 GMT
- Title: Explicit Credit Assignment through Local Rewards and Dependence Graphs in Multi-Agent Reinforcement Learning
- Authors: Bang Giang Le, Viet Cuong Ta
- Abstract summary: We propose a method that combines the merits of two approaches to cooperative learning. By using a graph of interaction between agents, our method discerns the individual agent contribution in a more fine-grained manner than a global reward. Our experiments demonstrate the flexibility of the approach, enabling improvements over the traditional local and global reward settings.
- Score: 5.8010446129208155
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: To promote cooperation in Multi-Agent Reinforcement Learning, the reward signals of all agents can be aggregated together, forming global rewards in what is commonly known as the fully cooperative setting. However, global rewards are usually noisy because they contain the contributions of all agents, which have to be resolved in the credit assignment process. On the other hand, using local rewards benefits from faster learning due to the separation of agents' contributions, but can be suboptimal as agents myopically optimize their own rewards while disregarding global optimality. In this work, we propose a method that combines the merits of both approaches. By using a graph of interaction between agents, our method discerns the individual agent contribution in a more fine-grained manner than a global reward, while alleviating the cooperation problem through agents' local rewards. We also introduce a practical approach for approximating such a graph. Our experiments demonstrate the flexibility of the approach, enabling improvements over the traditional local and global reward settings.
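The middle ground the abstract describes can be illustrated with a minimal sketch: each agent trains on the sum of local rewards over its neighbourhood in an interaction graph, rather than on one global sum or on its own reward alone. The function and adjacency structure below are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def graph_mixed_rewards(local_rewards, adjacency):
    """Blend each agent's local reward with those of its graph neighbours.

    local_rewards: shape (n_agents,) per-agent local rewards.
    adjacency: shape (n_agents, n_agents) 0/1 interaction graph,
               with 1 on the diagonal so each agent keeps its own reward.
    Returns per-agent training rewards: the sum of local rewards over
    each agent's neighbourhood -- finer-grained than a single global
    sum, less myopic than purely local rewards.
    """
    return adjacency @ local_rewards

# Three agents; agents 0 and 1 interact, agent 2 acts independently.
r = np.array([1.0, 0.5, 2.0])
A = np.array([[1, 1, 0],
              [1, 1, 0],
              [0, 0, 1]])
mixed = graph_mixed_rewards(r, A)  # -> [1.5, 1.5, 2.0]
```

With a fully connected graph this recovers the global reward; with an identity matrix it recovers purely local rewards, so the graph interpolates between the two settings discussed above.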
Related papers
- Agentic Reinforcement Learning with Implicit Step Rewards [92.26560379363492]
Large language models (LLMs) are increasingly developed as autonomous agents using reinforcement learning (agentic RL). We introduce implicit step rewards for agentic RL (iStar), a general credit-assignment strategy that integrates seamlessly with standard RL algorithms. We evaluate our method on three challenging agent benchmarks, including WebShop and VisualSokoban, as well as open-ended social interactions with unverifiable rewards in SOTOPIA.
arXiv Detail & Related papers (2025-09-23T16:15:42Z) - Agent-Temporal Credit Assignment for Optimal Policy Preservation in Sparse Multi-Agent Reinforcement Learning [14.003793644193605]
In multi-agent environments, agents often struggle to learn optimal policies due to sparse or delayed global rewards. We introduce Temporal-Agent Reward Redistribution (TAR$^2$), a novel approach designed to address the agent-temporal credit assignment problem. TAR$^2$ decomposes sparse global rewards into time-step-specific rewards and calculates agent-specific contributions to these rewards.
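The decomposition described here can be sketched as splitting an episodic return across time steps and agents via normalized contribution weights, so the redistributed rewards sum back to the original return. The softmax weighting below is an illustrative assumption, not TAR$^2$'s learned redistribution model.

```python
import numpy as np

def redistribute_return(episode_return, scores):
    """Split a sparse episodic return across (time step, agent) pairs.

    scores: shape (T, n_agents) contribution scores (here assumed given;
            TAR^2 learns them). Normalizing ensures the redistributed
    rewards sum back to the original return, which is what preserves
    the optimal policy.
    """
    w = np.exp(scores - scores.max())   # numerically stable softmax
    w /= w.sum()
    return episode_return * w

# Two time steps, two agents: agent 1 contributed early, agent 0 late.
scores = np.array([[0.0, 1.0],
                   [2.0, 0.0]])
dense_rewards = redistribute_return(10.0, scores)
```

Each entry `dense_rewards[t, i]` is then usable as an ordinary per-step reward for agent `i`, turning one sparse signal into dense feedback.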
arXiv Detail & Related papers (2024-12-19T12:05:13Z) - Asynchronous Message-Passing and Zeroth-Order Optimization Based Distributed Learning with a Use-Case in Resource Allocation in Communication Networks [11.182443036683225]
Distributed learning and adaptation have received significant interest and found wide-ranging applications in machine learning and signal processing. This paper specifically focuses on a scenario where agents collaborate towards a common task. Agents, acting as transmitters, collaboratively train their individual policies to maximize a global reward.
arXiv Detail & Related papers (2023-11-08T11:12:27Z) - AgentVerse: Facilitating Multi-Agent Collaboration and Exploring
Emergent Behaviors [93.38830440346783]
We propose a multi-agent framework that can collaboratively adjust its composition as a greater-than-the-sum-of-its-parts system.
Our experiments demonstrate that the framework can effectively deploy multi-agent groups that outperform a single agent.
In view of these behaviors, we discuss some possible strategies to leverage positive ones and mitigate negative ones for improving the collaborative potential of multi-agent groups.
arXiv Detail & Related papers (2023-08-21T16:47:11Z) - Learning Reward Machines in Cooperative Multi-Agent Tasks [75.79805204646428]
This paper presents a novel approach to Multi-Agent Reinforcement Learning (MARL)
It combines cooperative task decomposition with the learning of reward machines (RMs) encoding the structure of the sub-tasks.
The proposed method helps deal with the non-Markovian nature of the rewards in partially observable environments.
arXiv Detail & Related papers (2023-03-24T15:12:28Z) - Locality Matters: A Scalable Value Decomposition Approach for
Cooperative Multi-Agent Reinforcement Learning [52.7873574425376]
Cooperative multi-agent reinforcement learning (MARL) faces significant scalability issues due to state and action spaces that are exponentially large in the number of agents.
We propose a novel, value-based multi-agent algorithm called LOMAQ, which incorporates local rewards in the Centralized Training Decentralized Execution paradigm.
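The role of local rewards in value decomposition can be sketched with a toy tabular version: each agent updates its own utility from its local reward, and the joint value is recovered as a sum of per-agent utilities. This is a simplified VDN-style additive sketch, not LOMAQ's actual partition-based mixing.

```python
import numpy as np

def td_update(q_tables, obs, acts, local_rewards, next_obs,
              alpha=0.1, gamma=0.99):
    """One tabular TD step per agent, driven by its own local reward."""
    for i, q in enumerate(q_tables):
        best_next = q[next_obs[i]].max()
        td_target = local_rewards[i] + gamma * best_next
        q[obs[i], acts[i]] += alpha * (td_target - q[obs[i], acts[i]])

def joint_q(q_tables, obs, acts):
    """Decomposed joint value: the sum of per-agent utilities."""
    return sum(q[obs[i], acts[i]] for i, q in enumerate(q_tables))

# Two agents, 4 states, 2 actions each; all Q-values start at zero.
qs = [np.zeros((4, 2)) for _ in range(2)]
td_update(qs, obs=[0, 1], acts=[1, 0],
          local_rewards=[1.0, 0.5], next_obs=[2, 3])
```

Because each agent's update only touches its own table and reward, learning scales with the number of agents rather than with the joint action space, which is the scalability argument the abstract makes.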
arXiv Detail & Related papers (2021-09-22T10:08:15Z) - AoI-Aware Resource Allocation for Platoon-Based C-V2X Networks via
Multi-Agent Multi-Task Reinforcement Learning [22.890835786710316]
This paper investigates the problem of age of information (AoI) aware radio resource management for a platooning system.
Multiple autonomous platoons exploit the cellular wireless vehicle-to-everything (C-V2X) communication technology to disseminate the cooperative awareness messages (CAMs) to their followers.
We exploit a distributed resource allocation framework based on multi-agent reinforcement learning (MARL), where each platoon leader (PL) acts as an agent and interacts with the environment to learn its optimal policy.
arXiv Detail & Related papers (2021-05-10T08:39:56Z) - Multi-agent Policy Optimization with Approximatively Synchronous
Advantage Estimation [55.96893934962757]
In multi-agent systems, the policies of different agents need to be evaluated jointly.
In current methods, value functions or advantage functions use counterfactual joint actions which are evaluated asynchronously.
In this work, we propose approximatively synchronous advantage estimation.
arXiv Detail & Related papers (2020-12-07T07:29:19Z) - Cooperative Heterogeneous Deep Reinforcement Learning [47.97582814287474]
We present a Cooperative Heterogeneous Deep Reinforcement Learning framework that can learn a policy by integrating the advantages of heterogeneous agents.
Global agents are off-policy agents that can utilize experiences from the other agents.
Local agents are either on-policy agents or population-based evolutionary algorithm (EA) agents that can explore the local area effectively.
arXiv Detail & Related papers (2020-11-02T07:39:09Z) - Learning to Incentivize Other Learning Agents [73.03133692589532]
We show how to equip RL agents with the ability to give rewards directly to other agents, using a learned incentive function.
Such agents significantly outperform standard RL and opponent-shaping agents in challenging general-sum Markov games.
Our work points toward more opportunities and challenges along the path to ensure the common good in a multi-agent future.
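The reward-giving mechanism described above can be sketched as follows: each agent's effective learning signal is its environment reward plus incentives received from others, minus the incentives it pays out. The accounting below is an illustrative assumption; the paper learns the incentive function itself with gradients through the recipients' learning.

```python
import numpy as np

def effective_rewards(env_rewards, incentives_given):
    """Each agent's learning signal under peer-to-peer incentives.

    env_rewards: shape (n_agents,) rewards from the environment.
    incentives_given: shape (n_agents, n_agents), where entry [i, j]
        is the reward agent i chooses to transfer to agent j.
    """
    received = incentives_given.sum(axis=0)  # column j: given to agent j
    paid = incentives_given.sum(axis=1)      # row i: given by agent i
    return env_rewards + received - paid

# Agent 0 gives 0.3 to agent 1; agent 1 gives 0.1 to agent 0.
gifts = np.array([[0.0, 0.3],
                  [0.1, 0.0]])
shaped = effective_rewards(np.array([1.0, 0.0]), gifts)
```

Note that the transfers are zero-sum across the group, so incentives reshape individual learning signals without inflating the total reward.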
arXiv Detail & Related papers (2020-06-10T20:12:38Z) - Reward Design in Cooperative Multi-agent Reinforcement Learning for
Packet Routing [8.021402935358488]
We study the reward design problem in cooperative multi-agent reinforcement learning (MARL) based on packet routing environments.
We show that both purely local and purely global reward signals are prone to produce suboptimal policies.
We design mixed reward signals that can be used off-the-shelf to learn better policies.
arXiv Detail & Related papers (2020-03-05T02:27:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.