Related papers: Credit Assignment and Efficient Exploration based on Influence Scope in Multi-agent Reinforcement Learning

URL: http://arxiv.org/abs/2505.08630v1
Date: Tue, 13 May 2025 14:49:26 GMT
Title: Credit Assignment and Efficient Exploration based on Influence Scope in Multi-agent Reinforcement Learning
Authors: Shuai Han, Mehdi Dastani, Shihan Wang
Abstract summary: Training cooperative agents in sparse-reward scenarios poses significant challenges for multi-agent reinforcement learning (MARL). We propose an algorithm that calculates the Influence Scope of Agents (ISA) on states by taking specific values of the dimensions/attributes of states that can be influenced by individual agents. The mutual dependence between agents' actions and state attributes is then used to calculate the credit assignment and to delimit the exploration space for each individual agent.
Score: 2.8111817372725785
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Training cooperative agents in sparse-reward scenarios poses significant challenges for multi-agent reinforcement learning (MARL). Without clear feedback on actions at each step in sparse-reward settings, previous methods struggle with precise credit assignment among agents and effective exploration. In this paper, we introduce a novel method to deal with both credit assignment and exploration problems in reward-sparse domains. Accordingly, we propose an algorithm that calculates the Influence Scope of Agents (ISA) on states by taking specific values of the dimensions/attributes of states that can be influenced by individual agents. The mutual dependence between agents' actions and state attributes is then used to calculate the credit assignment and to delimit the exploration space for each individual agent. We then evaluate ISA in a variety of sparse-reward multi-agent scenarios. The results show that our method significantly outperforms the state-of-the-art baselines.
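To make the idea in the abstract concrete, here is a minimal sketch of one possible reading of it. It is not the authors' ISA algorithm: the abstract gives no formulas, so the function names (`estimate_influence_scope`, `assign_credit`, `project_state`), the no-op convention, and the proportional credit split are all assumptions made for illustration. States are plain tuples of attributes, a dimension is attributed to an agent only when that agent is the sole one acting, and the team reward is split according to how many changed dimensions fall inside each agent's scope.

```python
def changed_dims(state, next_state):
    """Indices of state dimensions whose value differs between two states."""
    return {d for d, (x, y) in enumerate(zip(state, next_state)) if x != y}


def estimate_influence_scope(transitions, n_agents, noop=0):
    """Estimate, per agent, the set of state dimensions it can change.

    transitions: iterable of (state, joint_action, next_state) with
    equal-length state tuples and one action per agent.
    Assumption: a transition is informative only when exactly one agent
    takes a non-noop action; all other transitions are skipped.
    """
    scope = [set() for _ in range(n_agents)]
    for state, joint_action, next_state in transitions:
        active = [i for i, a in enumerate(joint_action) if a != noop]
        if len(active) != 1:
            continue  # ambiguous attribution, ignore this transition
        scope[active[0]] |= changed_dims(state, next_state)
    return scope


def assign_credit(team_reward, state, next_state, scope):
    """Split a team reward in proportion to each agent's share of the
    changed dimensions that lie inside its scope (hypothetical rule)."""
    changed = changed_dims(state, next_state)
    overlaps = [len(changed & s) for s in scope]
    total = sum(overlaps)
    if total == 0:
        # No attributable change: fall back to an equal split.
        return [team_reward / len(scope)] * len(scope)
    return [team_reward * o / total for o in overlaps]


def project_state(state, agent_scope):
    """Restrict a state to the dimensions an agent can influence, e.g. as
    the key for a per-agent visit counter that bounds its exploration."""
    return tuple(state[d] for d in sorted(agent_scope))


# Toy environment: state = (position of agent 0, position of agent 1, door).
transitions = [
    ((0, 0, 0), (1, 0), (1, 0, 0)),  # only agent 0 acts, dimension 0 changes
    ((1, 0, 0), (0, 1), (1, 1, 0)),  # only agent 1 acts, dimension 1 changes
    ((1, 1, 0), (2, 0), (1, 1, 1)),  # only agent 0 acts, the door opens
    ((1, 1, 1), (1, 1), (2, 2, 1)),  # both act, skipped as ambiguous
]
scope = estimate_influence_scope(transitions, n_agents=2)
credits = assign_credit(1.0, (1, 1, 0), (1, 1, 1), scope)
```

In this toy run, agent 0 ends up with scope `{0, 2}` and agent 1 with `{1}`, so a reward that arrives when the door opens is credited entirely to agent 0. Projecting states onto an agent's scope shrinks the space over which that agent needs to track novelty, which is the sense in which a scope can delimit exploration.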

Related papers

AgentPRM: Process Reward Models for LLM Agents via Step-Wise Promise and Progress [71.02263260394261]
Large language models (LLMs) still encounter challenges in multi-turn decision-making tasks. We build process reward models (PRMs) to evaluate each decision and guide the agent's decision-making process. AgentPRM captures both the interdependence between sequential decisions and their contribution to the final goal.
arXiv Detail & Related papers (2025-11-11T14:57:54Z)
Understanding Action Effects through Instrumental Empowerment in Multi-Agent Reinforcement Learning [39.74025439412935]
This work investigates whether meaningful insights into agent behaviors can be extracted solely by analyzing the policy distribution. Inspired by the phenomenon that intelligent agents tend to pursue convergent instrumental values, we introduce Intended Cooperation Values (ICVs). ICVs measure an agent's action effect on its teammates' policies by assessing their decision (un)certainty and preference alignment.
arXiv Detail & Related papers (2025-08-21T15:35:59Z)
Leveraging Large Language Models for Effective and Explainable Multi-Agent Credit Assignment [4.406086834602686]
We show how to reformulate credit assignment to the two pattern recognition problems of sequence improvement and attribution. Our approach utilizes a centralized reward-critic which numerically decomposes the environment reward based on the individual contribution of each agent. Both our methods far outperform the state-of-the-art on a variety of benchmarks, including Level-Based Foraging, Robotic Warehouse, and our new Spaceworld benchmark which incorporates collision-related safety constraints.
arXiv Detail & Related papers (2025-02-24T05:56:47Z)
From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning [62.54484062185869]
We introduce StepAgent, which utilizes step-wise reward to optimize the agent's reinforcement learning process. We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
arXiv Detail & Related papers (2024-11-06T10:35:11Z)
Watch Every Step! LLM Agent Learning via Iterative Step-Level Process Refinement [50.481380478458945]
The Iterative step-level Process Refinement (IPR) framework provides detailed step-by-step guidance to enhance agent training. Our experiments on three complex agent tasks demonstrate that our framework outperforms a variety of strong baselines.
arXiv Detail & Related papers (2024-06-17T03:29:13Z)
Quantifying Agent Interaction in Multi-agent Reinforcement Learning for Cost-efficient Generalization [63.554226552130054]
Generalization poses a significant challenge in Multi-agent Reinforcement Learning (MARL). The extent to which an agent is influenced by unseen co-players depends on the agent's policy and the specific scenario. We present the Level of Influence (LoI), a metric quantifying the interaction intensity among agents within a given scenario and environment.
arXiv Detail & Related papers (2023-10-11T06:09:26Z)
Deep Multi-Agent Reinforcement Learning for Decentralized Active Hypothesis Testing [11.639503711252663]
We tackle the multi-agent active hypothesis testing (AHT) problem by introducing a novel algorithm rooted in the framework of deep multi-agent reinforcement learning. We present a comprehensive set of experimental results that effectively showcase the agents' ability to learn collaborative strategies and enhance performance.
arXiv Detail & Related papers (2023-09-14T01:18:04Z)
ACE: Cooperative Multi-agent Q-learning with Bidirectional Action-Dependency [65.28061634546577]
Multi-agent reinforcement learning (MARL) suffers from the non-stationarity problem. In this paper, we propose bidirectional action-dependent Q-learning (ACE). ACE outperforms the state-of-the-art algorithms on Google Research Football and StarCraft Multi-Agent Challenge.
arXiv Detail & Related papers (2022-11-29T10:22:55Z)
RACA: Relation-Aware Credit Assignment for Ad-Hoc Cooperation in Multi-Agent Deep Reinforcement Learning [55.55009081609396]
We propose a novel method, called Relation-Aware Credit Assignment (RACA), which achieves zero-shot generalization in ad-hoc cooperation scenarios. RACA takes advantage of a graph-based encoder relation to encode the topological structure between agents. Our method outperforms baseline methods on the StarCraftII micromanagement benchmark and ad-hoc cooperation scenarios.
arXiv Detail & Related papers (2022-06-02T03:39:27Z)
Learning Cooperative Multi-Agent Policies with Partial Reward Decoupling [13.915157044948364]
One of the preeminent obstacles to scaling multi-agent reinforcement learning is assigning credit to individual agents' actions. In this paper, we address this credit assignment problem with an approach that we call partial reward decoupling (PRD). PRD decomposes large cooperative multi-agent RL problems into decoupled subproblems involving subsets of agents, thereby simplifying credit assignment.
arXiv Detail & Related papers (2021-12-23T17:48:04Z)
Episodic Multi-agent Reinforcement Learning with Curiosity-Driven Exploration [40.87053312548429]
We introduce a novel Episodic Multi-agent reinforcement learning with Curiosity-driven exploration, called EMC. We use prediction errors of individual Q-values as intrinsic rewards for coordinated exploration and utilize episodic memory to exploit explored informative experience to boost policy training.
arXiv Detail & Related papers (2021-11-22T07:34:47Z)
Shapley Counterfactual Credits for Multi-Agent Reinforcement Learning [34.856522993714535]
We propose Shapley Counterfactual Credit Assignment, a novel method for explicit credit assignment which accounts for the coalition of agents. Our method outperforms existing cooperative MARL algorithms significantly and achieves the state-of-the-art, with especially large margins on tasks with more severe difficulties.
arXiv Detail & Related papers (2021-06-01T07:38:34Z)
Scalable Multi-Agent Inverse Reinforcement Learning via Actor-Attention-Critic [54.2180984002807]
Multi-agent adversarial inverse reinforcement learning (MA-AIRL) is a recent approach that applies single-agent AIRL to multi-agent problems. We propose a multi-agent inverse RL algorithm that is more sample-efficient and scalable than previous works.
arXiv Detail & Related papers (2020-02-24T20:30:45Z)

This list is automatically generated from the titles and abstracts of the papers on this site.