Related papers: Leveraging Large Language Models for Effective and Explainable Multi-Agent Credit Assignment

URL: http://arxiv.org/abs/2502.16863v1
Date: Mon, 24 Feb 2025 05:56:47 GMT
Title: Leveraging Large Language Models for Effective and Explainable Multi-Agent Credit Assignment
Authors: Kartik Nagpal, Dayi Dong, Jean-Baptiste Bouvier, Negar Mehr
Abstract summary: We show how to reformulate credit assignment to the two pattern recognition problems of sequence improvement and attribution. Our approach utilizes a centralized reward-critic which numerically decomposes the environment reward based on the individual contribution of each agent. Both our methods far outperform the state-of-the-art on a variety of benchmarks, including Level-Based Foraging, Robotic Warehouse, and our new Spaceworld benchmark which incorporates collision-related safety constraints.
Score: 4.406086834602686
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Recent work, spanning from autonomous vehicle coordination to in-space assembly, has shown the importance of learning collaborative behavior for enabling robots to achieve shared goals. A common approach for learning this cooperative behavior is to utilize the centralized-training decentralized-execution paradigm. However, this approach also introduces a new challenge: how do we evaluate the contributions of each agent's actions to the overall success or failure of the team. This credit assignment problem has remained open, and has been extensively studied in the Multi-Agent Reinforcement Learning literature. In fact, humans manually inspecting agent behavior often generate better credit evaluations than existing methods. We combine this observation with recent works which show Large Language Models demonstrate human-level performance at many pattern recognition tasks. Our key idea is to reformulate credit assignment to the two pattern recognition problems of sequence improvement and attribution, which motivates our novel LLM-MCA method. Our approach utilizes a centralized LLM reward-critic which numerically decomposes the environment reward based on the individualized contribution of each agent in the scenario. We then update the agents' policy networks based on this feedback. We also propose an extension LLM-TACA where our LLM critic performs explicit task assignment by passing an intermediary goal directly to each agent policy in the scenario. Both our methods far outperform the state-of-the-art on a variety of benchmarks, including Level-Based Foraging, Robotic Warehouse, and our new Spaceworld benchmark which incorporates collision-related safety constraints. As an artifact of our methods, we generate large trajectory datasets with each timestep annotated with per-agent reward information, as sampled from our LLM critics.

Related papers

AgentPRM: Process Reward Models for LLM Agents via Step-Wise Promise and Progress [71.02263260394261]
Large language models (LLMs) still encounter challenges in multi-turn decision-making tasks. We build process reward models (PRMs) to evaluate each decision and guide the agent's decision-making process. AgentPRM captures both the interdependence between sequential decisions and their contribution to the final goal.
arXiv Detail & Related papers (2025-11-11T14:57:54Z)
Agentic Reinforcement Learning with Implicit Step Rewards [92.26560379363492]
Large language models (LLMs) are increasingly developed as autonomous agents using reinforcement learning (agentic RL). We introduce implicit step rewards for agentic RL (iStar), a general credit-assignment strategy that integrates seamlessly with standard RL algorithms. We evaluate our method on three challenging agent benchmarks, including WebShop and VisualSokoban, as well as open-ended social interactions with unverifiable rewards in SOTOPIA.
arXiv Detail & Related papers (2025-09-23T16:15:42Z)
Application of LLM Guided Reinforcement Learning in Formation Control with Collision Avoidance [1.1718316049475228]
Multi-Agent Systems (MAS) excel at accomplishing complex objectives through the collaborative efforts of individual agents. In this paper, we introduce a novel framework that aims to overcome the challenge of designing an effective reward function. By giving large language models (LLMs) on the prioritization of tasks, our framework generates reward functions that can be dynamically adjusted online.
arXiv Detail & Related papers (2025-07-22T09:26:00Z)
Collab: Controlled Decoding using Mixture of Agents for LLM Alignment [90.6117569025754]
Reinforcement learning from human feedback has emerged as an effective technique to align Large Language models. Controlled Decoding provides a mechanism for aligning a model at inference time without retraining. We propose a mixture of agent-based decoding strategies leveraging the existing off-the-shelf aligned LLM policies.
arXiv Detail & Related papers (2025-03-27T17:34:25Z)
MultiAgentBench: Evaluating the Collaboration and Competition of LLM agents [59.825725526176655]
Large Language Models (LLMs) have shown remarkable capabilities as autonomous agents. Existing benchmarks either focus on single-agent tasks or are confined to narrow domains, failing to capture the dynamics of multi-agent coordination and competition. We introduce MultiAgentBench, a benchmark designed to evaluate LLM-based multi-agent systems across diverse, interactive scenarios.
arXiv Detail & Related papers (2025-03-03T05:18:50Z)
Collab-Overcooked: Benchmarking and Evaluating Large Language Models as Collaborative Agents [17.773801766612703]
Large language models (LLMs) based agent systems have made great strides in real-world applications beyond traditional NLP tasks. This paper proposes a new benchmark, Collab-Overcooked, built on the popular Overcooked-AI game with more applicable and challenging tasks in interactive environments.
arXiv Detail & Related papers (2025-02-27T13:31:13Z)
Scaling Autonomous Agents via Automatic Reward Modeling And Planning [52.39395405893965]
Large language models (LLMs) have demonstrated remarkable capabilities across a range of tasks. However, they still struggle with problems requiring multi-step decision-making and environmental feedback. We propose a framework that can automatically learn a reward model from the environment without human annotations.
arXiv Detail & Related papers (2025-02-17T18:49:25Z)
MALT: Improving Reasoning with Multi-Agent LLM Training [64.13803241218886]
We present a first step toward "Multi-agent LLM training" (MALT) on reasoning problems. Our approach employs a sequential multi-agent setup with heterogeneous LLMs assigned specialized roles. We evaluate our approach across MATH, GSM8k, and CQA, where MALT on Llama 3.1 8B models achieves relative improvements of 14.14%, 7.12%, and 9.40% respectively.
arXiv Detail & Related papers (2024-12-02T19:30:36Z)
From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning [62.54484062185869]
We introduce StepAgent, which utilizes step-wise reward to optimize the agent's reinforcement learning process. We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
arXiv Detail & Related papers (2024-11-06T10:35:11Z)
Deep Multi-Agent Reinforcement Learning for Decentralized Active Hypothesis Testing [11.639503711252663]
We tackle the multi-agent active hypothesis testing (AHT) problem by introducing a novel algorithm rooted in the framework of deep multi-agent reinforcement learning. We present a comprehensive set of experimental results that effectively showcase the agents' ability to learn collaborative strategies and enhance performance.
arXiv Detail & Related papers (2023-09-14T01:18:04Z)
Learning From Good Trajectories in Offline Multi-Agent Reinforcement Learning [98.07495732562654]
Offline multi-agent reinforcement learning (MARL) aims to learn effective multi-agent policies from pre-collected datasets. One agent learned by offline MARL often inherits this random policy, jeopardizing the performance of the entire team. We propose a novel framework called Shared Individual Trajectories (SIT) to address this problem.
arXiv Detail & Related papers (2022-11-28T18:11:26Z)
RACA: Relation-Aware Credit Assignment for Ad-Hoc Cooperation in Multi-Agent Deep Reinforcement Learning [55.55009081609396]
We propose a novel method, called Relation-Aware Credit Assignment (RACA), which achieves zero-shot generalization in ad-hoc cooperation scenarios. RACA takes advantage of a graph-based encoder relation to encode the topological structure between agents. Our method outperforms baseline methods on the StarCraftII micromanagement benchmark and ad-hoc cooperation scenarios.
arXiv Detail & Related papers (2022-06-02T03:39:27Z)
Locality Matters: A Scalable Value Decomposition Approach for Cooperative Multi-Agent Reinforcement Learning [52.7873574425376]
Cooperative multi-agent reinforcement learning (MARL) faces significant scalability issues due to state and action spaces that are exponentially large in the number of agents. We propose a novel, value-based multi-agent algorithm called LOMAQ, which incorporates local rewards in the Centralized Training Decentralized Execution paradigm.
arXiv Detail & Related papers (2021-09-22T10:08:15Z)
Shapley Counterfactual Credits for Multi-Agent Reinforcement Learning [34.856522993714535]
We propose Shapley Counterfactual Credit Assignment, a novel method for explicit credit assignment which accounts for the coalition of agents. Our method outperforms existing cooperative MARL algorithms significantly and achieves the state-of-the-art, with especially large margins on tasks with more severe difficulties.
arXiv Detail & Related papers (2021-06-01T07:38:34Z)
AoI-Aware Resource Allocation for Platoon-Based C-V2X Networks via Multi-Agent Multi-Task Reinforcement Learning [22.890835786710316]
This paper investigates the problem of age of information (AoI) aware radio resource management for a platooning system. Multiple autonomous platoons exploit the cellular wireless vehicle-to-everything (C-V2X) communication technology to disseminate the cooperative awareness messages (CAMs) to their followers. We exploit a distributed resource allocation framework based on multi-agent reinforcement learning (MARL), where each platoon leader (PL) acts as an agent and interacts with the environment to learn its optimal policy.
arXiv Detail & Related papers (2021-05-10T08:39:56Z)
Modeling the Interaction between Agents in Cooperative Multi-Agent Reinforcement Learning [2.9360071145551068]
We propose a novel cooperative MARL algorithm named interactive actor-critic (IAC). IAC models the interaction of agents from the perspectives of policy and value function. We extend the value decomposition methods to continuous control tasks and evaluate IAC on benchmark tasks including classic control and multi-agent particle environments.
arXiv Detail & Related papers (2021-02-10T01:58:28Z)

This list is automatically generated from the titles and abstracts of the papers on this site.