LLM-Mediated Guidance of MARL Systems
- URL: http://arxiv.org/abs/2503.13553v1
- Date: Sun, 16 Mar 2025 20:16:13 GMT
- Title: LLM-Mediated Guidance of MARL Systems
- Authors: Philipp D. Siedler, Ian Gemp,
- Abstract summary: In complex multi-agent environments, achieving efficient learning and desirable behaviours is a challenge for Multi-Agent Reinforcement Learning systems.<n>This work explores the potential of combining MARL with Large Language Model (LLM)-mediated interventions to guide agents toward more desirable behaviours.
- Score: 3.5471755479440055
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In complex multi-agent environments, achieving efficient learning and desirable behaviours is a significant challenge for Multi-Agent Reinforcement Learning (MARL) systems. This work explores the potential of combining MARL with Large Language Model (LLM)-mediated interventions to guide agents toward more desirable behaviours. Specifically, we investigate how LLMs can be used to interpret and facilitate interventions that shape the learning trajectories of multiple agents. We experimented with two types of interventions, referred to as controllers: a Natural Language (NL) Controller and a Rule-Based (RB) Controller. The NL Controller, which uses an LLM to simulate human-like interventions, showed a stronger impact than the RB Controller. Our findings indicate that agents particularly benefit from early interventions, leading to more efficient training and higher performance. Both intervention types outperform the baseline without interventions, highlighting the potential of LLM-mediated guidance to accelerate training and enhance MARL performance in challenging environments.
Related papers
- Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search [57.28671084993782]
Large language models (LLMs) have demonstrated remarkable reasoning capabilities across diverse domains.<n>Recent studies have shown that increasing test-time computation enhances LLMs' reasoning capabilities.<n>We propose a two-stage training paradigm: 1) a small-scale format tuning stage to internalize the COAT reasoning format and 2) a large-scale self-improvement stage leveraging reinforcement learning.
arXiv Detail & Related papers (2025-02-04T17:26:58Z) - O-MAPL: Offline Multi-agent Preference Learning [5.4482836906033585]
Inferring reward functions from demonstrations is a key challenge in reinforcement learning (RL)
We introduce a novel end-to-end preference-based learning framework for cooperative MARL.
Our algorithm outperforms existing methods across various tasks.
arXiv Detail & Related papers (2025-01-31T08:08:20Z) - MALT: Improving Reasoning with Multi-Agent LLM Training [66.9481561915524]
MALT (Multi-Agent LLM Training) is a novel post-training strategy that divides the reasoning process into generation, verification, and refinement steps.<n>On MATH, GSM8K, and CSQA, MALT surpasses the same baseline LLM with a relative improvement of 15.66%, 7.42%, and 9.40% respectively.
arXiv Detail & Related papers (2024-12-02T19:30:36Z) - From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning [62.54484062185869]
We introduce StepAgent, which utilizes step-wise reward to optimize the agent's reinforcement learning process.<n>We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
arXiv Detail & Related papers (2024-11-06T10:35:11Z) - A Little Help Goes a Long Way: Efficient LLM Training by Leveraging Small LMs [74.35290684163718]
A primary challenge in large language model (LLM) development is their onerous pre-training cost.
This paper explores a promising paradigm to improve LLM pre-training efficiency and quality by leveraging a small language model (SLM)
arXiv Detail & Related papers (2024-10-24T14:31:52Z) - CogSteer: Cognition-Inspired Selective Layer Intervention for Efficiently Steering Large Language Models [37.476241509187304]
Large Language Models (LLMs) achieve remarkable performance through pretraining on extensive data.
The lack of interpretability in their underlying mechanisms limits the ability to effectively steer LLMs for specific applications.
In this work, we investigate the mechanisms of LLMs from a cognitive perspective using eye movement measures.
arXiv Detail & Related papers (2024-10-23T09:40:15Z) - Controlling Large Language Model Agents with Entropic Activation Steering [20.56909601159833]
We introduce Entropic Activation Steering (EAST), an activation steering method for in-context learning agents.
We show that EAST can effectively manipulate an LLM agent's exploration by directly affecting the high-level actions parsed from the outputs of the LLM.
We also reveal how applying this control modulates the uncertainty exhibited in the LLM's thoughts, guiding the agent towards more exploratory actions.
arXiv Detail & Related papers (2024-06-01T00:25:00Z) - Controlling Large Language Model-based Agents for Large-Scale
Decision-Making: An Actor-Critic Approach [28.477463632107558]
We develop a modular framework called LLaMAC to address hallucination in Large Language Models and coordination in Multi-Agent Systems.
LLaMAC implements a value distribution encoding similar to that found in the human brain, utilizing internal and external feedback mechanisms to facilitate collaboration and iterative reasoning among its modules.
arXiv Detail & Related papers (2023-11-23T10:14:58Z) - Large Language Model as a Policy Teacher for Training Reinforcement Learning Agents [16.24662355253529]
Large Language Models (LLMs) can address sequential decision-making tasks through the provision of high-level instructions.
LLMs lack specialization in tackling specific target problems, particularly in real-time dynamic environments.
We introduce a novel framework that addresses these challenges by training a smaller, specialized student RL agent using instructions from an LLM-based teacher agent.
arXiv Detail & Related papers (2023-11-22T13:15:42Z) - SALMON: Self-Alignment with Instructable Reward Models [80.83323636730341]
This paper presents a novel approach, namely SALMON, to align base language models with minimal human supervision.
We develop an AI assistant named Dromedary-2 with only 6 exemplars for in-context learning and 31 human-defined principles.
arXiv Detail & Related papers (2023-10-09T17:56:53Z) - LanguageMPC: Large Language Models as Decision Makers for Autonomous Driving [84.31119464141631]
This work employs Large Language Models (LLMs) as a decision-making component for complex autonomous driving scenarios.
Extensive experiments demonstrate that our proposed method not only consistently surpasses baseline approaches in single-vehicle tasks, but also helps handle complex driving behaviors even multi-vehicle coordination.
arXiv Detail & Related papers (2023-10-04T17:59:49Z) - MA2CL:Masked Attentive Contrastive Learning for Multi-Agent
Reinforcement Learning [128.19212716007794]
We propose an effective framework called textbfMulti-textbfAgent textbfMasked textbfAttentive textbfContrastive textbfLearning (MA2CL)
MA2CL encourages learning representation to be both temporal and agent-level predictive by reconstructing the masked agent observation in latent space.
Our method significantly improves the performance and sample efficiency of different MARL algorithms and outperforms other methods in various vision-based and state-based scenarios.
arXiv Detail & Related papers (2023-06-03T05:32:19Z) - Relative Distributed Formation and Obstacle Avoidance with Multi-agent
Reinforcement Learning [20.401609420707867]
We propose a distributed formation and obstacle avoidance method based on multi-agent reinforcement learning (MARL)
Our method achieves better performance regarding formation error, formation convergence rate and on-par success rate of obstacle avoidance compared with baselines.
arXiv Detail & Related papers (2021-11-14T13:02:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.