Harnessing Language for Coordination: A Framework and Benchmark for LLM-Driven Multi-Agent Control
- URL: http://arxiv.org/abs/2412.11761v2
- Date: Tue, 22 Apr 2025 11:24:23 GMT
- Title: Harnessing Language for Coordination: A Framework and Benchmark for LLM-Driven Multi-Agent Control
- Authors: Timothée Anne, Noah Syrkis, Meriem Elhosni, Florian Turati, Franck Legendre, Alain Jaquier, Sebastian Risi,
- Abstract summary: Large Language Models (LLMs) have demonstrated remarkable performance across various tasks.<n>Their potential to facilitate human coordination with many agents is a promising but largely under-explored area.<n>In this work, we introduce (1) a real-time strategy game benchmark designed to evaluate these abilities and (2) a novel framework we term HIVE.
- Score: 6.721923873906492
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) have demonstrated remarkable performance across various tasks. Their potential to facilitate human coordination with many agents is a promising but largely under-explored area. Such capabilities would be helpful in disaster response, urban planning, and real-time strategy scenarios. In this work, we introduce (1) a real-time strategy game benchmark designed to evaluate these abilities and (2) a novel framework we term HIVE. HIVE empowers a single human to coordinate swarms of up to 2,000 agents through a natural language dialog with an LLM. We present promising results on this multi-agent benchmark, with our hybrid approach solving tasks such as coordinating agent movements, exploiting unit weaknesses, leveraging human annotations, and understanding terrain and strategic points. Our findings also highlight critical limitations of current models, including difficulties in processing spatial visual information and challenges in formulating long-term strategic plans. This work sheds light on the potential and limitations of LLMs in human-swarm coordination, paving the way for future research in this area. The HIVE project page, hive.syrkis.com, includes videos of the system in action.
Related papers
- Can LLM Agents Solve Collaborative Tasks? A Study on Urgency-Aware Planning and Coordination [4.511923587827302]
Large Language Models (LLMs) have shown strong capabilities in communication, planning, and reasoning.<n>This study offers new insights into the strengths and failure modes of LLMs in physically grounded multi-agent collaboration tasks.
arXiv Detail & Related papers (2025-08-20T11:44:10Z) - Society of Mind Meets Real-Time Strategy: A Hierarchical Multi-Agent Framework for Strategic Reasoning [16.35236123729838]
We propose a hierarchical multi-agent framework that employs specialized imitation learning agents under a meta-controller called Strategic Planner (SP)<n>By expert demonstrations, each specialized agent learns a distinctive strategy, such as aerial support or defensive maneuvers, and produces coherent, structured multistep action sequences.<n>The SP then orchestrates these proposals into a single, environmentally adaptive plan that ensures local decisions align with long-term strategies.
arXiv Detail & Related papers (2025-08-08T05:57:12Z) - Benchmarking LLMs' Swarm intelligence [50.544186914115045]
Large Language Models (LLMs) show potential for complex reasoning, yet their capacity for emergent coordination in Multi-Agent Systems (MAS) remains largely unexplored.<n>We introduce SwarmBench, a novel benchmark designed to systematically evaluate tasks of LLMs acting as decentralized agents.<n>We propose metrics for coordination effectiveness and analyze emergent group dynamics.
arXiv Detail & Related papers (2025-05-07T12:32:01Z) - MultiAgentBench: Evaluating the Collaboration and Competition of LLM agents [59.825725526176655]
Large Language Models (LLMs) have shown remarkable capabilities as autonomous agents.
Existing benchmarks either focus on single-agent tasks or are confined to narrow domains, failing to capture the dynamics of multi-agent coordination and competition.
We introduce MultiAgentBench, a benchmark designed to evaluate LLM-based multi-agent systems across diverse, interactive scenarios.
arXiv Detail & Related papers (2025-03-03T05:18:50Z) - Multi-Agent Autonomous Driving Systems with Large Language Models: A Survey of Recent Advances [61.539442227802226]
Large Language Models (LLMs) have been integrated into Autonomous Driving Systems (ADSs) to support high-level decision-making.<n>LLMs face three major challenges: limited perception, insufficient collaboration, and high computational demands.<n>Recent advances in multi-agent ADSs leverage language-driven communication and coordination to enhance inter-agent collaboration.
arXiv Detail & Related papers (2025-02-24T03:26:13Z) - Scaling Autonomous Agents via Automatic Reward Modeling And Planning [52.39395405893965]
Large language models (LLMs) have demonstrated remarkable capabilities across a range of tasks.
However, they still struggle with problems requiring multi-step decision-making and environmental feedback.
We propose a framework that can automatically learn a reward model from the environment without human annotations.
arXiv Detail & Related papers (2025-02-17T18:49:25Z) - MALMM: Multi-Agent Large Language Models for Zero-Shot Robotics Manipulation [52.739500459903724]
Large Language Models (LLMs) have demonstrated remarkable planning abilities across various domains, including robotics manipulation and navigation.
We propose a novel multi-agent LLM framework that distributes high-level planning and low-level control code generation across specialized LLM agents.
We evaluate our approach on nine RLBench tasks, including long-horizon tasks, and demonstrate its ability to solve robotics manipulation in a zero-shot setting.
arXiv Detail & Related papers (2024-11-26T17:53:44Z) - BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games [44.16513620589459]
We introduce BALROG, a novel benchmark to assess the agentic capabilities of Large Language Models (LLMs) and Vision Language Models (VLMs)
Our benchmark incorporates a range of existing reinforcement learning environments with varying levels of difficulty, including tasks that are solvable by non-expert humans in seconds to extremely challenging ones that may take years to master.
Our findings indicate that while current models achieve partial success in the easier games, they struggle significantly with more challenging tasks.
arXiv Detail & Related papers (2024-11-20T18:54:32Z) - Cooperate or Collapse: Emergence of Sustainable Cooperation in a Society of LLM Agents [101.17919953243107]
GovSim is a generative simulation platform designed to study strategic interactions and cooperative decision-making in large language models (LLMs)<n>We find that all but the most powerful LLM agents fail to achieve a sustainable equilibrium in GovSim, with the highest survival rate below 54%.<n>We show that agents that leverage "Universalization"-based reasoning, a theory of moral thinking, are able to achieve significantly better sustainability.
arXiv Detail & Related papers (2024-04-25T15:59:16Z) - Large Language Models for Orchestrating Bimanual Robots [19.60907949776435]
We present LAnguage-model-based Bimanual ORchestration (LABOR) to analyze task configurations and devise coordination control policies.
We evaluate our method through simulated experiments involving two classes of long-horizon tasks using the NICOL humanoid robot.
arXiv Detail & Related papers (2024-04-02T15:08:35Z) - LLM-Powered Hierarchical Language Agent for Real-time Human-AI
Coordination [28.22553394518179]
We propose a Hierarchical Language Agent (HLA) for human-AI coordination.
HLA provides both strong reasoning abilities while keeping real-time execution.
Human studies show that HLA outperforms other baseline agents, including slow-mind-only agents and fast-mind-only agents.
arXiv Detail & Related papers (2023-12-23T11:09:48Z) - MAgIC: Investigation of Large Language Model Powered Multi-Agent in Cognition, Adaptability, Rationality and Collaboration [98.18244218156492]
Large Language Models (LLMs) have significantly advanced natural language processing.<n>As their applications expand into multi-agent environments, there arises a need for a comprehensive evaluation framework.<n>This work introduces a novel competition-based benchmark framework to assess LLMs within multi-agent settings.
arXiv Detail & Related papers (2023-11-14T21:46:27Z) - MetaAgents: Simulating Interactions of Human Behaviors for LLM-based
Task-oriented Coordination via Collaborative Generative Agents [27.911816995891726]
We introduce collaborative generative agents, endowing LLM-based Agents with consistent behavior patterns and task-solving abilities.
We propose a novel framework that equips collaborative generative agents with human-like reasoning abilities and specialized skills.
Our work provides valuable insights into the role and evolution of Large Language Models in task-oriented social simulations.
arXiv Detail & Related papers (2023-10-10T10:17:58Z) - LLM-Coordination: Evaluating and Analyzing Multi-agent Coordination Abilities in Large Language Models [23.092480882456048]
This study aims at a detailed analysis of Large Language Models (LLMs) within the context of Pure Coordination Games.
Our findings indicate that LLM agents equipped with GPT-4-turbo achieve comparable performance to state-of-the-art reinforcement learning methods.
Results on Coordination QA show a large room for improvement in the Theory of Mind reasoning and joint planning abilities of LLMs.
arXiv Detail & Related papers (2023-10-05T21:18:15Z) - Cooperation, Competition, and Maliciousness: LLM-Stakeholders Interactive Negotiation [52.930183136111864]
We propose using scorable negotiation to evaluate Large Language Models (LLMs)
To reach an agreement, agents must have strong arithmetic, inference, exploration, and planning capabilities.
We provide procedures to create new games and increase games' difficulty to have an evolving benchmark.
arXiv Detail & Related papers (2023-09-29T13:33:06Z) - Building Cooperative Embodied Agents Modularly with Large Language
Models [104.57849816689559]
We address challenging multi-agent cooperation problems with decentralized control, raw sensory observations, costly communication, and multi-objective tasks instantiated in various embodied environments.
We harness the commonsense knowledge, reasoning ability, language comprehension, and text generation prowess of LLMs and seamlessly incorporate them into a cognitive-inspired modular framework.
Our experiments on C-WAH and TDW-MAT demonstrate that CoELA driven by GPT-4 can surpass strong planning-based methods and exhibit emergent effective communication.
arXiv Detail & Related papers (2023-07-05T17:59:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.