ICCO: Learning an Instruction-conditioned Coordinator for Language-guided Task-aligned Multi-robot Control
- URL: http://arxiv.org/abs/2503.12122v2
- Date: Wed, 23 Jul 2025 04:56:04 GMT
- Title: ICCO: Learning an Instruction-conditioned Coordinator for Language-guided Task-aligned Multi-robot Control
- Authors: Yoshiki Yano, Kazuki Shibata, Maarten Kokshoorn, Takamitsu Matsubara
- Abstract summary: We propose Instruction-Conditioned Coordinator (ICCO) to enhance coordination in language-guided multi-robot systems. ICCO consists of a Coordinator agent and multiple Local Agents, where the Coordinator generates Task-Aligned and Consistent Instructions. A Consistency Enhancement Term is added to the learning objective to maximize mutual information between instructions and robot behaviors.
- Score: 7.335799770583488
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in Large Language Models (LLMs) have permitted the development of language-guided multi-robot systems, which allow robots to execute tasks based on natural language instructions. However, achieving effective coordination in distributed multi-agent environments remains challenging due to (1) misalignment between instructions and task requirements and (2) inconsistency in robot behaviors when they independently interpret ambiguous instructions. To address these challenges, we propose Instruction-Conditioned Coordinator (ICCO), a Multi-Agent Reinforcement Learning (MARL) framework designed to enhance coordination in language-guided multi-robot systems. ICCO consists of a Coordinator agent and multiple Local Agents, where the Coordinator generates Task-Aligned and Consistent Instructions (TACI) by integrating language instructions with environmental states, ensuring task alignment and behavioral consistency. The Coordinator and Local Agents are jointly trained to optimize a reward function that balances task efficiency and instruction following. A Consistency Enhancement Term is added to the learning objective to maximize mutual information between instructions and robot behaviors, further improving coordination. Simulation and real-world experiments validate the effectiveness of ICCO in achieving language-guided task-aligned multi-robot control. The demonstration can be found at https://yanoyoshiki.github.io/ICCO/.
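The shape of the learning objective can be illustrated with a minimal sketch. The InfoNCE-style estimator below, and all weights and names, are our own illustrative assumptions, not the paper's exact formulation; it only shows how a mutual-information term between instructions and behaviors could enter a reward-balancing objective:

```python
import numpy as np

def infonce_lower_bound(instr_emb, behav_emb, temperature=0.1):
    """InfoNCE-style lower bound on I(instruction; behavior).

    instr_emb, behav_emb: (batch, dim) arrays of paired embeddings.
    Maximizing this encourages each robot's behavior to be
    predictable from the instruction it received.
    """
    a = instr_emb / np.linalg.norm(instr_emb, axis=1, keepdims=True)
    b = behav_emb / np.linalg.norm(behav_emb, axis=1, keepdims=True)
    logits = a @ b.T / temperature                 # pairwise cosine similarities
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(np.mean(np.diag(log_probs)))      # matched pairs on the diagonal

def icco_objective(task_reward, follow_reward, consistency, beta=0.5, lam=0.1):
    """Toy combined objective: task efficiency, instruction following,
    and a consistency enhancement term (weights are placeholders)."""
    return task_reward + beta * follow_reward + lam * consistency
```

With this estimator, correctly paired (instruction, behavior) embeddings score strictly higher than shuffled pairs, which is the property the consistency term exploits.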
Related papers
- Advances and Innovations in the Multi-Agent Robotic System (MARS) Challenge [170.47383225329915]
Multi-agent system frameworks are becoming essential for achieving scalable, efficient, and collaborative solutions. This shift is fueled by three primary factors: increasing agent capabilities, enhancing system efficiency through task delegation, and enabling advanced human-agent interactions. We propose the Multi-Agent Robotic System (MARS) Challenge, held at the NeurIPS 2025 Workshop on SpaVLE.
arXiv Detail & Related papers (2026-01-26T17:56:19Z)
- TACOS: Task Agnostic COordinator of a multi-drone System [41.99844472131922]
TACOS (Task-Agnostic COordinator of a multi-drone System) is a unified framework that enables high-level natural language control of multi-UAV systems. It integrates three key capabilities into a single architecture: a one-to-many natural language interface for intuitive user interaction, an intelligent coordinator for translating user intent into structured task plans, and an autonomous agent that executes plans by interacting with the real world.
arXiv Detail & Related papers (2025-10-02T10:21:35Z)
- Learning to Interact in World Latent for Team Coordination [53.51290193631586]
This work presents a novel representation learning framework, interactive world latent (IWoL), to facilitate team coordination in multi-agent reinforcement learning (MARL). Our key insight is to construct a learnable representation space that jointly captures inter-agent relations and task-specific world information by directly modeling communication protocols. Our representation can be used not only as an implicit latent for each agent, but also as an explicit message for communication.
arXiv Detail & Related papers (2025-09-29T22:13:39Z)
- Cross-Task Experiential Learning on LLM-based Multi-Agent Collaboration [63.90193684394165]
We introduce multi-agent cross-task experiential learning (MAEL), a novel framework that endows LLM-driven agents with explicit cross-task learning and experience accumulation. During the experiential learning phase, we quantify the quality of each step in the task-solving workflow and store the resulting rewards. During inference, agents retrieve high-reward, task-relevant experiences as few-shot examples to enhance the effectiveness of each reasoning step.
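The store-then-retrieve loop described above can be sketched as a toy experience pool. The lexical-overlap relevance score and all names here are our own stand-ins (the paper would presumably use learned embeddings), not MAEL's actual implementation:

```python
from dataclasses import dataclass

@dataclass
class Experience:
    task: str      # task description
    solution: str  # the step or workflow that was taken
    reward: float  # quantified quality of that step

class ExperiencePool:
    """Toy cross-task experience pool: store rewarded steps, then
    retrieve high-reward, task-relevant ones as few-shot examples."""

    def __init__(self):
        self.pool = []

    def add(self, task, solution, reward):
        self.pool.append(Experience(task, solution, reward))

    @staticmethod
    def _relevance(query, task):
        # Crude lexical Jaccard overlap stands in for embedding similarity.
        q, t = set(query.lower().split()), set(task.lower().split())
        return len(q & t) / max(len(q | t), 1)

    def retrieve(self, query, k=2):
        # Rank by relevance weighted by stored reward; return top-k.
        scored = sorted(self.pool,
                        key=lambda e: self._relevance(query, e.task) * e.reward,
                        reverse=True)
        return scored[:k]
```

Weighting relevance by reward means a highly relevant but low-quality experience can lose to a slightly less relevant, high-quality one, which matches the "high-reward, task-relevant" retrieval criterion.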
arXiv Detail & Related papers (2025-05-29T07:24:37Z)
- AgentOrca: A Dual-System Framework to Evaluate Language Agents on Operational Routine and Constraint Adherence [54.317522790545304]
We present AgentOrca, a dual-system framework for evaluating language agents' compliance with operational constraints and routines.
Our framework encodes action constraints and routines through both natural language prompts for agents and corresponding executable code serving as ground truth for automated verification.
Our findings reveal notable performance gaps among state-of-the-art models, with large reasoning models like o1 demonstrating superior compliance while others show significantly lower performance.
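The dual encoding of each constraint, natural language for the agent plus executable code as ground truth, can be sketched as follows. The refund routine and every identifier here are hypothetical examples of ours, not AgentOrca's actual constraint set:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Constraint:
    prompt: str                               # natural-language form shown to the agent
    check: Callable[[List[str]], bool]        # executable ground truth for verification

# Hypothetical routine: identity must be verified before any refund.
verify_before_refund = Constraint(
    prompt="Always verify the customer's identity before processing a refund.",
    check=lambda trace: "refund" not in trace or (
        "verify_identity" in trace
        and trace.index("verify_identity") < trace.index("refund")),
)

def audit(trace, constraints):
    """Run every executable check against an agent's action trace and
    return the prompts of the constraints it violated."""
    return [c.prompt for c in constraints if not c.check(trace)]
```

Keeping the prompt and the checker in one record is what enables automated verification: the agent only ever sees the prose, while the grader replays the action trace through the code.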
arXiv Detail & Related papers (2025-03-11T17:53:02Z)
- CaPo: Cooperative Plan Optimization for Efficient Embodied Multi-Agent Cooperation [98.11670473661587]
CaPo improves cooperation efficiency with two phases: 1) meta-plan generation, and 2) progress-adaptive meta-plan and execution. Experimental results on the ThreeDworld Multi-Agent Transport and Communicative Watch-And-Help tasks demonstrate that CaPo achieves a much higher task completion rate and efficiency compared with state-of-the-art methods.
arXiv Detail & Related papers (2024-11-07T13:08:04Z)
- CurricuLLM: Automatic Task Curricula Design for Learning Complex Robot Skills using Large Language Models [19.73329768987112]
CurricuLLM is a curriculum learning tool for complex robot control tasks.
It generates subtasks that aid target task learning in natural language form.
It also translates natural language description of subtasks into executable code.
CurricuLLM can aid learning complex robot control tasks.
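The subtask-to-code idea can be sketched with a toy curriculum in which each natural-language subtask is paired with executable reward code. Here the reward functions are hand-written stand-ins and the promotion rule is our own simplification; CurricuLLM would generate both the subtasks and the code with an LLM:

```python
def make_reward(threshold):
    """Executable reward for a 'get within <threshold> of the target' subtask."""
    def reward(state):
        # state: dict carrying, e.g., the robot's distance to the target.
        return 1.0 if state["dist_to_target"] < threshold else 0.0
    return reward

# A curriculum: natural-language subtask paired with its generated reward code.
curriculum = [
    ("Move within 1.0 m of the target", make_reward(1.0)),
    ("Move within 0.2 m of the target", make_reward(0.2)),
]

def current_subtask(success_rates, curriculum, promote_at=0.8):
    """Advance to the next subtask once the current one is solved reliably
    (a simple fixed promotion rule, not the paper's mechanism)."""
    for (desc, _rew), rate in zip(curriculum, success_rates):
        if rate < promote_at:
            return desc
    return curriculum[-1][0]
```

The value of translating subtasks into code is that the reward is then directly usable by a standard RL training loop, with no human reward engineering per subtask.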
arXiv Detail & Related papers (2024-09-27T01:48:16Z)
- COHERENT: Collaboration of Heterogeneous Multi-Robot System with Large Language Models [49.24666980374751]
COHERENT is a novel LLM-based task planning framework for collaboration of heterogeneous multi-robot systems.
A Proposal-Execution-Feedback-Adjustment mechanism is designed to decompose and assign actions for individual robots.
The experimental results show that our work surpasses the previous methods by a large margin in terms of success rate and execution efficiency.
arXiv Detail & Related papers (2024-09-23T15:53:41Z)
- Towards Collaborative Intelligence: Propagating Intentions and Reasoning for Multi-Agent Coordination with Large Language Models [41.95288786980204]
Current agent frameworks often suffer from dependencies on single-agent execution and lack robust inter-module communication.
We present a framework for training large language models as collaborative agents to enable coordinated behaviors in cooperative MARL.
A propagation network transforms broadcast intentions into teammate-specific communication messages, sharing relevant goals with designated teammates.
arXiv Detail & Related papers (2024-07-17T13:14:00Z)
- Instruction Following with Goal-Conditioned Reinforcement Learning in Virtual Environments [42.06453257292203]
We propose a hierarchical framework that combines the deep language comprehension of large language models with the adaptive action-execution capabilities of reinforcement learning agents.
We have demonstrated the effectiveness of our approach in two different environments: in IGLU, where agents are instructed to build structures, and in Crafter, where agents perform tasks and interact with objects in the surrounding environment according to language commands.
arXiv Detail & Related papers (2024-07-12T14:19:36Z)
- Large Language Models for Orchestrating Bimanual Robots [19.60907949776435]
We present LAnguage-model-based Bimanual ORchestration (LABOR) to analyze task configurations and devise coordination control policies.
We evaluate our method through simulated experiments involving two classes of long-horizon tasks using the NICOL humanoid robot.
arXiv Detail & Related papers (2024-04-02T15:08:35Z)
- Your Co-Workers Matter: Evaluating Collaborative Capabilities of Language Models in Blocks World [13.005764902339523]
We design a blocks-world environment where two agents, each having unique goals and skills, build a target structure together.
To complete the goals, they can act in the world and communicate in natural language.
We adopt chain-of-thought prompts that include intermediate reasoning steps to model the partner's state and identify and correct execution errors.
arXiv Detail & Related papers (2024-03-30T04:48:38Z)
- Pragmatic Instruction Following and Goal Assistance via Cooperative Language-Guided Inverse Planning [52.91457780361305]
This paper introduces cooperative language-guided inverse plan search (CLIPS).
Our agent assists a human by modeling them as a cooperative planner who communicates joint plans to the assistant.
We evaluate these capabilities in two cooperative planning domains (Doors, Keys & Gems and VirtualHome).
arXiv Detail & Related papers (2024-02-27T23:06:53Z)
- Unified Human-Scene Interaction via Prompted Chain-of-Contacts [61.87652569413429]
Human-Scene Interaction (HSI) is a vital component of fields like embodied AI and virtual reality.
This paper presents a unified HSI framework, UniHSI, which supports unified control of diverse interactions through language commands.
arXiv Detail & Related papers (2023-09-14T17:59:49Z)
- Building Cooperative Embodied Agents Modularly with Large Language Models [104.57849816689559]
We address challenging multi-agent cooperation problems with decentralized control, raw sensory observations, costly communication, and multi-objective tasks instantiated in various embodied environments.
We harness the commonsense knowledge, reasoning ability, language comprehension, and text generation prowess of LLMs and seamlessly incorporate them into a cognitive-inspired modular framework.
Our experiments on C-WAH and TDW-MAT demonstrate that CoELA driven by GPT-4 can surpass strong planning-based methods and exhibit emergent effective communication.
arXiv Detail & Related papers (2023-07-05T17:59:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.