GenSwarm: Scalable Multi-Robot Code-Policy Generation and Deployment via Language Models
- URL: http://arxiv.org/abs/2503.23875v1
- Date: Mon, 31 Mar 2025 09:26:34 GMT
- Title: GenSwarm: Scalable Multi-Robot Code-Policy Generation and Deployment via Language Models
- Authors: Wenkang Ji, Huaben Chen, Mingyang Chen, Guobin Zhu, Lufeng Xu, Roderich Groß, Rui Zhou, Ming Cao, Shiyu Zhao
- Abstract summary: GenSwarm is an end-to-end system that generates and deploys control policies for multi-robot tasks based on simple user instructions in natural language. As a multi-language-agent system, GenSwarm achieves zero-shot learning, enabling rapid adaptation to altered or unseen tasks.
- Score: 17.522946748641324
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The development of control policies for multi-robot systems traditionally follows a complex and labor-intensive process, often lacking the flexibility to adapt to dynamic tasks. This has motivated research on methods to automatically create control policies. However, these methods require iterative processes of manually crafting and refining objective functions, thereby prolonging the development cycle. This work introduces GenSwarm, an end-to-end system that leverages large language models to automatically generate and deploy control policies for multi-robot tasks based on simple user instructions in natural language. As a multi-language-agent system, GenSwarm achieves zero-shot learning, enabling rapid adaptation to altered or unseen tasks. The white-box nature of the code policies ensures strong reproducibility and interpretability. With its scalable software and hardware architectures, GenSwarm supports efficient policy deployment on both simulated and real-world multi-robot systems, realizing an instruction-to-execution end-to-end functionality that could prove valuable for robotics specialists and non-specialists alike. The code of the proposed GenSwarm system is available online: https://github.com/WindyLab/GenSwarm.
Related papers
- Vision-Language-Policy Model for Dynamic Robot Task Planning [8.427578025752219]
The gap between natural language commands and autonomous execution remains an open challenge for robotics.
Traditional robotic task-planning approaches often struggle to bridge low-level execution with high-level task reasoning.
We propose a novel language model-based framework for dynamic robot task planning.
arXiv Detail & Related papers (2025-12-22T09:12:48Z)
- Robot guide with multi-agent control and automatic scenario generation with LLM [0.0]
The work describes the development of a hybrid control architecture for an anthropomorphic tour guide robot.
The proposed approach aims to overcome the limitations of traditional systems, which rely on manual tuning of behavior scenarios.
arXiv Detail & Related papers (2025-09-12T14:59:04Z)
- Interpretable Robot Control via Structured Behavior Trees and Large Language Models [0.14990005092937678]
This paper presents a novel framework that bridges natural language understanding and robotic execution.
The proposed approach is practical in real-world scenarios, with an average cognition-to-execution accuracy of approximately 94%.
arXiv Detail & Related papers (2025-08-13T08:53:13Z)
- RoBridge: A Hierarchical Architecture Bridging Cognition and Execution for General Robotic Manipulation [90.81956345363355]
RoBridge is a hierarchical intelligent architecture for general robotic manipulation.
It consists of a high-level cognitive planner (HCP) based on a large-scale pre-trained vision-language model (VLM).
It unleashes the procedural skill of reinforcement learning, effectively bridging the gap between cognition and execution.
arXiv Detail & Related papers (2025-05-03T06:17:18Z)
- Trajectory Adaptation using Large Language Models [0.8704964543257245]
Adapting robot trajectories based on human instructions as per new situations is essential for achieving more intuitive and scalable human-robot interactions.
This work proposes a flexible language-based framework to adapt generic robotic trajectories produced by off-the-shelf motion planners.
We utilize pre-trained LLMs to adapt trajectory waypoints by generating code as a policy for dense robot manipulation.
arXiv Detail & Related papers (2025-04-17T08:48:23Z)
- Grounding Robot Policies with Visuomotor Language Guidance [15.774237279917594]
We propose an agent-based framework for grounding robot policies to the current context.
The proposed framework is composed of a set of conversational agents designed for specific roles.
We demonstrate that our approach can effectively guide manipulation policies to achieve significantly higher success rates.
arXiv Detail & Related papers (2024-10-09T02:00:37Z)
- ROS-LLM: A ROS framework for embodied AI with task feedback and structured reasoning [74.58666091522198]
We present a framework for intuitive robot programming by non-experts.
We leverage natural language prompts and contextual information from the Robot Operating System (ROS).
Our system integrates large language models (LLMs), enabling non-experts to articulate task requirements to the system through a chat interface.
arXiv Detail & Related papers (2024-06-28T08:28:38Z)
- Towards Natural Language-Driven Assembly Using Foundation Models [11.710022685486914]
Large Language Models (LLMs) and strong vision models have enabled rapid research and development in the field of Vision-Language-Action models.
We present a global control policy based on LLMs that can transfer the control policy to a finite set of skills that are specifically trained to perform high-precision tasks.
The integration of LLMs into this framework underscores their significance in not only interpreting and processing language inputs but also in enriching the control mechanisms for diverse and intricate robotic operations.
arXiv Detail & Related papers (2024-06-23T12:14:37Z)
- Policy Learning with a Language Bottleneck [65.99843627646018]
We introduce Policy Learning with a Language Bottleneck (PLLB), a framework enabling AI agents to generate linguistic rules.
PLLB alternates between a *rule generation* step guided by language models, and an *update* step where agents learn new policies guided by rules.
We show that PLLB agents are not only able to learn more interpretable and generalizable behaviors, but can also share the learned rules with human users.
arXiv Detail & Related papers (2024-05-07T08:40:21Z)
- RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis [102.1876259853457]
We propose a tree-structured multimodal code generation framework for generalized robotic behavior synthesis, termed RoboCodeX.
RoboCodeX decomposes high-level human instructions into multiple object-centric manipulation units consisting of physical preferences such as affordance and safety constraints.
To further enhance the capability to map conceptual and perceptual understanding into control commands, a specialized multimodal reasoning dataset is collected for pre-training and an iterative self-updating methodology is introduced for supervised fine-tuning.
arXiv Detail & Related papers (2024-02-25T15:31:43Z)
- RoboScript: Code Generation for Free-Form Manipulation Tasks across Real and Simulation [77.41969287400977]
This paper presents RoboScript, a platform for a deployable robot manipulation pipeline powered by code generation.
We also present a code generation benchmark for robot manipulation tasks specified in free-form natural language.
We demonstrate the adaptability of our code generation framework across multiple robot embodiments, including the Franka and UR5 robot arms.
arXiv Detail & Related papers (2024-02-22T15:12:00Z)
- RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation [68.70755196744533]
RoboGen is a generative robotic agent that automatically learns diverse robotic skills at scale via generative simulation.
Our work attempts to extract the extensive and versatile knowledge embedded in large-scale models and transfer it to the field of robotics.
arXiv Detail & Related papers (2023-11-02T17:59:21Z)
- Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model [63.66204449776262]
Instruct2Act is a framework that maps multi-modal instructions to sequential actions for robotic manipulation tasks.
Our approach is adjustable and flexible in accommodating various instruction modalities and input types.
Our zero-shot method outperformed many state-of-the-art learning-based policies in several tasks.
arXiv Detail & Related papers (2023-05-18T17:59:49Z)