Related papers: Collision- and Reachability-Aware Multi-Robot Control with Grounded LLM Planners

Collision- and Reachability-Aware Multi-Robot Control with Grounded LLM Planners

URL: http://arxiv.org/abs/2505.20573v2
Date: Tue, 03 Jun 2025 19:33:56 GMT
Title: Collision- and Reachability-Aware Multi-Robot Control with Grounded LLM Planners
Authors: Jiabao Ji, Yongchao Chen, Yang Zhang, Ramana Rao Kompella, Chuchu Fan, Gaowen Liu, Shiyu Chang,
Abstract summary: Large language models (LLMs) have demonstrated strong performance in various robot control tasks.<n>However, their deployment in real-world applications remains constrained.<n>We propose a novel framework that integrates reinforcement learning with verifiable rewards.
Score: 38.407073503042966
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) have demonstrated strong performance in various robot control tasks. However, their deployment in real-world applications remains constrained. Even state-ofthe-art LLMs, such as GPT-o4mini, frequently produce invalid action plans that violate physical constraints, such as directing a robot to an unreachable location or causing collisions between robots. This issue primarily arises from a lack of awareness of these physical constraints during the reasoning process. To address this issue, we propose a novel framework that integrates reinforcement learning with verifiable rewards (RLVR) to incentivize knowledge of physical constraints into LLMs to induce constraints-aware reasoning during plan generation. In this approach, only valid action plans that successfully complete a control task receive positive rewards. We applied our method to two small-scale LLMs: a non-reasoning Qwen2.5-3B-Instruct and a reasoning Qwen3-4B. The experiment results demonstrate that constraint-aware small LLMs largely outperform large-scale models without constraints, grounded on both the BoxNet task and a newly developed BoxNet3D environment built using MuJoCo. This work highlights the effectiveness of grounding even small LLMs with physical constraints to enable scalable and efficient multi-robot control in complex, physically constrained environments.

Related papers

Collaborative LLM Inference via Planning for Efficient Reasoning [50.04696654679751]
We propose a test-time collaboration framework in which a planner model first generates a plan, defined as a distilled and high-level abstraction of the problem.<n>Small and large models take turns acting as planner and reasoner, exchanging plans in a multi-round cascade to collaboratively solve complex tasks.<n>Our method achieves accuracy comparable to strong proprietary models alone, while significantly reducing reliance on paid inference.
arXiv Detail & Related papers (2025-06-13T08:35:50Z)
SmallPlan: Leverage Small Language Models for Sequential Path Planning with Simulation-Powered, LLM-Guided Distillation [20.743117921048537]
SmallPlan is a novel framework leveraging Large Language Models as teacher models to train lightweight Small Language Models (SLMs) for high-level path planning tasks.<n>SLMs are trained in a simulation-powered, interleaved manner with LLM-guided supervised fine-tuning and reinforcement learning.<n>SmallPlan is resource-efficient, making it well-suited for edge-device deployment and advancing practical autonomous robotics.
arXiv Detail & Related papers (2025-05-01T19:44:36Z)
Code-as-Monitor: Constraint-aware Visual Programming for Reactive and Proactive Robotic Failure Detection [56.66677293607114]
We propose Code-as-Monitor (CaM) for both open-set reactive and proactive failure detection.<n>To enhance the accuracy and efficiency of monitoring, we introduce constraint elements that abstract constraint-related entities.<n>Experiments show that CaM achieves a 28.7% higher success rate and reduces execution time by 31.8% under severe disturbances.
arXiv Detail & Related papers (2024-12-05T18:58:27Z)
MALMM: Multi-Agent Large Language Models for Zero-Shot Robotics Manipulation [52.739500459903724]
Large Language Models (LLMs) have demonstrated remarkable planning abilities across various domains, including robotics manipulation and navigation. We propose a novel multi-agent LLM framework that distributes high-level planning and low-level control code generation across specialized LLM agents. We evaluate our approach on nine RLBench tasks, including long-horizon tasks, and demonstrate its ability to solve robotics manipulation in a zero-shot setting.
arXiv Detail & Related papers (2024-11-26T17:53:44Z)
DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution [114.61347672265076]
Development of MLLMs for real-world robots is challenging due to the typically limited computation and memory capacities available on robotic platforms. We propose a Dynamic Early-Exit Framework for Robotic Vision-Language-Action Model (DeeR) that automatically adjusts the size of the activated MLLM. DeeR demonstrates significant reductions in computational costs of LLM by 5.2-6.5x and GPU memory of LLM by 2-6x without compromising performance.
arXiv Detail & Related papers (2024-11-04T18:26:08Z)
Trust the PRoC3S: Solving Long-Horizon Robotics Problems with LLMs and Constraint Satisfaction [38.683780057806516]
Recent developments in pretrained large language models (LLMs) applied robotics have demonstrated their capacity for sequencing a set of discrete skills to achieve open-ended goals in simple robotic tasks. In this paper, we examine the topic of LLM planning for a set of continuously parameterized skills whose execution must avoid violations of a set of kinematic, geometric, and physical constraints. Experiments across three different simulated 3D domains demonstrate that our proposed strategy, PRoC3S, is capable of solving a wide range of complex manipulation tasks with realistic constraints on continuous parameters much more efficiently and effectively than existing baselines.
arXiv Detail & Related papers (2024-06-08T20:56:14Z)
Empowering Large Language Models on Robotic Manipulation with Affordance Prompting [23.318449345424725]
Large language models fail to interact with the physical world by generating control sequences properly. Existing LLM-based approaches circumvent this problem by relying on additional pre-defined skills or pre-trained sub-policies. We propose a framework called LLM+A(ffordance) where the LLM serves as both the sub-task planner and the motion controller.
arXiv Detail & Related papers (2024-04-17T03:06:32Z)
Plan, Eliminate, and Track -- Language Models are Good Teachers for Embodied Agents [99.17668730578586]
Pre-trained large language models (LLMs) capture procedural knowledge about the world. Plan, Eliminate, and Track (PET) framework translates a task description into a list of high-level sub-tasks. PET framework leads to a significant 15% improvement over SOTA for generalization to human goal specifications.
arXiv Detail & Related papers (2023-05-03T20:11:22Z)
Efficient Skill Acquisition for Complex Manipulation Tasks in Obstructed Environments [18.348489257164356]
We propose a system for efficient skill acquisition that leverages an object-centric generative model (OCGM) for versatile goal identification. OCGM enables one-shot target object identification and re-identification in new scenes, allowing MP to guide the robot to the target object while avoiding obstacles.
arXiv Detail & Related papers (2023-03-06T18:49:59Z)
Differentiable Constrained Imitation Learning for Robot Motion Planning and Control [0.26999000177990923]
We develop a framework for constraint robotic motion planning and control, as well as traffic agent simulation. We focus on mobile robot and automated driving applications. Simulated experiments of mobile robot navigation and automated driving provide evidence for the performance of the proposed method.
arXiv Detail & Related papers (2022-10-21T08:19:45Z)

This list is automatically generated from the titles and abstracts of the papers in this site.