Related papers: RoboInspector: Unveiling the Unreliability of Policy Code for LLM-enabled Robotic Manipulation

URL: http://arxiv.org/abs/2508.21378v1
Date: Fri, 29 Aug 2025 07:47:17 GMT
Title: RoboInspector: Unveiling the Unreliability of Policy Code for LLM-enabled Robotic Manipulation
Authors: Chenduo Ying, Linkang Du, Peng Cheng, Yuanchao Shu,
Abstract summary: Large language models (LLMs) demonstrate remarkable capabilities in reasoning and code generation. Despite advances, achieving reliable policy code generation remains a significant challenge due to the diverse requirements of real-world tasks. We introduce RoboInspector, a pipeline to unveil and characterize the unreliability of the policy code for LLM-enabled robotic manipulation.
Score: 7.650053106303868
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models (LLMs) demonstrate remarkable capabilities in reasoning and code generation, enabling robotic manipulation to be initiated with just a single instruction. The LLM carries out various tasks by generating the policy code required to control the robot. Despite advances in LLMs, achieving reliable policy code generation remains a significant challenge due to the diverse requirements of real-world tasks and the inherent complexity of user instructions. In practice, different users may give distinct instructions to drive the robot through the same task, which can make the generated policy code unreliable. To bridge this gap, we design RoboInspector, a pipeline to unveil and characterize the unreliability of the policy code for LLM-enabled robotic manipulation from two perspectives: the complexity of the manipulation task and the granularity of the instruction. We perform comprehensive experiments with 168 distinct combinations of tasks, instructions, and LLMs in two prominent frameworks. RoboInspector identifies four main unreliable behaviors that lead to manipulation failure. We provide a detailed characterization of these behaviors and their underlying causes, offering insights for practical development to reduce unreliability. Furthermore, we introduce a refinement approach guided by feedback from failed policy code that improves the reliability of policy code generation by up to 35% in LLM-enabled robotic manipulation, evaluated in both simulation and real-world environments.

Related papers

Act-Observe-Rewrite: Multimodal Coding Agents as In-Context Policy Learners for Robot Manipulation [0.0]
We present Act-Observe-Rewrite (AOR), a framework in which an LLM agent improves a robot manipulation policy. AOR makes the full low-level motor control implementation the unit of LLM reasoning. We report promising results, with the agent achieving high success rates without demonstrations, reward engineering, or gradient updates.
arXiv Detail & Related papers (2026-03-03T22:15:55Z)
ALRM: Agentic LLM for Robotic Manipulation [3.7473235317736058]
Large Language Models (LLMs) have recently empowered agentic frameworks to exhibit advanced reasoning and planning capabilities.
arXiv Detail & Related papers (2026-01-27T11:54:14Z)
From Code to Action: Hierarchical Learning of Diffusion-VLM Policies [8.0703783175731]
Imitation learning for robotic manipulation often suffers from limited generalization and data scarcity. In this work, we introduce a hierarchical framework that leverages code-generating vision-language models (VLMs). We find that this design enables interpretable policy decomposition, improves generalization compared to flat policies, and enables separate evaluation of high-level planning and low-level control.
arXiv Detail & Related papers (2025-09-29T15:22:18Z)
An Effective Approach to Embedding Source Code by Combining Large Language and Sentence Embedding Models [6.976968804436321]
This paper proposes a novel approach to embedding source code by combining large language and sentence embedding models. To evaluate the performance of our proposed approach, we conducted a series of experiments on three datasets with different programming languages.
arXiv Detail & Related papers (2024-09-23T01:03:15Z)
Compromising Embodied Agents with Contextual Backdoor Attacks [69.71630408822767]
Large language models (LLMs) have transformed the development of embodied intelligence. This paper uncovers a significant backdoor security threat within this process. By poisoning just a few contextual demonstrations, attackers can covertly compromise the contextual environment of a black-box LLM.
arXiv Detail & Related papers (2024-08-06T01:20:12Z)
ROS-LLM: A ROS framework for embodied AI with task feedback and structured reasoning [74.58666091522198]
We present a framework for intuitive robot programming by non-experts. We leverage natural language prompts and contextual information from the Robot Operating System (ROS). Our system integrates large language models (LLMs), enabling non-experts to articulate task requirements to the system through a chat interface.
arXiv Detail & Related papers (2024-06-28T08:28:38Z)
Robo-Instruct: Simulator-Augmented Instruction Alignment For Finetuning Code LLMs [42.31298987176411]
We introduce ROBO-INSTRUCT, which synthesizes task-specific simulation environments on the fly during program execution. ROBO-INSTRUCT integrates an LLM-aided post-processing procedure to refine instructions for better alignment with robot programs.
arXiv Detail & Related papers (2024-05-30T15:47:54Z)
Large Language Models for Orchestrating Bimanual Robots [19.60907949776435]
We present LAnguage-model-based Bimanual ORchestration (LABOR) to analyze task configurations and devise coordination control policies. We evaluate our method through simulated experiments involving two classes of long-horizon tasks using the NICOL humanoid robot.
arXiv Detail & Related papers (2024-04-02T15:08:35Z)
RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis [102.1876259853457]
We propose a tree-structured multimodal code generation framework for generalized robotic behavior synthesis, termed RoboCodeX. RoboCodeX decomposes high-level human instructions into multiple object-centric manipulation units consisting of physical preferences such as affordance and safety constraints. To further enhance the capability to map conceptual and perceptual understanding into control commands, a specialized multimodal reasoning dataset is collected for pre-training and an iterative self-updating methodology is introduced for supervised fine-tuning.
arXiv Detail & Related papers (2024-02-25T15:31:43Z)
RoboScript: Code Generation for Free-Form Manipulation Tasks across Real and Simulation [77.41969287400977]
This paper presents RoboScript, a platform for a deployable robot manipulation pipeline powered by code generation. We also present a code generation benchmark for robot manipulation tasks in free-form natural language. We demonstrate the adaptability of our code generation framework across multiple robot embodiments, including the Franka and UR5 robot arms.
arXiv Detail & Related papers (2024-02-22T15:12:00Z)
On the Vulnerability of LLM/VLM-Controlled Robotics [54.57914943017522]
We highlight vulnerabilities in robotic systems integrating large language models (LLMs) and vision-language models (VLMs) due to input modality sensitivities. Our results show that simple input perturbations reduce task execution success rates by 22.2% and 14.6% in two representative LLM/VLM-controlled robotic systems.
arXiv Detail & Related papers (2024-02-15T22:01:45Z)
Language to Rewards for Robotic Skill Synthesis [37.21434094015743]
We introduce a new paradigm that harnesses large language models (LLMs) to define reward parameters that can be optimized to accomplish a variety of robotic tasks. Using reward as the intermediate interface generated by LLMs, we can effectively bridge the gap between high-level language instructions or corrections and low-level robot actions.
arXiv Detail & Related papers (2023-06-14T17:27:10Z)
Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model [63.66204449776262]
Instruct2Act is a framework that maps multi-modal instructions to sequential actions for robotic manipulation tasks. Our approach is adjustable and flexible in accommodating various instruction modalities and input types. Our zero-shot method outperformed many state-of-the-art learning-based policies in several tasks.
arXiv Detail & Related papers (2023-05-18T17:59:49Z)

This list is automatically generated from the titles and abstracts of the papers on this site.