Guardian: Detecting Robotic Planning and Execution Errors with Vision-Language Models
- URL: http://arxiv.org/abs/2512.01946v2
- Date: Tue, 02 Dec 2025 17:33:19 GMT
- Title: Guardian: Detecting Robotic Planning and Execution Errors with Vision-Language Models
- Authors: Paul Pacaud, Ricardo Garcia, Shizhe Chen, Cordelia Schmid
- Abstract summary: We propose an automatic robot failure synthesis approach that procedurally perturbs successful trajectories to generate diverse planning and execution failures. We construct three new failure detection benchmarks: RLBench-Fail, BridgeDataV2-Fail, and UR5-Fail. We then train Guardian, a VLM with multi-view images for detailed failure reasoning and detection.
- Score: 53.20969621498248
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Robust robotic manipulation requires reliable failure detection and recovery. Although current Vision-Language Models (VLMs) show promise, their accuracy and generalization are limited by the scarcity of failure data. To address this data gap, we propose an automatic robot failure synthesis approach that procedurally perturbs successful trajectories to generate diverse planning and execution failures. This method produces not only binary classification labels but also fine-grained failure categories and step-by-step reasoning traces in both simulation and the real world. With it, we construct three new failure detection benchmarks: RLBench-Fail, BridgeDataV2-Fail, and UR5-Fail, substantially expanding the diversity and scale of existing failure datasets. We then train Guardian, a VLM with multi-view images for detailed failure reasoning and detection. Guardian achieves state-of-the-art performance on both existing and newly introduced benchmarks. It also effectively improves task success rates when integrated into a state-of-the-art manipulation system in simulation and real robots, demonstrating the impact of our generated failure data. Code, Data, and Models available at https://www.di.ens.fr/willow/research/guardian/.
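The abstract describes procedurally perturbing successful trajectories to synthesize labeled failures. A minimal sketch of that idea, assuming a toy waypoint representation; the function name, perturbation parameters, and failure categories below are illustrative assumptions, not the paper's actual taxonomy or code:

```python
import random

def synthesize_failure(trajectory, rng=random.Random(0)):
    """Perturb a successful trajectory of (x, y, z, gripper_open) waypoints
    into a labeled failure. Categories here are hypothetical examples."""
    failure_type = rng.choice(["offset_grasp", "premature_release", "skipped_step"])
    perturbed = [list(wp) for wp in trajectory]
    if failure_type == "offset_grasp":
        # Shift every waypoint laterally so the grasp misses the object.
        dx = rng.uniform(0.05, 0.10)
        for wp in perturbed:
            wp[0] += dx
    elif failure_type == "premature_release":
        # Open the gripper halfway through transport (execution failure).
        mid = len(perturbed) // 2
        for wp in perturbed[mid:]:
            wp[3] = 1.0
    else:  # skipped_step
        # Drop a chunk of the plan, simulating a planning error.
        cut = len(perturbed) // 3
        perturbed = perturbed[:cut] + perturbed[2 * cut:]
    return perturbed, failure_type

# A tiny "successful" pick trajectory to perturb.
success = [(0.0, 0.0, 0.1, 0.0), (0.1, 0.0, 0.1, 0.0),
           (0.2, 0.0, 0.2, 0.0), (0.3, 0.0, 0.2, 1.0)]
bad_traj, label = synthesize_failure(success)
```

Because the perturbation and its category are produced together, each synthesized trajectory comes with a fine-grained failure label for free, which is what enables training a detector without manually collecting failures.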
Related papers
- Robometer: Scaling General-Purpose Robotic Reward Models via Trajectory Comparisons [69.87766750714945]
General-purpose robot reward models are typically trained to predict absolute task progress from expert demonstrations. We introduce Robometer, a scalable reward modeling framework that combines intra-trajectory progress supervision with inter-trajectory preference supervision. Robometer is trained with a dual objective: a frame-level progress loss that anchors reward magnitude on expert data, and a trajectory-comparison preference loss that imposes global ordering constraints.
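The dual objective above can be sketched as a frame-level progress regression term plus a Bradley-Terry-style preference term. This is a generic illustration of that loss structure under assumed forms; the exact losses and weighting in Robometer may differ:

```python
import math

def progress_loss(pred_rewards, progress_targets):
    # MSE anchoring per-frame reward magnitude to task progress in [0, 1].
    return sum((p - t) ** 2
               for p, t in zip(pred_rewards, progress_targets)) / len(pred_rewards)

def preference_loss(score_preferred, score_other):
    # Bradley-Terry style: the preferred trajectory should score higher.
    return -math.log(1.0 / (1.0 + math.exp(score_other - score_preferred)))

# Per-frame predictions for an expert trajectory vs. its progress labels.
pred = [0.1, 0.45, 0.8]
target = [0.0, 0.5, 1.0]
# Trajectory-level score (summed frame rewards) compared against a worse rollout.
total = progress_loss(pred, target) + 0.5 * preference_loss(sum(pred), 0.9)
```

The progress term fixes the reward scale on expert data, while the preference term only enforces ordering between trajectories, so the two are complementary rather than redundant.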
arXiv Detail & Related papers (2026-03-02T17:38:58Z) - Hierarchical Vision Language Action Model Using Success and Failure Demonstrations [60.82332413442677]
We introduce VINE, a hierarchical vision-language-action model that separates high-level reasoning from low-level control. System 2 performs feasibility-guided tree search over a 2D scene-graph abstraction. System 1 executes low-level actions without modifying the agent's core skills.
arXiv Detail & Related papers (2025-12-03T15:58:38Z) - Metacognitive Self-Correction for Multi-Agent System via Prototype-Guided Next-Execution Reconstruction [58.51530390018909]
Large Language Model based multi-agent systems excel at collaborative problem solving but remain brittle to cascading errors. We present MASC, a metacognitive framework that endows MAS with real-time, unsupervised, step-level error detection and self-correction.
arXiv Detail & Related papers (2025-10-16T05:35:37Z) - Failure Prediction at Runtime for Generative Robot Policies [6.375597233389154]
Early failure prediction during runtime is essential for deploying robots in human-centered and safety-critical environments. We propose FIPER, a framework for failure prediction for generative robot policies that does not require failure data. Our results demonstrate that FIPER better distinguishes actual failures from benign OOD situations and predicts failures more accurately and earlier than existing methods.
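Predicting failures without any failure data typically means calibrating a threshold on scores from successful rollouts only and flagging runtime scores that exceed it. The scoring function and quantile rule below are generic assumptions for illustration, not FIPER's actual method:

```python
def calibrate_threshold(success_scores, quantile=0.95):
    # Choose a threshold such that ~95% of successful rollouts fall below it.
    ordered = sorted(success_scores)
    idx = min(int(quantile * len(ordered)), len(ordered) - 1)
    return ordered[idx]

def predict_failure(runtime_score, threshold):
    # Flag a likely failure when the anomaly score exceeds the calibration.
    return runtime_score > threshold

# Hypothetical anomaly-style scores (e.g., policy uncertainty) from successes.
calibration_scores = [0.12, 0.08, 0.15, 0.11, 0.09,
                      0.14, 0.10, 0.13, 0.07, 0.16]
threshold = calibrate_threshold(calibration_scores)
```

The appeal of this pattern is that the calibration set needs no labels beyond "this rollout succeeded," matching the abstract's claim of requiring no failure data.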
arXiv Detail & Related papers (2025-10-10T15:09:27Z) - Action Flow Matching for Continual Robot Learning [54.10050120844738]
Continual learning in robotics seeks systems that can constantly adapt to changing environments and tasks. We introduce a generative framework leveraging flow matching for online robot dynamics model alignment. We find that by transforming the actions themselves rather than exploring with a misaligned model, the robot collects informative data more efficiently.
arXiv Detail & Related papers (2025-04-25T16:26:15Z) - A Unified Framework for Real-Time Failure Handling in Robotics Using Vision-Language Models, Reactive Planner and Behavior Trees [1.3481665321936716]
This paper presents a unified failure recovery framework that combines Vision-Language Models (VLMs), a reactive planner, and Behavior Trees (BTs) to enable real-time failure handling. Our approach includes pre-execution verification, which checks for potential failures before execution, and reactive failure handling, which detects and corrects failures during execution. We evaluate our framework through real-world experiments with an ABB YuMi robot on tasks like peg insertion, object sorting, and drawer placement.
arXiv Detail & Related papers (2025-03-19T13:40:56Z) - HAMSTER: Hierarchical Action Models For Open-World Robot Manipulation [54.03004125910057]
We show that hierarchical vision-language-action models can be more effective in utilizing off-domain data than standard monolithic VLA models. We show that, with the hierarchical design, the high-level VLM can transfer across significant domain gaps between the off-domain finetuning data and real-robot testing scenarios.
arXiv Detail & Related papers (2025-02-08T07:50:22Z) - Learning to Recover from Plan Execution Errors during Robot Manipulation: A Neuro-symbolic Approach [7.768747914019512]
We propose an approach (blending learning with symbolic search) for automated error discovery and recovery.
We present an anytime version of our algorithm, where instead of recovering to the last correct state, we search for a sub-goal in the original plan.
arXiv Detail & Related papers (2024-05-29T10:03:57Z) - REFLECT: Summarizing Robot Experiences for Failure Explanation and Correction [28.015693808520496]
REFLECT is a framework which queries Large Language Models for failure reasoning based on a hierarchical summary of robot past experiences.
We show that REFLECT is able to generate informative failure explanations that assist successful correction planning.
arXiv Detail & Related papers (2023-06-27T18:03:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy of the information above and is not responsible for any consequences of its use.