Unpacking Failure Modes of Generative Policies: Runtime Monitoring of Consistency and Progress
- URL: http://arxiv.org/abs/2410.04640v2
- Date: Thu, 10 Oct 2024 17:09:24 GMT
- Title: Unpacking Failure Modes of Generative Policies: Runtime Monitoring of Consistency and Progress
- Authors: Christopher Agia, Rohan Sinha, Jingyun Yang, Zi-ang Cao, Rika Antonova, Marco Pavone, Jeannette Bohg
- Abstract summary: We propose Sentinel, a runtime monitoring framework that splits the detection of failures into two complementary categories.
Statistical measures of temporal action consistency flag erratic failures, while Vision Language Models (VLMs) detect when the policy confidently and consistently takes actions that do not solve the task.
By unifying temporal consistency detection and VLM runtime monitoring, Sentinel detects 18% more failures than using either of the two detectors alone.
- Score: 31.952925824381325
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Robot behavior policies trained via imitation learning are prone to failure under conditions that deviate from their training data. Thus, algorithms that monitor learned policies at test time and provide early warnings of failure are necessary to facilitate scalable deployment. We propose Sentinel, a runtime monitoring framework that splits the detection of failures into two complementary categories: 1) erratic failures, which we detect using statistical measures of temporal action consistency, and 2) task progression failures, where we use Vision Language Models (VLMs) to detect when the policy confidently and consistently takes actions that do not solve the task. Our approach has two key strengths. First, because learned policies exhibit diverse failure modes, combining complementary detectors leads to significantly higher accuracy at failure detection. Second, using a statistical temporal action consistency measure ensures that we quickly detect when multimodal, generative policies exhibit erratic behavior at negligible computational cost. In contrast, we only use VLMs to detect failure modes that are less time-sensitive. We demonstrate our approach in the context of diffusion policies trained on robotic mobile manipulation domains in both simulation and the real world. By unifying temporal consistency detection and VLM runtime monitoring, Sentinel detects 18% more failures than using either of the two detectors alone and significantly outperforms baselines, thus highlighting the importance of assigning specialized detectors to complementary categories of failure. Qualitative results are made available at https://sites.google.com/stanford.edu/sentinel.
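As a rough, self-contained sketch of how the two complementary detectors might be wired together, the Python below scores temporal action consistency with a maximum mean discrepancy (MMD) over the overlapping horizon of consecutive action-chunk batches, and stubs the VLM progress check behind a generic `query_fn` callable. The choice of MMD with a median-heuristic bandwidth, the stride and threshold values, and the `query_fn` interface are illustrative assumptions, not the paper's exact design.

```python
import numpy as np

def mmd2_rbf(X, Y, sigma=None):
    """Squared maximum mean discrepancy between sample sets X, Y (RBF kernel).

    A small value means the two empirical distributions are similar. With
    sigma=None the bandwidth is set by the median heuristic (an assumption;
    the paper evaluates its own statistical consistency measures).
    """
    Z = np.vstack([X, Y])
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    if sigma is None:
        sigma = np.sqrt(np.median(d2[d2 > 0]) / 2.0)
    K = np.exp(-d2 / (2.0 * sigma ** 2))
    n = len(X)
    return K[:n, :n].mean() + K[n:, n:].mean() - 2.0 * K[:n, n:].mean()

def temporal_consistency_score(prev_chunks, curr_chunks, stride):
    """Compare what two consecutive predictions say about the SAME timesteps.

    prev_chunks, curr_chunks: (B, H, A) batches of action chunks sampled from
    a generative policy at times t - stride and t. The last H - stride steps
    of the old prediction overlap the first H - stride steps of the new one;
    a large distributional distance there signals erratic behavior.
    """
    B, H, A = prev_chunks.shape
    overlap = H - stride
    X = prev_chunks[:, stride:, :].reshape(B, overlap * A)
    Y = curr_chunks[:, :overlap, :].reshape(B, overlap * A)
    return mmd2_rbf(X, Y)

def vlm_no_progress(frames, task, query_fn):
    """Task-progression check. `query_fn` is a hypothetical stand-in for any
    VLM client (prompt + frames -> text answer)."""
    prompt = (f"The robot's task is: {task}. Based on these recent frames, "
              "is the robot making progress toward completing it? Answer yes or no.")
    return query_fn(prompt, frames).strip().lower().startswith("no")

def monitor_step(prev_chunks, curr_chunks, stride, threshold, no_progress):
    """Flag failure if EITHER complementary detector fires."""
    erratic = temporal_consistency_score(prev_chunks, curr_chunks, stride) > threshold
    return erratic or no_progress

# Toy check: chunks drawn from the same distribution vs. a shifted one.
rng = np.random.default_rng(0)
prev = rng.normal(size=(64, 8, 2))            # B=64 samples, H=8 steps, A=2 dims
curr_ok = rng.normal(size=(64, 8, 2))         # consistent follow-up prediction
curr_bad = rng.normal(size=(64, 8, 2)) + 2.0  # erratic jump in action space
print(monitor_step(prev, curr_ok, stride=2, threshold=0.2, no_progress=False))   # expected: False
print(monitor_step(prev, curr_bad, stride=2, threshold=0.2, no_progress=False))  # expected: True
```

The split mirrors the abstract's design point: the statistical score is cheap enough to run at every prediction step, while the VLM is queried sparingly for the less time-sensitive task progression failures.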
Related papers
- Bidirectional Decoding: Improving Action Chunking via Closed-Loop Resampling [51.38330727868982]
Bidirectional Decoding (BID) is a test-time inference algorithm that bridges action chunking with closed-loop operations.
We show that BID boosts the performance of two state-of-the-art generative policies across seven simulation benchmarks and two real-world tasks.
arXiv Detail & Related papers (2024-08-30T15:39:34Z)
- Real-Time Anomaly Detection and Reactive Planning with Large Language Models [18.57162998677491]
Foundation models, e.g., large language models (LLMs) trained on internet-scale data, possess zero-shot capabilities.
We present a two-stage reasoning framework that incorporates the judgement regarding potential anomalies into a safe control framework.
This enables our monitor to improve the trustworthiness of dynamic robotic systems, such as quadrotors or autonomous vehicles.
arXiv Detail & Related papers (2024-07-11T17:59:22Z)
- Learning Recovery Strategies for Dynamic Self-healing in Reactive Systems [1.7218973692320518]
Self-healing systems depend on following a set of predefined instructions to recover from a known failure state.
Our proposal targets complex reactive systems, defining monitors as predicates specifying satisfiability conditions of system properties.
We use a Reinforcement Learning-based technique to learn a recovery strategy based on users' corrective sequences.
arXiv Detail & Related papers (2024-01-22T23:34:21Z)
- Model-Based Runtime Monitoring with Interactive Imitation Learning [30.70994322652745]
This work aims to endow a robot with the ability to monitor and detect errors during task execution.
We introduce a model-based runtime monitoring algorithm that learns from deployment data to detect system anomalies and anticipate failures.
Our method outperforms the baselines across system-level and unit-test metrics, with 23% and 40% higher success rates in simulation and on physical hardware.
arXiv Detail & Related papers (2023-10-26T16:45:44Z)
- PULL: Reactive Log Anomaly Detection Based On Iterative PU Learning [58.85063149619348]
We propose PULL, an iterative log analysis method for reactive anomaly detection based on estimated failure time windows.
Our evaluation shows that PULL consistently outperforms ten benchmark baselines across three different datasets.
arXiv Detail & Related papers (2023-01-25T16:34:43Z)
- Imitating, Fast and Slow: Robust learning from demonstrations via decision-time planning [96.72185761508668]
IMitation with PLANning at Test-time (IMPLANT) is a new meta-algorithm for imitation learning.
We demonstrate that IMPLANT significantly outperforms benchmark imitation learning approaches on standard control environments.
arXiv Detail & Related papers (2022-04-07T17:16:52Z)
- Prepare for Trouble and Make it Double. Supervised and Unsupervised Stacking for Anomaly-Based Intrusion Detection [4.56877715768796]
We propose the adoption of meta-learning, in the form of a two-layer Stacker, to create a mixed approach that detects both known and unknown threats.
The stacker proves more effective at detecting zero-day attacks than supervised algorithms alone, mitigating their main weakness while maintaining adequate detection of known attacks.
arXiv Detail & Related papers (2022-02-28T08:41:32Z)
- Tracking the risk of a deployed model and detecting harmful distribution shifts [105.27463615756733]
In practice, it may make sense to ignore benign shifts, under which the performance of a deployed model does not degrade substantially.
We argue that a sensible method for firing off a warning has to both (a) detect harmful shifts while ignoring benign ones, and (b) allow continuous monitoring of model performance without increasing the false alarm rate; a toy sequential-testing sketch of these two requirements appears after this list.
arXiv Detail & Related papers (2021-10-12T17:21:41Z)
- Anomaly Detection in Cybersecurity: Unsupervised, Graph-Based and Supervised Learning Methods in Adversarial Environments [63.942632088208505]
Adversarial machine learning is inherent to today's operating environment.
In this work, we examine the feasibility of unsupervised learning and graph-based methods for anomaly detection.
We incorporate a realistic adversarial training mechanism when training our supervised models to enable strong classification performance in adversarial environments.
arXiv Detail & Related papers (2021-05-14T10:05:10Z)
- Anomaly Detection in Video via Self-Supervised and Multi-Task Learning [113.81927544121625]
Anomaly detection in video is a challenging computer vision problem.
In this paper, we approach anomalous event detection in video through self-supervised and multi-task learning at the object level.
arXiv Detail & Related papers (2020-11-15T10:21:28Z)
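For the "Tracking the risk of a deployed model" entry above, the requirement of continuous monitoring without inflating false alarms is exactly what sequential testing provides. Below is a minimal stand-in, assuming bounded per-sample losses in [0, 1] and a plain Hoeffding bound with a union bound over time; this is not that paper's method, which develops tighter time-uniform confidence sequences.

```python
import numpy as np

def harmful_shift_alarm(losses, tolerance, alpha=0.05):
    """Fire when a time-uniform lower confidence bound on the deployed
    model's risk exceeds `tolerance`.

    losses: iterable of per-sample losses in [0, 1]. Allocating
    alpha_t = alpha / (t * (t + 1)) per step (the alpha_t sum to alpha)
    keeps the overall false alarm probability below alpha no matter how
    long monitoring runs; benign shifts that leave the risk under
    `tolerance` never trigger the alarm.
    """
    total = 0.0
    for t, loss in enumerate(losses, start=1):
        total += loss
        mean = total / t
        # One-sided Hoeffding radius at level alpha_t.
        radius = np.sqrt(np.log(t * (t + 1) / alpha) / (2.0 * t))
        if mean - radius > tolerance:
            return t  # first time a harmful shift is declared
    return None  # no alarm on the observed stream

# Toy usage: the error rate drifts from 0.1 to 0.6 halfway through the stream.
rng = np.random.default_rng(0)
stream = np.concatenate([rng.random(500) < 0.1, rng.random(500) < 0.6]).astype(float)
print(harmful_shift_alarm(stream, tolerance=0.2))  # alarms some time after step 500
```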