Unpacking Failure Modes of Generative Policies: Runtime Monitoring of Consistency and Progress
- URL: http://arxiv.org/abs/2410.04640v2
- Date: Thu, 10 Oct 2024 17:09:24 GMT
- Title: Unpacking Failure Modes of Generative Policies: Runtime Monitoring of Consistency and Progress
- Authors: Christopher Agia, Rohan Sinha, Jingyun Yang, Zi-ang Cao, Rika Antonova, Marco Pavone, Jeannette Bohg
- Abstract summary: We propose Sentinel, a runtime monitoring framework that splits the detection of failures into two complementary categories.
Statistical measures of temporal action consistency flag erratic failures, while Vision Language Models (VLMs) detect when the policy confidently and consistently takes actions that do not solve the task.
By unifying temporal consistency detection and VLM runtime monitoring, Sentinel detects 18% more failures than using either of the two detectors alone.
- Score: 31.952925824381325
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Robot behavior policies trained via imitation learning are prone to failure under conditions that deviate from their training data. Thus, algorithms that monitor learned policies at test time and provide early warnings of failure are necessary to facilitate scalable deployment. We propose Sentinel, a runtime monitoring framework that splits the detection of failures into two complementary categories: 1) erratic failures, which we detect using statistical measures of temporal action consistency, and 2) task progression failures, where we use Vision Language Models (VLMs) to detect when the policy confidently and consistently takes actions that do not solve the task. Our approach has two key strengths. First, because learned policies exhibit diverse failure modes, combining complementary detectors leads to significantly higher accuracy at failure detection. Second, using a statistical temporal action consistency measure ensures that we quickly detect when multimodal, generative policies exhibit erratic behavior at negligible computational cost. In contrast, we only use VLMs to detect failure modes that are less time-sensitive. We demonstrate our approach in the context of diffusion policies trained on robotic mobile manipulation domains in both simulation and the real world. By unifying temporal consistency detection and VLM runtime monitoring, Sentinel detects 18% more failures than using either of the two detectors alone and significantly outperforms baselines, thus highlighting the importance of assigning specialized detectors to complementary categories of failure. Qualitative results are made available at https://sites.google.com/stanford.edu/sentinel.
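As a rough, self-contained sketch of how the two complementary detectors might be wired together, the Python below scores temporal action consistency with a maximum mean discrepancy (MMD) over the overlapping horizon of consecutive action-chunk batches, and stubs the VLM progress check behind a generic `query_fn` callable. The choice of MMD with a median-heuristic bandwidth, the stride and threshold values, and the `query_fn` interface are illustrative assumptions, not the paper's exact design.

```python
import numpy as np

def mmd2_rbf(X, Y, sigma=None):
    """Squared maximum mean discrepancy between sample sets X, Y (RBF kernel).

    A small value means the two empirical distributions are similar. With
    sigma=None the bandwidth is set by the median heuristic (an assumption;
    the paper evaluates its own statistical consistency measures).
    """
    Z = np.vstack([X, Y])
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    if sigma is None:
        sigma = np.sqrt(np.median(d2[d2 > 0]) / 2.0)
    K = np.exp(-d2 / (2.0 * sigma ** 2))
    n = len(X)
    return K[:n, :n].mean() + K[n:, n:].mean() - 2.0 * K[:n, n:].mean()

def temporal_consistency_score(prev_chunks, curr_chunks, stride):
    """Compare what two consecutive predictions say about the SAME timesteps.

    prev_chunks, curr_chunks: (B, H, A) batches of action chunks sampled from
    a generative policy at times t - stride and t. The last H - stride steps
    of the old prediction overlap the first H - stride steps of the new one;
    a large distributional distance there signals erratic behavior.
    """
    B, H, A = prev_chunks.shape
    overlap = H - stride
    X = prev_chunks[:, stride:, :].reshape(B, overlap * A)
    Y = curr_chunks[:, :overlap, :].reshape(B, overlap * A)
    return mmd2_rbf(X, Y)

def vlm_no_progress(frames, task, query_fn):
    """Task-progression check. `query_fn` is a hypothetical stand-in for any
    VLM client (prompt + frames -> text answer)."""
    prompt = (f"The robot's task is: {task}. Based on these recent frames, "
              "is the robot making progress toward completing it? Answer yes or no.")
    return query_fn(prompt, frames).strip().lower().startswith("no")

def monitor_step(prev_chunks, curr_chunks, stride, threshold, no_progress):
    """Flag failure if EITHER complementary detector fires."""
    erratic = temporal_consistency_score(prev_chunks, curr_chunks, stride) > threshold
    return erratic or no_progress

# Toy check: chunks drawn from the same distribution vs. a shifted one.
rng = np.random.default_rng(0)
prev = rng.normal(size=(64, 8, 2))            # B=64 samples, H=8 steps, A=2 dims
curr_ok = rng.normal(size=(64, 8, 2))         # consistent follow-up prediction
curr_bad = rng.normal(size=(64, 8, 2)) + 2.0  # erratic jump in action space
print(monitor_step(prev, curr_ok, stride=2, threshold=0.2, no_progress=False))   # expected: False
print(monitor_step(prev, curr_bad, stride=2, threshold=0.2, no_progress=False))  # expected: True
```

The split mirrors the abstract's design point: the statistical score is cheap enough to run at every prediction step, while the VLM is queried sparingly for the less time-sensitive task progression failures.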
Related papers
- Bidirectional Decoding: Improving Action Chunking via Closed-Loop Resampling [51.38330727868982]
Bidirectional Decoding (BID) is a test-time inference algorithm that bridges action chunking with closed-loop operations.
We show that BID boosts the performance of two state-of-the-art generative policies across seven simulation benchmarks and two real-world tasks.
arXiv Detail & Related papers (2024-08-30T15:39:34Z)
- Real-Time Anomaly Detection and Reactive Planning with Large Language Models [18.57162998677491]
Foundation models, e.g., large language models (LLMs) trained on internet-scale data, possess zero-shot capabilities.
We present a two-stage reasoning framework that incorporates the judgement regarding potential anomalies into a safe control framework.
This enables our monitor to improve the trustworthiness of dynamic robotic systems, such as quadrotors or autonomous vehicles.
arXiv Detail & Related papers (2024-07-11T17:59:22Z)
- Learning Recovery Strategies for Dynamic Self-healing in Reactive Systems [1.7218973692320518]
Self-healing systems depend on following a set of predefined instructions to recover from a known failure state.
Our proposal targets complex reactive systems, defining monitors as predicates specifying satisfiability conditions of system properties.
We use a Reinforcement Learning-based technique to learn a recovery strategy based on users' corrective sequences.
arXiv Detail & Related papers (2024-01-22T23:34:21Z)
- Model-Based Runtime Monitoring with Interactive Imitation Learning [30.70994322652745]
This work aims to endow a robot with the ability to monitor and detect errors during task execution.
We introduce a model-based runtime monitoring algorithm that learns from deployment data to detect system anomalies and anticipate failures.
Our method outperforms the baselines across system-level and unit-test metrics, with 23% and 40% higher success rates in simulation and on physical hardware.
arXiv Detail & Related papers (2023-10-26T16:45:44Z)
- PULL: Reactive Log Anomaly Detection Based On Iterative PU Learning [58.85063149619348]
We propose PULL, an iterative log analysis method for reactive anomaly detection based on estimated failure time windows.
Our evaluation shows that PULL consistently outperforms ten benchmark baselines across three different datasets.
arXiv Detail & Related papers (2023-01-25T16:34:43Z)
- Imitating, Fast and Slow: Robust learning from demonstrations via decision-time planning [96.72185761508668]
IMitation with PLANning at Test-time (IMPLANT) is a new meta-algorithm for imitation learning.
We demonstrate that IMPLANT significantly outperforms benchmark imitation learning approaches on standard control environments.
arXiv Detail & Related papers (2022-04-07T17:16:52Z)
- Prepare for Trouble and Make it Double. Supervised and Unsupervised Stacking for Anomaly-Based Intrusion Detection [4.56877715768796]
We propose the adoption of meta-learning, in the form of a two-layer Stacker, to create a mixed approach that detects both known and unknown threats.
The stacker proves more effective at detecting zero-day attacks than supervised algorithms alone, mitigating their main weakness while maintaining adequate detection of known attacks.
arXiv Detail & Related papers (2022-02-28T08:41:32Z)
- Tracking the risk of a deployed model and detecting harmful distribution shifts [105.27463615756733]
In practice, it may make sense to ignore benign shifts, under which the performance of a deployed model does not degrade substantially.
We argue that a sensible method for firing off a warning has to both (a) detect harmful shifts while ignoring benign ones, and (b) allow continuous monitoring of model performance without increasing the false alarm rate; a toy sequential-testing sketch of these two requirements appears after this list.
arXiv Detail & Related papers (2021-10-12T17:21:41Z)
- Anomaly Detection in Cybersecurity: Unsupervised, Graph-Based and Supervised Learning Methods in Adversarial Environments [63.942632088208505]
Adversarial machine learning is inherent to today's operating environment.
In this work, we examine the feasibility of unsupervised learning and graph-based methods for anomaly detection.
We incorporate a realistic adversarial training mechanism when training our supervised models to enable strong classification performance in adversarial environments.
arXiv Detail & Related papers (2021-05-14T10:05:10Z)
- Anomaly Detection in Video via Self-Supervised and Multi-Task Learning [113.81927544121625]
Anomaly detection in video is a challenging computer vision problem.
In this paper, we approach anomalous event detection in video through self-supervised and multi-task learning at the object level.
arXiv Detail & Related papers (2020-11-15T10:21:28Z)
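For the "Tracking the risk of a deployed model" entry above, the requirement of continuous monitoring without inflating false alarms is exactly what sequential testing provides. Below is a minimal stand-in, assuming bounded per-sample losses in [0, 1] and a plain Hoeffding bound with a union bound over time; this is not that paper's method, which develops tighter time-uniform confidence sequences.

```python
import numpy as np

def harmful_shift_alarm(losses, tolerance, alpha=0.05):
    """Fire when a time-uniform lower confidence bound on the deployed
    model's risk exceeds `tolerance`.

    losses: iterable of per-sample losses in [0, 1]. Allocating
    alpha_t = alpha / (t * (t + 1)) per step (the alpha_t sum to alpha)
    keeps the overall false alarm probability below alpha no matter how
    long monitoring runs; benign shifts that leave the risk under
    `tolerance` never trigger the alarm.
    """
    total = 0.0
    for t, loss in enumerate(losses, start=1):
        total += loss
        mean = total / t
        # One-sided Hoeffding radius at level alpha_t.
        radius = np.sqrt(np.log(t * (t + 1) / alpha) / (2.0 * t))
        if mean - radius > tolerance:
            return t  # first time a harmful shift is declared
    return None  # no alarm on the observed stream

# Toy usage: the error rate drifts from 0.1 to 0.6 halfway through the stream.
rng = np.random.default_rng(0)
stream = np.concatenate([rng.random(500) < 0.1, rng.random(500) < 0.6]).astype(float)
print(harmful_shift_alarm(stream, tolerance=0.2))  # alarms some time after step 500
```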