Can We Detect Failures Without Failure Data? Uncertainty-Aware Runtime Failure Detection for Imitation Learning Policies
- URL: http://arxiv.org/abs/2503.08558v2
- Date: Fri, 25 Apr 2025 07:12:28 GMT
- Title: Can We Detect Failures Without Failure Data? Uncertainty-Aware Runtime Failure Detection for Imitation Learning Policies
- Authors: Chen Xu, Tony Khuong Nguyen, Emma Dixon, Christopher Rodriguez, Patrick Miller, Robert Lee, Paarth Shah, Rares Ambrus, Haruki Nishimura, Masha Itkina,
- Abstract summary: FAIL-Detect is a two-stage approach for failure detection in imitation learning-based robotic manipulation.<n>We first distill policy inputs and outputs into scalar signals that correlate with policy failures and capture uncertainty.<n>Our experiments show learned signals to be mostly consistently effective, particularly when using our novel flow-based density estimator.
- Score: 19.27526590452503
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent years have witnessed impressive robotic manipulation systems driven by advances in imitation learning and generative modeling, such as diffusion- and flow-based approaches. As robot policy performance increases, so does the complexity and time horizon of achievable tasks, inducing unexpected and diverse failure modes that are difficult to predict a priori. To enable trustworthy policy deployment in safety-critical human environments, reliable runtime failure detection becomes important during policy inference. However, most existing failure detection approaches rely on prior knowledge of failure modes and require failure data during training, which imposes a significant challenge in practicality and scalability. In response to these limitations, we present FAIL-Detect, a modular two-stage approach for failure detection in imitation learning-based robotic manipulation. To accurately identify failures from successful training data alone, we frame the problem as sequential out-of-distribution (OOD) detection. We first distill policy inputs and outputs into scalar signals that correlate with policy failures and capture epistemic uncertainty. FAIL-Detect then employs conformal prediction (CP) as a versatile framework for uncertainty quantification with statistical guarantees. Empirically, we thoroughly investigate both learned and post-hoc scalar signal candidates on diverse robotic manipulation tasks. Our experiments show learned signals to be mostly consistently effective, particularly when using our novel flow-based density estimator. Furthermore, our method detects failures more accurately and faster than state-of-the-art (SOTA) failure detection baselines. These results highlight the potential of FAIL-Detect to enhance the safety and reliability of imitation learning-based robotic systems as they progress toward real-world deployment.
Related papers
- Lie Detector: Unified Backdoor Detection via Cross-Examination Framework [68.45399098884364]
We propose a unified backdoor detection framework in the semi-honest setting.
Our method achieves superior detection performance, improving accuracy by 5.4%, 1.6%, and 11.9% over SoTA baselines.
Notably, it is the first to effectively detect backdoors in multimodal large language models.
arXiv Detail & Related papers (2025-03-21T06:12:06Z) - Unpacking Failure Modes of Generative Policies: Runtime Monitoring of Consistency and Progress [31.952925824381325]
We propose a runtime monitoring framework that splits the detection of failures into two complementary categories.
We use Vision Language Models (VLMs) to detect when the policy confidently and consistently takes actions that do not solve the task.
By unifying temporal consistency detection and VLM runtime monitoring, Sentinel detects 18% more failures than using either of the two detectors alone.
arXiv Detail & Related papers (2024-10-06T22:13:30Z) - Fault Detection and Monitoring using a Data-Driven Information-Based Strategy: Method, Theory, and Application [5.056456697289351]
We propose an information-driven fault detection method based on a novel concept drift detector.<n>The method is tailored to identifying drifts in input-output relationships of additive noise models.<n>We prove several theoretical properties of the proposed MI-based fault detection scheme.
arXiv Detail & Related papers (2024-05-06T17:43:39Z) - Predicting Safety Misbehaviours in Autonomous Driving Systems using Uncertainty Quantification [8.213390074932132]
This paper evaluates different uncertainty quantification methods from the deep learning domain for the anticipatory testing of safety-critical misbehaviours.<n>We compute uncertainty scores as the vehicle executes, following the intuition that high uncertainty scores are indicative of unsupported runtime conditions.<n>In our study, we conducted an evaluation of the effectiveness and computational overhead associated with two uncertainty quantification methods, namely MC- Dropout and Deep Ensembles, for misbehaviour avoidance.
arXiv Detail & Related papers (2024-04-29T10:28:28Z) - Model-Based Runtime Monitoring with Interactive Imitation Learning [30.70994322652745]
This work aims to endow a robot with the ability to monitor and detect errors during task execution.
We introduce a model-based runtime monitoring algorithm that learns from deployment data to detect system anomalies and anticipate failures.
Our method outperforms the baselines across system-level and unit-test metrics, with 23% and 40% higher success rates in simulation and on physical hardware.
arXiv Detail & Related papers (2023-10-26T16:45:44Z) - Interactive System-wise Anomaly Detection [66.3766756452743]
Anomaly detection plays a fundamental role in various applications.
It is challenging for existing methods to handle the scenarios where the instances are systems whose characteristics are not readily observed as data.
We develop an end-to-end approach which includes an encoder-decoder module that learns system embeddings.
arXiv Detail & Related papers (2023-04-21T02:20:24Z) - Adaptive Failure Search Using Critical States from Domain Experts [9.93890332477992]
Failure search may be done through logging substantial vehicle miles in either simulation or real world testing.
AST is one such method that poses the problem of failure search as a Markov decision process.
We show that the incorporation of critical states into the AST framework generates failure scenarios with increased safety violations.
arXiv Detail & Related papers (2023-04-01T18:14:41Z) - PULL: Reactive Log Anomaly Detection Based On Iterative PU Learning [58.85063149619348]
We propose PULL, an iterative log analysis method for reactive anomaly detection based on estimated failure time windows.
Our evaluation shows that PULL consistently outperforms ten benchmark baselines across three different datasets.
arXiv Detail & Related papers (2023-01-25T16:34:43Z) - Ranking-Based Physics-Informed Line Failure Detection in Power Grids [66.0797334582536]
Real-time and accurate detecting of potential line failures is the first step to mitigating the extreme weather impact and activating emergency controls.
Power balance equations nonlinearity, increased uncertainty in generation during extreme events, and lack of grid observability compromise the efficiency of traditional data-driven failure detection methods.
This paper proposes a Physics-InformEd Line failure Detector (FIELD) that leverages grid topology information to reduce sample and time complexities and improve localization accuracy.
arXiv Detail & Related papers (2022-08-31T18:19:25Z) - Recursively Feasible Probabilistic Safe Online Learning with Control Barrier Functions [60.26921219698514]
We introduce a model-uncertainty-aware reformulation of CBF-based safety-critical controllers.
We then present the pointwise feasibility conditions of the resulting safety controller.
We use these conditions to devise an event-triggered online data collection strategy.
arXiv Detail & Related papers (2022-08-23T05:02:09Z) - Imitating, Fast and Slow: Robust learning from demonstrations via
decision-time planning [96.72185761508668]
Planning at Test-time (IMPLANT) is a new meta-algorithm for imitation learning.
We demonstrate that IMPLANT significantly outperforms benchmark imitation learning approaches on standard control environments.
arXiv Detail & Related papers (2022-04-07T17:16:52Z) - Adversarial vs behavioural-based defensive AI with joint, continual and
active learning: automated evaluation of robustness to deception, poisoning
and concept drift [62.997667081978825]
Recent advancements in Artificial Intelligence (AI) have brought new capabilities to behavioural analysis (UEBA) for cyber-security.
In this paper, we present a solution to effectively mitigate this attack by improving the detection process and efficiently leveraging human expertise.
arXiv Detail & Related papers (2020-01-13T13:54:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.