Sample-Efficient Safety Assurances using Conformal Prediction
- URL: http://arxiv.org/abs/2109.14082v5
- Date: Tue, 2 Jan 2024 18:23:59 GMT
- Title: Sample-Efficient Safety Assurances using Conformal Prediction
- Authors: Rachel Luo, Shengjia Zhao, Jonathan Kuck, Boris Ivanovic, Silvio
Savarese, Edward Schmerling, Marco Pavone
- Abstract summary: Early warning systems can provide alerts when an unsafe situation is imminent.
To reliably improve safety, these warning systems should have a provable false negative rate.
We present a framework that combines a statistical inference technique known as conformal prediction with a simulator of robot/environment dynamics.
- Score: 57.92013073974406
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When deploying machine learning models in high-stakes robotics applications,
the ability to detect unsafe situations is crucial. Early warning systems can
provide alerts when an unsafe situation is imminent (in the absence of
corrective action). To reliably improve safety, these warning systems should
have a provable false negative rate; i.e. of the situations that are unsafe,
fewer than $\epsilon$ will occur without an alert. In this work, we present a
framework that combines a statistical inference technique known as conformal
prediction with a simulator of robot/environment dynamics, in order to tune
warning systems to provably achieve an $\epsilon$ false negative rate using as
few as $1/\epsilon$ data points. We apply our framework to a driver warning
system and a robotic grasping application, and empirically demonstrate
guaranteed false negative rate while also observing low false detection
(positive) rate.
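The calibration step the abstract describes can be sketched with split conformal prediction. The sketch below is illustrative, not the paper's actual implementation: the score function, the simulator that produces unsafe calibration episodes, and all names are assumptions. The idea is to pick the alert threshold as a lower empirical quantile of the alert scores of episodes known to be unsafe, so that (by exchangeability) a fresh unsafe episode falls below the threshold, and thus raises no alert, with probability at most $\epsilon$.

```python
import math
import random

def calibrate_alert_threshold(scores, epsilon):
    """Split-conformal calibration of a warning threshold.

    `scores` are alert scores computed (e.g. in simulation) for n episodes
    that were actually unsafe; a higher score means stronger evidence of
    danger.  Returns a threshold tau such that a fresh unsafe episode
    scores below tau (i.e. triggers no alert) with probability at most
    epsilon -- the provable false negative rate.
    """
    ordered = sorted(scores)
    n = len(ordered)
    # Rank of the epsilon-lower quantile among the n+1 exchangeable scores.
    k = math.floor((n + 1) * epsilon)
    if k < 1:
        # Matches the abstract's sample requirement of roughly 1/epsilon points.
        raise ValueError("need at least 1/epsilon - 1 calibration points")
    return ordered[k - 1]  # alert whenever a new episode's score >= tau

# Illustrative usage with synthetic calibration scores.
rng = random.Random(0)
cal_scores = [rng.random() for _ in range(99)]   # 99 simulated unsafe episodes
tau = calibrate_alert_threshold(cal_scores, epsilon=0.1)
```

Note the sample-efficiency claim in the abstract: the guarantee needs only enough calibration points for the rank `k` to be at least 1, i.e. on the order of $1/\epsilon$ unsafe episodes, which the simulator can supply cheaply.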
Related papers
- Learning to Stay Safe: Adaptive Regularization Against Safety Degradation during Fine-Tuning [2.9184958249079975]
Existing defenses offer limited protection or force a trade-off between safety and utility. We introduce a training framework that adapts regularization in response to safety risk. We empirically verify that harmful intent signals are predictable from pre-generation activations.
arXiv Detail & Related papers (2026-02-19T16:59:54Z)
- Detecting Object Tracking Failure via Sequential Hypothesis Testing [80.7891291021747]
Real-time online object tracking in videos constitutes a core task in computer vision. We propose interpreting object tracking as a sequential hypothesis test, wherein evidence for or against tracking failures is gradually accumulated over time. We propose both supervised and unsupervised variants by leveraging either ground-truth or solely internal tracking information.
arXiv Detail & Related papers (2026-02-13T14:57:15Z)
- Steering Externalities: Benign Activation Steering Unintentionally Increases Jailbreak Risk for Large Language Models [62.16655896700062]
Activation steering is a technique to enhance the utility of Large Language Models (LLMs). We show that it unintentionally introduces critical and under-explored safety risks. Experiments reveal that these interventions act as a force multiplier, creating new vulnerabilities to jailbreaks and increasing attack success rates to over 80% on standard benchmarks.
arXiv Detail & Related papers (2026-02-03T12:32:35Z)
- Accident Anticipation via Temporal Occurrence Prediction [15.813749445439292]
Accident anticipation aims to predict potential collisions in an online manner, enabling timely alerts to enhance road safety. Existing methods typically predict frame-level risk scores as indicators of hazard. We propose a novel paradigm that shifts the prediction target from current-frame risk scoring to directly estimating accident scores at multiple future time steps.
arXiv Detail & Related papers (2025-10-25T11:57:22Z)
- Safety Monitoring for Learning-Enabled Cyber-Physical Systems in Out-of-Distribution Scenarios [17.629563106665557]
We propose to directly monitor safety in a manner that is itself robust to OOD data.
Our safety monitor additionally uses a novel combination of adaptive conformal prediction and incremental learning.
arXiv Detail & Related papers (2025-04-18T05:42:37Z)
- Coverage-Guaranteed Speech Emotion Recognition via Calibrated Uncertainty-Adaptive Prediction Sets [0.0]
Road rage, often triggered by emotional suppression and sudden outbursts, significantly threatens road safety by causing collisions and aggressive behavior. Speech emotion recognition technologies can mitigate this risk by identifying negative emotions early and issuing timely alerts. We propose a novel risk-controlled prediction framework providing statistically rigorous guarantees on prediction accuracy.
arXiv Detail & Related papers (2025-03-24T12:26:28Z)
- Safe Vision-Language Models via Unsafe Weights Manipulation [75.04426753720551]
We revise safety evaluation by introducing Safe-Ground, a new set of metrics that evaluate safety at different levels of granularity.
We take a different direction and explore whether it is possible to make a model safer without training, introducing Unsafe Weights Manipulation (UWM).
UWM uses a calibration set of safe and unsafe instances to compare activations between safe and unsafe content, identifying the most important parameters for processing the latter.
arXiv Detail & Related papers (2025-03-14T17:00:22Z)
- Anomalous State Sequence Modeling to Enhance Safety in Reinforcement Learning [0.0]
We propose a safe reinforcement learning (RL) approach that utilizes an anomalous state sequence to enhance RL safety.
In experiments on multiple safety-critical environments including self-driving cars, our solution approach successfully learns safer policies.
arXiv Detail & Related papers (2024-07-29T10:30:07Z)
- What Makes and Breaks Safety Fine-tuning? A Mechanistic Study [64.9691741899956]
Safety fine-tuning helps align Large Language Models (LLMs) with human preferences for their safe deployment.
We design a synthetic data generation framework that captures salient aspects of an unsafe input.
Using this, we investigate three well-known safety fine-tuning methods.
arXiv Detail & Related papers (2024-07-14T16:12:57Z)
- Safety Margins for Reinforcement Learning [53.10194953873209]
We show how to leverage proxy criticality metrics to generate safety margins.
We evaluate our approach on learned policies from APE-X and A3C within an Atari environment.
arXiv Detail & Related papers (2023-07-25T16:49:54Z)
- Safe Deep Reinforcement Learning by Verifying Task-Level Properties [84.64203221849648]
Cost functions are commonly employed in Safe Deep Reinforcement Learning (DRL)
The cost is typically encoded as an indicator function due to the difficulty of quantifying the risk of policy decisions in the state space.
In this paper, we investigate an alternative approach that uses domain knowledge to quantify the risk in the proximity of such states by defining a violation metric.
arXiv Detail & Related papers (2023-02-20T15:24:06Z)
- Online Distribution Shift Detection via Recency Prediction [43.84609690251748]
We present an online method for detecting distribution shift with guarantees on the false positive rate.
Our system is very unlikely (with probability $\le \epsilon$) to falsely issue an alert when there is no distribution shift.
It empirically achieves up to 11x faster detection on realistic robotics settings compared to prior work.
arXiv Detail & Related papers (2022-11-17T22:29:58Z)
- Recursively Feasible Probabilistic Safe Online Learning with Control Barrier Functions [60.26921219698514]
We introduce a model-uncertainty-aware reformulation of CBF-based safety-critical controllers.
We then present the pointwise feasibility conditions of the resulting safety controller.
We use these conditions to devise an event-triggered online data collection strategy.
arXiv Detail & Related papers (2022-08-23T05:02:09Z)
- Safe Reinforcement Learning by Imagining the Near Future [37.0376099401243]
In this work, we focus on the setting where unsafe states can be avoided by planning ahead a short time into the future.
We devise a model-based algorithm that heavily penalizes unsafe trajectories, and derive guarantees that our algorithm can avoid unsafe states under certain assumptions.
Experiments demonstrate that our algorithm can achieve competitive rewards with fewer safety violations in several continuous control tasks.
arXiv Detail & Related papers (2022-02-15T23:28:24Z)
- ProBF: Learning Probabilistic Safety Certificates with Barrier Functions [31.203344483485843]
The control barrier function is a useful tool to guarantee safety if we have access to the ground-truth system dynamics.
In practice, we have inaccurate knowledge of the system dynamics, which can lead to unsafe behaviors.
We show the efficacy of this method through experiments on Segway and Quadrotor simulations.
arXiv Detail & Related papers (2021-12-22T20:18:18Z)
- Tracking the risk of a deployed model and detecting harmful distribution shifts [105.27463615756733]
In practice, it may make sense to ignore benign shifts, under which the performance of a deployed model does not degrade substantially.
We argue that a sensible method for firing off a warning has to both (a) detect harmful shifts while ignoring benign ones, and (b) allow continuous monitoring of model performance without increasing the false alarm rate.
arXiv Detail & Related papers (2021-10-12T17:21:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.