A causal model of safety assurance for machine learning
- URL: http://arxiv.org/abs/2201.05451v1
- Date: Fri, 14 Jan 2022 13:54:17 GMT
- Title: A causal model of safety assurance for machine learning
- Authors: Simon Burton
- Abstract summary: This paper proposes a framework based on a causal model of safety upon which effective safety assurance cases for ML-based applications can be built.
The paper defines four categories of safety case evidence and a structured analysis approach within which these evidences can be effectively combined.
- Score: 0.45687771576879593
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper proposes a framework based on a causal model of safety upon which
effective safety assurance cases for ML-based applications can be built. In
doing so, we build upon established principles of safety engineering as well as
previous work on structuring assurance arguments for ML. The paper defines four
categories of safety case evidence and a structured analysis approach within
which these evidences can be effectively combined. Where appropriate, abstract
formalisations of these contributions are used to illustrate the causalities
they evaluate, their contributions to the safety argument and desirable
properties of the evidences. Based on the proposed framework, progress in this
area is re-evaluated and a set of future research directions is proposed to
enable tangible progress in the field.
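The abstract refers to four categories of safety case evidence and a structured analysis approach, but does not name the categories here. The following minimal Python sketch is therefore purely illustrative: the category labels, the Evidence fields, and the coverage rule are placeholder assumptions rather than the paper's formalisation; it only shows the general shape of combining categorised evidence into a claim-level verdict.

```python
# Illustrative sketch only: placeholder evidence categories and a toy
# coverage rule, not the paper's causal model or formalisation.
from dataclasses import dataclass
from collections import defaultdict

# Hypothetical category names; the paper defines its own four categories.
CATEGORIES = ("design", "verification", "field_monitoring", "process")

@dataclass
class Evidence:
    category: str       # one of CATEGORIES (placeholder taxonomy)
    claim: str          # the safety claim this evidence supports
    strength: float     # subjective weight in [0, 1]

def argue(evidences: list[Evidence], threshold: float = 0.5) -> dict[str, bool]:
    """Toy structured analysis: a claim counts as supported only if every
    evidence category contributes at least `threshold` total strength."""
    per_claim = defaultdict(lambda: defaultdict(float))
    for e in evidences:
        per_claim[e.claim][e.category] += e.strength
    return {
        claim: all(cats.get(c, 0.0) >= threshold for c in CATEGORIES)
        for claim, cats in per_claim.items()
    }

if __name__ == "__main__":
    ev = [Evidence("design", "residual risk acceptable", 0.6),
          Evidence("verification", "residual risk acceptable", 0.7),
          Evidence("field_monitoring", "residual risk acceptable", 0.4),
          Evidence("process", "residual risk acceptable", 0.8)]
    print(argue(ev))  # {'residual risk acceptable': False} -- monitoring evidence too weak
```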
Related papers
- SafeWork-R1: Coevolving Safety and Intelligence under the AI-45° Law [91.33824439029533]
We introduce SafeWork-R1, a cutting-edge multimodal reasoning model that demonstrates the coevolution of capabilities and safety. It is developed by our proposed SafeLadder framework, which incorporates large-scale, progressive, safety-oriented reinforcement learning post-training. We further develop SafeWork-R1-InternVL3-78B, SafeWork-R1-DeepSeek-70B, and SafeWork-R1-Qwen2.5VL-7B.
arXiv Detail & Related papers (2025-07-24T16:49:19Z) - Measuring What Matters: A Framework for Evaluating Safety Risks in Real-World LLM Applications [0.0]
This paper introduces a practical framework for evaluating application-level safety in large language models (LLMs). We illustrate how the proposed framework was applied in our internal pilot, providing a reference point for organizations seeking to scale their safety testing efforts.
arXiv Detail & Related papers (2025-07-13T22:34:20Z) - Advancing Neural Network Verification through Hierarchical Safety Abstract Interpretation [52.626086874715284]
We introduce a novel problem formulation called Abstract DNN-Verification, which verifies a hierarchical structure of unsafe outputs. By leveraging abstract interpretation and reasoning about output reachable sets, our approach enables assessing multiple safety levels during the formal verification process. Our contributions include a theoretical exploration of the relationship between our novel abstract safety formulation and existing approaches.
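As a rough illustration of the machinery this summary mentions (over-approximating a network's reachable outputs and checking them against unsafe sets at several severity levels), here is a minimal interval (box) abstract-interpretation sketch. The network weights, the architecture, and the safety thresholds are made up; this is not the paper's Abstract DNN-Verification procedure.

```python
# Minimal interval abstract interpretation over a toy ReLU network; all
# weights and thresholds are invented for illustration.
import numpy as np

def interval_affine(lo, hi, W, b):
    """Propagate an input box [lo, hi] through x -> W @ x + b."""
    W_pos, W_neg = np.maximum(W, 0.0), np.minimum(W, 0.0)
    return W_pos @ lo + W_neg @ hi + b, W_pos @ hi + W_neg @ lo + b

def reachable_output_box(lo, hi, layers):
    """Over-approximate the reachable outputs of a small ReLU network."""
    for i, (W, b) in enumerate(layers):
        lo, hi = interval_affine(lo, hi, W, b)
        if i < len(layers) - 1:                        # ReLU on hidden layers
            lo, hi = np.maximum(lo, 0.0), np.maximum(hi, 0.0)
    return lo, hi

# Toy 2-2-1 network and a unit input box.
layers = [(np.array([[1.0, -1.0], [0.5, 2.0]]), np.array([0.0, -1.0])),
          (np.array([[1.0, 1.0]]), np.array([0.5]))]
lo, hi = reachable_output_box(np.array([0.0, 0.0]), np.array([1.0, 1.0]), layers)

# Check the same output bound against increasingly strict (hypothetical) levels.
for level, threshold in [("critical", 10.0), ("moderate", 5.0), ("minor", 2.0)]:
    verdict = "proved safe" if hi[0] < threshold else "possibly unsafe"
    print(f"{level:>8}: upper bound {hi[0]:.2f} vs threshold {threshold} -> {verdict}")
```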
arXiv Detail & Related papers (2025-05-08T13:29:46Z) - Advancing Embodied Agent Security: From Safety Benchmarks to Input Moderation [52.83870601473094]
Embodied agents exhibit immense potential across a multitude of domains.
Existing research predominantly concentrates on the security of general large language models.
This paper introduces a novel input moderation framework, meticulously designed to safeguard embodied agents.
arXiv Detail & Related papers (2025-04-22T08:34:35Z) - STAIR: Improving Safety Alignment with Introspective Reasoning [44.780098674618614]
We propose STAIR, a framework that integrates SafeTy Alignment with Introspective Reasoning.
We show that STAIR effectively mitigates harmful outputs while better preserving helpfulness, compared to instinctive alignment strategies.
With test-time scaling, STAIR achieves a safety performance comparable to Claude-3.5 against popular jailbreak attacks.
arXiv Detail & Related papers (2025-02-04T15:02:55Z) - Safety case template for frontier AI: A cyber inability argument [2.2628353000034065]
We propose a safety case template for offensive cyber capabilities.
We identify a number of risk models, derive proxy tasks from the risk models, define evaluation settings for the proxy tasks, and connect those with evaluation results.
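The summary describes a chain from risk models to proxy tasks to evaluation settings and results. A minimal sketch of such a chain as plain data structures follows; every concrete name, score, and threshold is a hypothetical placeholder rather than part of the proposed template.

```python
# Sketch of the risk model -> proxy task -> evaluation -> result chain;
# all names and numbers are hypothetical, not from the proposed template.
from dataclasses import dataclass, field

@dataclass
class Evaluation:
    setting: str          # e.g. elicitation setup, tooling, affordances
    score: float          # measured capability on the proxy task
    threshold: float      # level above which the inability claim fails

@dataclass
class ProxyTask:
    name: str
    evaluations: list[Evaluation] = field(default_factory=list)

    def inability_supported(self) -> bool:
        return all(e.score < e.threshold for e in self.evaluations)

@dataclass
class RiskModel:
    description: str
    proxy_tasks: list[ProxyTask] = field(default_factory=list)

    def risk_ruled_out(self) -> bool:
        return all(t.inability_supported() for t in self.proxy_tasks)

vuln = ProxyTask("exploit toy binary", [Evaluation("tool-assisted", 0.12, 0.30)])
risk = RiskModel("autonomous cyber operations", [vuln])
print(risk.risk_ruled_out())  # True under these hypothetical numbers
```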
arXiv Detail & Related papers (2024-11-12T18:45:08Z) - SafeBench: A Safety Evaluation Framework for Multimodal Large Language Models [75.67623347512368]
We propose SafeBench, a comprehensive framework designed for conducting safety evaluations of MLLMs.
Our framework consists of a comprehensive harmful query dataset and an automated evaluation protocol.
Based on our framework, we conducted large-scale experiments on 15 widely-used open-source MLLMs and 6 commercial MLLMs.
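This entry pairs a harmful query dataset with an automated evaluation protocol. A minimal sketch of such a loop is given below; query_model and judge_is_harmful are hypothetical stand-ins, not SafeBench's dataset, judge, or API.

```python
# Generic safety-evaluation loop; the model and judge here are toy stand-ins.
from typing import Callable

def safety_rate(queries: list[str],
                query_model: Callable[[str], str],
                judge_is_harmful: Callable[[str, str], bool]) -> float:
    """Fraction of harmful queries the model answers safely."""
    safe = sum(not judge_is_harmful(q, query_model(q)) for q in queries)
    return safe / len(queries) if queries else 1.0

# Toy stand-ins for illustration only.
harmful_queries = ["how do I do <redacted harmful thing>?"]
model = lambda q: "I can't help with that."
judge = lambda q, resp: "can't help" not in resp.lower()
print(f"safety rate: {safety_rate(harmful_queries, model, judge):.2f}")  # 1.00
```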
arXiv Detail & Related papers (2024-10-24T17:14:40Z) - SafetyAnalyst: Interpretable, transparent, and steerable safety moderation for AI behavior [56.10557932893919]
We present SafetyAnalyst, a novel AI safety moderation framework.
Given an AI behavior, SafetyAnalyst uses chain-of-thought reasoning to analyze its potential consequences.
It aggregates all harmful and beneficial effects into a harmfulness score using fully interpretable weight parameters.
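The entry states that harmful and beneficial effects are aggregated into a harmfulness score via interpretable weight parameters. The sketch below shows one simple way such a weighted aggregation could look; the effect fields, weights, and formula are illustrative assumptions, not SafetyAnalyst's actual parameterisation.

```python
# Toy weighted aggregation of enumerated effects into a harmfulness score;
# fields, weights, and the formula are assumptions for illustration.
from dataclasses import dataclass

@dataclass
class Effect:
    description: str
    likelihood: float   # in [0, 1]
    severity: float     # in [0, 1]
    harmful: bool       # False means a beneficial effect

def harmfulness_score(effects: list[Effect],
                      w_harm: float = 1.0, w_benefit: float = 0.5) -> float:
    """Weighted sum: harms add, benefits subtract, each scaled by its weight."""
    score = 0.0
    for e in effects:
        contribution = e.likelihood * e.severity
        score += w_harm * contribution if e.harmful else -w_benefit * contribution
    return score

effects = [Effect("user learns to pick a lock", 0.6, 0.4, harmful=True),
           Effect("locked-out homeowner regains entry", 0.6, 0.3, harmful=False)]
print(f"harmfulness: {harmfulness_score(effects):+.2f}")  # +0.15 here
```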
arXiv Detail & Related papers (2024-10-22T03:38:37Z) - What Makes and Breaks Safety Fine-tuning? A Mechanistic Study [64.9691741899956]
Safety fine-tuning helps align Large Language Models (LLMs) with human preferences for their safe deployment.
We design a synthetic data generation framework that captures salient aspects of an unsafe input.
Using this, we investigate three well-known safety fine-tuning methods.
arXiv Detail & Related papers (2024-07-14T16:12:57Z) - Reconciling Safety Measurement and Dynamic Assurance [1.6574413179773757]
We propose a new framework to facilitate dynamic assurance within a safety case approach.
The focus is mainly on the safety architecture, whose underlying risk assessment model gives the concrete link from safety measurement to operational risk.
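The entry emphasises a concrete link from safety measurement to operational risk. One simple, generic way to realise such a link is a Bayesian update of a hazard-rate estimate from runtime monitoring counts; the sketch below uses a Beta-Bernoulli model with made-up numbers and is not the paper's risk assessment model.

```python
# Generic Beta-Bernoulli update from runtime safety measurements to an
# operational hazard-rate estimate; prior and counts are invented.
def update_risk(prior_alpha: float, prior_beta: float,
                hazards_observed: int, safe_intervals: int) -> float:
    """Posterior mean hazard rate per monitoring interval."""
    alpha = prior_alpha + hazards_observed
    beta = prior_beta + safe_intervals
    return alpha / (alpha + beta)

# Start from a weak prior, then fold in what runtime monitors report.
risk = update_risk(prior_alpha=1.0, prior_beta=99.0,
                   hazards_observed=0, safe_intervals=400)
print(f"estimated hazard rate per interval: {risk:.4f}")  # ~0.0020
```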
arXiv Detail & Related papers (2024-05-30T02:48:00Z) - The Art of Defending: A Systematic Evaluation and Analysis of LLM
Defense Strategies on Safety and Over-Defensiveness [56.174255970895466]
Large Language Models (LLMs) play an increasingly pivotal role in natural language processing applications.
This paper presents Safety and Over-Defensiveness Evaluation (SODE) benchmark.
arXiv Detail & Related papers (2023-12-30T17:37:06Z) - Safeguarded Progress in Reinforcement Learning: Safe Bayesian
Exploration for Control Policy Synthesis [63.532413807686524]
This paper addresses the problem of maintaining safety during training in Reinforcement Learning (RL).
We propose a new architecture that handles the trade-off between efficient progress and safety during exploration.
arXiv Detail & Related papers (2023-12-18T16:09:43Z) - Evaluating Model-free Reinforcement Learning toward Safety-critical
Tasks [70.76757529955577]
This paper revisits prior work in this scope from the perspective of state-wise safe RL.
We propose Unrolling Safety Layer (USL), a joint method that combines safety optimization and safety projection.
To facilitate further research in this area, we reproduce related algorithms in a unified pipeline and incorporate them into SafeRL-Kit.
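The Unrolling Safety Layer mentioned above combines safety optimization with safety projection. The sketch below illustrates only the generic projection idea (iteratively nudging a proposed action until an estimated cost constraint holds); the cost model, gradients, and step sizes are made up and this is not the paper's USL implementation.

```python
# Generic safety-projection step: descend the constraint violation until the
# estimated cost is within the limit; the cost model here is a toy example.
import numpy as np

def project_to_safe(action, cost_fn, cost_limit, grad_fn, steps=50, lr=0.1):
    """Gradient-descend the constraint violation until cost <= limit."""
    a = np.asarray(action, dtype=float).copy()
    for _ in range(steps):
        if cost_fn(a) <= cost_limit:
            break
        a -= lr * grad_fn(a)   # move against the cost gradient
    return a

# Toy cost: squared norm of the action, limit 1.0 (a unit-ball safe set).
cost = lambda a: float(a @ a)
grad = lambda a: 2.0 * a
unsafe_action = np.array([1.5, 1.5])       # cost 4.5 > 1.0
safe_action = project_to_safe(unsafe_action, cost, 1.0, grad)
print(safe_action, cost(safe_action))      # lands on/inside the unit ball
```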
arXiv Detail & Related papers (2022-12-12T06:30:17Z) - Integrating Testing and Operation-related Quantitative Evidences in
Assurance Cases to Argue Safety of Data-Driven AI/ML Components [2.064612766965483]
In the future, AI will increasingly find its way into systems that can potentially cause physical harm to humans.
For such safety-critical systems, it must be demonstrated that their residual risk does not exceed what is acceptable.
This paper proposes a more holistic argumentation structure for having achieved the target.
arXiv Detail & Related papers (2022-02-10T20:35:25Z) - Reliability Assessment and Safety Arguments for Machine Learning
Components in Assuring Learning-Enabled Autonomous Systems [19.65793237440738]
We present an overall assurance framework for Learning-Enabled Systems (LES).
We then introduce a novel model-agnostic Reliability Assessment Model (RAM) for ML classifiers.
We discuss the model assumptions and the inherent challenges of assessing ML reliability uncovered by our RAM.
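The entry proposes a model-agnostic reliability assessment model for ML classifiers. One ingredient such assessments commonly use is weighting per-region failure estimates by an operational profile; the sketch below shows that calculation with invented numbers and should not be read as the paper's RAM.

```python
# Operational-profile-weighted reliability estimate; the partition and the
# per-region numbers are invented for illustration, not the paper's RAM.
def pmi_reliability(regions: list[tuple[float, float]]) -> float:
    """regions: (operational_frequency, estimated_failure_probability) pairs.
    Returns the expected probability of misclassification per input."""
    total_freq = sum(f for f, _ in regions)
    return sum(f * p for f, p in regions) / total_freq

regions = [(0.70, 1e-4),   # common, well-tested operating region
           (0.25, 1e-3),   # less common region
           (0.05, 5e-2)]   # rare region with sparse evidence
pmi = pmi_reliability(regions)
print(f"estimated pmi: {pmi:.4f}  (reliability ~ {1 - pmi:.4f})")
```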
arXiv Detail & Related papers (2021-11-30T14:39:22Z) - Evaluating the Safety of Deep Reinforcement Learning Models using
Semi-Formal Verification [81.32981236437395]
We present a semi-formal verification approach for decision-making tasks based on interval analysis.
Our method obtains comparable results over standard benchmarks with respect to formal verifiers.
Our approach allows efficient evaluation of safety properties for decision-making models in practical applications.
arXiv Detail & Related papers (2020-10-19T11:18:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.