A causal model of safety assurance for machine learning
- URL: http://arxiv.org/abs/2201.05451v1
- Date: Fri, 14 Jan 2022 13:54:17 GMT
- Title: A causal model of safety assurance for machine learning
- Authors: Simon Burton
- Abstract summary: This paper proposes a framework based on a causal model of safety upon which effective safety assurance cases for ML-based applications can be built.
The paper defines four categories of safety case evidence and a structured analysis approach within which this evidence can be effectively combined.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper proposes a framework based on a causal model of safety upon which effective safety assurance cases for ML-based applications can be built. In doing so, we build upon established principles of safety engineering as well as previous work on structuring assurance arguments for ML. The paper defines four categories of safety case evidence and a structured analysis approach within which this evidence can be effectively combined. Where appropriate, abstract formalisations of these contributions are used to illustrate the causalities they evaluate, their contributions to the safety argument, and desirable properties of the evidence. Based on the proposed framework, progress in this area is re-evaluated and a set of future research directions is proposed to enable tangible progress in the field.
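The paper's formalisations are abstract rather than executable, but the core idea, safety claims supported by distinct categories of evidence and combined along causal links, can be sketched in code. The following is a minimal illustrative sketch, not the paper's own model: the class names, the example evidence categories, and the threshold rule are all hypothetical and do not reproduce the paper's four categories.

```python
from dataclasses import dataclass, field

# Hypothetical evidence categories, loosely inspired by the idea of combining
# distinct kinds of safety case evidence (not the paper's own four categories).
EVIDENCE_CATEGORIES = {"design", "verification", "field_trial", "runtime_monitoring"}

@dataclass
class Evidence:
    category: str   # one of EVIDENCE_CATEGORIES
    claim: str      # the causal link this evidence is meant to support
    strength: float # subjective weight in [0, 1]; a stand-in for a real appraisal

@dataclass
class SafetyClaim:
    """A node in a simple causal safety argument: the claim is taken to hold
    only if all sub-claims hold and its direct evidence is strong enough."""
    statement: str
    evidence: list[Evidence] = field(default_factory=list)
    subclaims: list["SafetyClaim"] = field(default_factory=list)

    def supported(self, threshold: float = 0.5) -> bool:
        direct = sum(e.strength for e in self.evidence)
        return direct >= threshold and all(c.supported(threshold) for c in self.subclaims)

# Example: a top-level claim backed by two categories of evidence.
top = SafetyClaim(
    "Residual risk of the ML component is acceptable",
    evidence=[Evidence("verification", "robustness within the ODD", 0.4)],
    subclaims=[SafetyClaim("Training data covers the ODD",
                           evidence=[Evidence("design", "data coverage analysis", 0.7)])],
)
print(top.supported())  # False: direct evidence (0.4) falls below the 0.5 threshold
```

The point of the sketch is structural: evidence of different categories attaches to different causal claims, and the argument only closes when every link in the chain is adequately supported.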
Related papers
- What Makes and Breaks Safety Fine-tuning? A Mechanistic Study (2024-07-14)
  Safety fine-tuning helps align Large Language Models (LLMs) with human preferences for their safe deployment. We design a synthetic data generation framework that captures salient aspects of an unsafe input. Using this, we investigate three well-known safety fine-tuning methods.
- Reconciling Safety Measurement and Dynamic Assurance (2024-05-30)
  We propose a new framework to facilitate dynamic assurance within a safety case approach. The focus is mainly on the safety architecture, whose underlying risk assessment model gives the concrete link from safety measurement to operational risk.
- The Art of Defending: A Systematic Evaluation and Analysis of LLM Defense Strategies on Safety and Over-Defensiveness (2023-12-30)
  Large Language Models (LLMs) play an increasingly pivotal role in natural language processing applications. This paper presents the Safety and Over-Defensiveness Evaluation (SODE) benchmark.
- Safeguarded Progress in Reinforcement Learning: Safe Bayesian Exploration for Control Policy Synthesis (2023-12-18)
  This paper addresses the problem of maintaining safety during training in Reinforcement Learning (RL). We propose a new architecture that handles the trade-off between efficient progress and safety during exploration.
- Towards Safer Generative Language Models: A Survey on Safety Risks, Evaluations, and Improvements (2023-02-18)
  This survey presents a framework for safety research pertaining to large models. We begin by introducing safety issues of wide concern, then delve into safety evaluation methods for large models. We explore strategies for enhancing large model safety from training to deployment.
- Evaluating Model-free Reinforcement Learning toward Safety-critical Tasks (2022-12-12)
  This paper revisits prior work in this scope from the perspective of state-wise safe RL. We propose the Unrolling Safety Layer (USL), a joint method that combines safety optimization and safety projection. To facilitate further research in this area, we reproduce related algorithms in a unified pipeline and incorporate them into SafeRL-Kit.
- Integrating Testing and Operation-related Quantitative Evidences in Assurance Cases to Argue Safety of Data-Driven AI/ML Components (2022-02-10)
  In the future, AI will increasingly find its way into systems that can potentially cause physical harm to humans. For such safety-critical systems, it must be demonstrated that their residual risk does not exceed what is acceptable. This paper proposes a more holistic argumentation structure for arguing that the target has been achieved.
- Reliability Assessment and Safety Arguments for Machine Learning Components in Assuring Learning-Enabled Autonomous Systems (2021-11-30)
  We present an overall assurance framework for Learning-Enabled Systems (LES). We then introduce a novel model-agnostic Reliability Assessment Model (RAM) for ML classifiers, and discuss the model assumptions and the inherent challenges of assessing ML reliability uncovered by our RAM.
- The missing link: Developing a safety case for perception components in automated driving (2021-08-30)
  Perception is a key aspect of automated driving (AD) systems that relies heavily on Machine Learning (ML). Despite the known challenges with the safety assurance of ML-based components, proposals have recently emerged for unit-level safety cases addressing these components. We propose a generic template for a linking argument specifically tailored to perception components.
- Evaluating the Safety of Deep Reinforcement Learning Models using Semi-Formal Verification (2020-10-19)
  We present a semi-formal verification approach for decision-making tasks based on interval analysis. Our method obtains results comparable to formal verifiers on standard benchmarks, and it allows safety properties of decision-making models to be evaluated efficiently in practical applications (a minimal interval-propagation sketch follows this list).
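To make the interval-analysis idea in the last entry concrete: interval arithmetic propagates a box of possible inputs through a network and bounds its outputs, so a safety property can be checked against the worst case in the box. The sketch below is an illustrative construction under stated assumptions, not code or a method from any of the papers above; the two-layer ReLU "policy", its random weights, and the brake/steer property are all hypothetical.

```python
import numpy as np

def interval_affine(lo, hi, W, b):
    """Propagate the box [lo, hi] through x -> W @ x + b with interval arithmetic."""
    Wp, Wn = np.maximum(W, 0.0), np.minimum(W, 0.0)
    return Wp @ lo + Wn @ hi + b, Wp @ hi + Wn @ lo + b

def interval_relu(lo, hi):
    """ReLU is monotone, so it maps bound-wise onto the interval endpoints."""
    return np.maximum(lo, 0.0), np.maximum(hi, 0.0)

# Hypothetical 2-layer policy: 2 state features -> 3 hidden units -> 2 action scores.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 2)), rng.normal(size=3)
W2, b2 = rng.normal(size=(2, 3)), rng.normal(size=2)

# Input region to certify: each state feature in [-0.1, 0.1].
lo, hi = np.full(2, -0.1), np.full(2, 0.1)

lo, hi = interval_relu(*interval_affine(lo, hi, W1, b1))
lo, hi = interval_affine(lo, hi, W2, b2)

# Safety property: action 0 ("brake") must dominate action 1 everywhere in the
# region. Certified if the worst case satisfies it, i.e. the minimum score of
# action 0 exceeds the maximum score of action 1.
print("certified safe:", lo[0] > hi[1])
```

A failed check does not prove the property is violated: interval bounds over-approximate the reachable outputs, so the analysis may simply be too coarse to certify a property that in fact holds. This soundness-without-completeness trade-off is what makes such approaches complements to, rather than replacements for, exact formal verifiers.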
The related-papers list above is automatically generated from the titles and abstracts of papers on this site; its accuracy is not guaranteed.