Related papers: Dynamic safety cases for frontier AI

Dynamic safety cases for frontier AI

URL: http://arxiv.org/abs/2412.17618v1
Date: Mon, 23 Dec 2024 14:43:41 GMT
Title: Dynamic safety cases for frontier AI
Authors: Carmen Cârlan, Francesca Gomez, Yohan Mathew, Ketana Krishna, René King, Peter Gebauer, Ben R. Smith,
Abstract summary: This paper proposes a Dynamic Safety Case Management System (DSCMS) to support both the initial creation of a safety case and its systematic, semi-automated revision over time.<n>We demonstrate this approach on a safety case template for offensive cyber capabilities and suggest ways it can be integrated into governance structures for safety-critical decision-making.
Score: 0.7538606213726908
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Frontier artificial intelligence (AI) systems present both benefits and risks to society. Safety cases - structured arguments supported by evidence - are one way to help ensure the safe development and deployment of these systems. Yet the evolving nature of AI capabilities, as well as changes in the operational environment and understanding of risk, necessitates mechanisms for continuously updating these safety cases. Typically, in other sectors, safety cases are produced pre-deployment and do not require frequent updates post-deployment, which can be a manual, costly process. This paper proposes a Dynamic Safety Case Management System (DSCMS) to support both the initial creation of a safety case and its systematic, semi-automated revision over time. Drawing on methods developed in the autonomous vehicles (AV) sector - state-of-the-art Checkable Safety Arguments (CSA) combined with Safety Performance Indicators (SPIs) recommended by UL 4600, a DSCMS helps developers maintain alignment between system safety claims and the latest system state. We demonstrate this approach on a safety case template for offensive cyber capabilities and suggest ways it can be integrated into governance structures for safety-critical decision-making. While the correctness of the initial safety argument remains paramount - particularly for high-severity risks - a DSCMS provides a framework for adapting to new insights and strengthening incident response. We outline challenges and further work towards development and implementation of this approach as part of continuous safety assurance of frontier AI systems.

Related papers

RSafe: Incentivizing proactive reasoning to build robust and adaptive LLM safeguards [55.76285458905577]
Large Language Models (LLMs) continue to exhibit vulnerabilities despite deliberate safety alignment efforts.<n>To safeguard against the risk of policy-violating content, system-level moderation via external guard models has emerged as a prevalent mitigation strategy.<n>We propose RSafe, an adaptive reasoning-based safeguard that conducts guided safety reasoning to provide robust protection within the scope of specified safety policies.
arXiv Detail & Related papers (2025-06-09T13:20:04Z)
Towards provable probabilistic safety for scalable embodied AI systems [79.31011047593492]
Embodied AI systems are increasingly prevalent across various applications.<n> Ensuring their safety in complex operating environments remains a major challenge.<n>This Perspective offers a pathway toward safer, large-scale adoption of embodied AI systems in safety-critical applications.
arXiv Detail & Related papers (2025-06-05T15:46:25Z)
SafeAgent: Safeguarding LLM Agents via an Automated Risk Simulator [77.86600052899156]
Large Language Model (LLM)-based agents are increasingly deployed in real-world applications.<n>We propose AutoSafe, the first framework that systematically enhances agent safety through fully automated synthetic data generation.<n>We show that AutoSafe boosts safety scores by 45% on average and achieves a 28.91% improvement on real-world tasks.
arXiv Detail & Related papers (2025-05-23T10:56:06Z)
Shape it Up! Restoring LLM Safety during Finetuning [66.46166656543761]
Finetuning large language models (LLMs) enables user-specific customization but introduces critical safety risks.<n>We propose dynamic safety shaping (DSS), a framework that uses fine-grained safety signals to reinforce learning from safe segments of a response while suppressing unsafe content.<n>We present STAR-DSS, guided by STAR scores, that robustly mitigates finetuning risks and delivers substantial safety improvements across diverse threats, datasets, and model families.
arXiv Detail & Related papers (2025-05-22T18:05:16Z)
An Approach to Technical AGI Safety and Security [72.83728459135101]
We develop an approach to address the risk of harms consequential enough to significantly harm humanity. We focus on technical approaches to misuse and misalignment. We briefly outline how these ingredients could be combined to produce safety cases for AGI systems.
arXiv Detail & Related papers (2025-04-02T15:59:31Z)
Landscape of AI safety concerns -- A methodology to support safety assurance for AI-based autonomous systems [0.0]
AI has emerged as a key technology, driving advancements across a range of applications. The challenge of assuring safety in systems that incorporate AI components is substantial. We propose a novel methodology designed to support the creation of safety assurance cases for AI-based systems.
arXiv Detail & Related papers (2024-12-18T16:38:16Z)
Cross-Modality Safety Alignment [73.8765529028288]
We introduce a novel safety alignment challenge called Safe Inputs but Unsafe Output (SIUO) to evaluate cross-modality safety alignment. To empirically investigate this problem, we developed the SIUO, a cross-modality benchmark encompassing 9 critical safety domains, such as self-harm, illegal activities, and privacy violations. Our findings reveal substantial safety vulnerabilities in both closed- and open-source LVLMs, underscoring the inadequacy of current models to reliably interpret and respond to complex, real-world scenarios.
arXiv Detail & Related papers (2024-06-21T16:14:15Z)
Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems [88.80306881112313]
We will introduce and define a family of approaches to AI safety, which we will refer to as guaranteed safe (GS) AI. The core feature of these approaches is that they aim to produce AI systems which are equipped with high-assurance quantitative safety guarantees. We outline a number of approaches for creating each of these three core components, describe the main technical challenges, and suggest a number of potential solutions to them.
arXiv Detail & Related papers (2024-05-10T17:38:32Z)
The Open Autonomy Safety Case Framework [3.2995359570845917]
Safety cases have become a best practice for measuring, managing, and communicating the safety of autonomous vehicles. This paper introduces the Open Autonomy Safety Case Framework, developed over years of work with the autonomous vehicle industry.
arXiv Detail & Related papers (2024-04-08T12:26:06Z)
ACCESS: Assurance Case Centric Engineering of Safety-critical Systems [9.388301205192082]
Assurance cases are used to communicate and assess confidence in critical system properties such as safety and security. In recent years, model-based system assurance approaches have gained popularity to improve the efficiency and quality of system assurance activities. We show how model-based system assurance cases can trace to heterogeneous engineering artifacts.
arXiv Detail & Related papers (2024-03-22T14:29:50Z)
Deep Learning Safety Concerns in Automated Driving Perception [43.026485214492105]
This paper introduces an additional categorization for a better understanding as well as enabling cross-functional teams to jointly address the concerns. Recent advances in the field of deep learning and impressive performance of deep neural networks (DNNs) for perception have resulted in an increased demand for their use in automated driving (AD) systems.
arXiv Detail & Related papers (2023-09-07T15:25:47Z)
Leveraging Traceability to Integrate Safety Analysis Artifacts into the Software Development Process [51.42800587382228]
Safety assurance cases (SACs) can be challenging to maintain during system evolution. We propose a solution that leverages software traceability to connect relevant system artifacts to safety analysis models. We elicit design rationales for system changes to help safety stakeholders analyze the impact of system changes on safety.
arXiv Detail & Related papers (2023-07-14T16:03:27Z)
Sustainable Adaptive Security [11.574868434725117]
We propose the notion of Sustainable Adaptive Security (SAS) which reflects enduring protection by augmenting adaptive security systems with the capability of mitigating newly discovered threats. We use a smart home example to showcase how we can engineer the activities of the MAPE (Monitor, Analysis, Planning, and Execution) loop of systems satisfying sustainable adaptive security.
arXiv Detail & Related papers (2023-06-05T08:48:36Z)
Towards Safer Generative Language Models: A Survey on Safety Risks, Evaluations, and Improvements [76.80453043969209]
This survey presents a framework for safety research pertaining to large models. We begin by introducing safety issues of wide concern, then delve into safety evaluation methods for large models. We explore the strategies for enhancing large model safety from training to deployment.
arXiv Detail & Related papers (2023-02-18T09:32:55Z)
Recursively Feasible Probabilistic Safe Online Learning with Control Barrier Functions [60.26921219698514]
We introduce a model-uncertainty-aware reformulation of CBF-based safety-critical controllers. We then present the pointwise feasibility conditions of the resulting safety controller. We use these conditions to devise an event-triggered online data collection strategy.
arXiv Detail & Related papers (2022-08-23T05:02:09Z)

This list is automatically generated from the titles and abstracts of the papers in this site.