Related papers: Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems

URL: http://arxiv.org/abs/2405.06624v3
Date: Mon, 8 Jul 2024 13:35:00 GMT
Title: Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems
Authors: David "davidad" Dalrymple, Joar Skalse, Yoshua Bengio, Stuart Russell, Max Tegmark, Sanjit Seshia, Steve Omohundro, Christian Szegedy, Ben Goldhaber, Nora Ammann, Alessandro Abate, Joe Halpern, Clark Barrett, Ding Zhao, Tan Zhi-Xuan, Jeannette Wing, Joshua Tenenbaum,
Abstract summary: We will introduce and define a family of approaches to AI safety, which we will refer to as guaranteed safe (GS) AI. The core feature of these approaches is that they aim to produce AI systems which are equipped with high-assurance quantitative safety guarantees. We outline a number of approaches for creating each of the three core components (a world model, a safety specification, and a verifier), describe the main technical challenges, and suggest a number of potential solutions to them.
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Ensuring that AI systems reliably and robustly avoid harmful or dangerous behaviours is a crucial challenge, especially for AI systems with a high degree of autonomy and general intelligence, or systems used in safety-critical contexts. In this paper, we will introduce and define a family of approaches to AI safety, which we will refer to as guaranteed safe (GS) AI. The core feature of these approaches is that they aim to produce AI systems which are equipped with high-assurance quantitative safety guarantees. This is achieved by the interplay of three core components: a world model (which provides a mathematical description of how the AI system affects the outside world), a safety specification (which is a mathematical description of what effects are acceptable), and a verifier (which provides an auditable proof certificate that the AI satisfies the safety specification relative to the world model). We outline a number of approaches for creating each of these three core components, describe the main technical challenges, and suggest a number of potential solutions to them. We also argue for the necessity of this approach to AI safety, and for the inadequacy of the main alternative approaches.
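As a rough illustration of how the three components in the abstract fit together, the following Python sketch is a toy built on loud assumptions and is not the paper's method. It assumes the world model is a small finite Markov decision process with exact transition probabilities, the safety specification is an upper bound on the probability of reaching designated unsafe states within a fixed horizon, and the verifier computes that probability exactly by finite-horizon reachability analysis. The returned `Certificate` is only a stand-in for an auditable proof certificate. All names (`SafetySpec`, `Certificate`, `verify`, the states `idle`/`done`/`harm`, and the two policies) are hypothetical.

```python
from dataclasses import dataclass
from fractions import Fraction

# Hypothetical world model: a finite Markov decision process.
# model[state][action] maps each next state to its exact transition probability.
WorldModel = dict[str, dict[str, dict[str, Fraction]]]
# Hypothetical AI policy: the action the system takes in each non-unsafe state.
Policy = dict[str, str]


@dataclass(frozen=True)
class SafetySpec:
    """Acceptable effects: reach an unsafe state with probability <= max_risk."""
    unsafe_states: frozenset[str]
    horizon: int          # number of time steps the guarantee covers
    max_risk: Fraction    # largest acceptable probability of reaching an unsafe state


@dataclass(frozen=True)
class Certificate:
    """Toy stand-in for a proof certificate: an exact, re-checkable risk value."""
    risk: Fraction        # probability of reaching an unsafe state within the horizon
    holds: bool           # True if risk <= spec.max_risk


def reach_probability(model: WorldModel, policy: Policy, spec: SafetySpec, start: str) -> Fraction:
    """Bounded reachability: P(reach an unsafe state within spec.horizon steps)."""
    # risk[s] = probability of hitting an unsafe state from s with 0 steps left.
    risk = {s: Fraction(int(s in spec.unsafe_states)) for s in model}
    for _ in range(spec.horizon):
        # One backward step: the right-hand side reads the previous risk table.
        risk = {
            s: Fraction(1) if s in spec.unsafe_states  # unsafe states count as already hit
            else sum((p * risk[t] for t, p in model[s][policy[s]].items()), Fraction(0))
            for s in model
        }
    return risk[start]


def verify(model: WorldModel, policy: Policy, spec: SafetySpec, start: str) -> Certificate:
    """Toy verifier: checks the safety spec relative to the world model."""
    risk = reach_probability(model, policy, spec, start)
    return Certificate(risk=risk, holds=risk <= spec.max_risk)


# Example: acting reaches "done" with probability 9/10 and "harm" with probability 1/10.
model: WorldModel = {
    "idle": {"wait": {"idle": Fraction(1)},
             "act": {"done": Fraction(9, 10), "harm": Fraction(1, 10)}},
    "done": {"wait": {"done": Fraction(1)}},
    "harm": {"wait": {"harm": Fraction(1)}},
}
spec = SafetySpec(unsafe_states=frozenset({"harm"}), horizon=5, max_risk=Fraction(1, 20))
cautious: Policy = {"idle": "wait", "done": "wait"}
bold: Policy = {"idle": "act", "done": "wait"}

print(verify(model, cautious, spec, "idle"))
print(verify(model, bold, spec, "idle"))
```

Under these assumptions, the cautious policy has risk 0 and satisfies the specification, while the bold policy has risk 1/10, which exceeds the assumed `max_risk` of 1/20. Exact `Fraction` arithmetic is used so that the returned risk value can be recomputed and checked independently; the verifier described in the abstract would instead provide an auditable proof certificate that a separate checker can validate.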

Related papers

Frontier AI Auditing: Toward Rigorous Third-Party Assessment of Safety and Security Practices at Leading AI Companies
We define frontier AI auditing as rigorous third-party verification of frontier AI developers' safety and security claims. We introduce AI Assurance Levels (AAL-1 to AAL-4), ranging from time-bounded system audits to continuous, deception-resilient verification.
arXiv (2026-01-16T18:44:09Z)
Never Compromise to Vulnerabilities: A Comprehensive Survey on AI Governance
We propose a comprehensive framework integrating technical and societal dimensions, structured around three interconnected pillars: Intrinsic Security, Derivative Security, and Social Ethics. We identify three core challenges: (1) the generalization gap, where defenses fail against evolving threats; (2) inadequate evaluation protocols that overlook real-world risks; and (3) fragmented regulations leading to inconsistent oversight. Our framework offers actionable guidance for researchers, engineers, and policymakers to develop AI systems that are not only robust and secure but also ethically aligned and publicly trustworthy.
arXiv (2025-08-12T09:42:56Z)
The Singapore Consensus on Global AI Safety Research Priorities
The "2025 Singapore Conference on AI (SCAI): International Scientific Exchange on AI Safety" aimed to support research in this space. The report builds on the International AI Safety Report chaired by Yoshua Bengio and backed by 33 governments. The report organises AI safety research domains into three types: challenges with creating trustworthy AI systems (Development), challenges with evaluating their risks (Assessment), and challenges with monitoring and intervening after deployment (Control).
arXiv (2025-06-25T17:59:50Z)
Towards provable probabilistic safety for scalable embodied AI systems
Embodied AI systems are increasingly prevalent across various applications. Ensuring their safety in complex operating environments remains a major challenge. This Perspective offers a pathway toward safer, large-scale adoption of embodied AI systems in safety-critical applications.
arXiv (2025-06-05T15:46:25Z)
Security-First AI: Foundations for Robust and Trustworthy Systems
This manuscript posits that AI security must be prioritized as a foundational layer. We argue for a security-first approach to enable trustworthy and resilient AI systems.
arXiv (2025-04-17T22:53:01Z)
An Approach to Technical AGI Safety and Security
We develop an approach to address the risk of harms consequential enough to significantly harm humanity. We focus on technical approaches to misuse and misalignment. We briefly outline how these ingredients could be combined to produce safety cases for AGI systems.
arXiv (2025-04-02T15:59:31Z)
AI threats to national security can be countered through an incident regime
We propose a legally mandated post-deployment AI incident regime that aims to counter potential national security threats from AI systems. Our proposed AI incident regime is split into three phases. The first phase revolves around a novel operationalization of what counts as an 'AI incident'. The second and third phases spell out that AI providers should notify a government agency about incidents, and that the government agency should be involved in amending AI providers' security and safety procedures.
arXiv (2025-03-25T17:51:50Z)
The BIG Argument for AI Safety Cases
The BIG argument adopts a whole-system approach to constructing a safety case for AI systems of varying capability, autonomy and criticality. It is balanced by addressing safety alongside other critical ethical issues such as privacy and equity. It is integrated by bringing together the social, ethical and technical aspects of safety assurance in a way that is traceable and accountable.
arXiv (2025-03-12T11:33:28Z)
AI Safety for Everyone
Recent discussions and research in AI safety have increasingly emphasized the deep connection between AI safety and existential risk from advanced AI systems. This framing may exclude researchers and practitioners who are committed to AI safety but approach the field from different angles. We find a vast array of concrete safety work that addresses immediate and practical concerns with current AI systems.
arXiv (2025-02-13T13:04:59Z)
AI Safety is Stuck in Technical Terms -- A System Safety Response to the International AI Safety Report
Safety has become the central value around which dominant AI governance efforts are being shaped. The report focuses on the safety risks of general-purpose AI and available technical mitigation approaches. The system safety discipline has dealt with the safety risks of software-based systems for many decades.
arXiv (2025-02-05T22:37:53Z)
Position: A taxonomy for reporting and describing AI security incidents
We argue that specific taxonomies are required to describe and report security incidents of AI systems. Existing frameworks for either non-AI security or generic AI safety incident reporting are insufficient to capture the specific properties of AI security.
arXiv (2024-12-19T13:50:26Z)
Landscape of AI safety concerns -- A methodology to support safety assurance for AI-based autonomous systems
AI has emerged as a key technology, driving advancements across a range of applications. The challenge of assuring safety in systems that incorporate AI components is substantial. We propose a novel methodology designed to support the creation of safety assurance cases for AI-based systems.
arXiv (2024-12-18T16:38:16Z)
Engineering Trustworthy AI: A Developer Guide for Empirical Risk Minimization
Key requirements for trustworthy AI can be translated into design choices for the components of empirical risk minimization. We hope to provide actionable guidance for building AI systems that meet emerging standards for trustworthiness of AI.
arXiv (2024-10-25T07:53:32Z)
Generative AI Agents in Autonomous Machines: A Safety Perspective
Generative AI agents provide unparalleled capabilities, but they also raise unique safety concerns. This work investigates the evolving safety requirements when generative models are integrated as agents into physical autonomous machines. We recommend the development and implementation of comprehensive safety scorecards for the use of generative AI technologies in autonomous machines.
arXiv (2024-10-20T20:07:08Z)
Trustworthy, Responsible, and Safe AI: A Comprehensive Architectural Framework for AI Safety with Challenges and Mitigations
AI Safety is an emerging area of critical importance to the safe adoption and deployment of AI systems. Our goal is to promote advancement in AI safety research, and ultimately enhance people's trust in digital transformation.
arXiv (2024-08-23T09:33:48Z)
Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress?
We propose an empirical foundation for developing more meaningful safety metrics and define AI safety in a machine learning research context. We aim to provide a more rigorous framework for AI safety research, advancing the science of safety evaluations and clarifying the path towards measurable progress.
arXiv (2024-07-31T17:59:24Z)
Cross-Modality Safety Alignment
We introduce a novel safety alignment challenge called Safe Inputs but Unsafe Output (SIUO) to evaluate cross-modality safety alignment. To empirically investigate this problem, we developed the SIUO, a cross-modality benchmark encompassing 9 critical safety domains, such as self-harm, illegal activities, and privacy violations. Our findings reveal substantial safety vulnerabilities in both closed- and open-source LVLMs, underscoring the inadequacy of current models to reliably interpret and respond to complex, real-world scenarios.
arXiv (2024-06-21T16:14:15Z)
AI Risk Management Should Incorporate Both Safety and Security
We argue that stakeholders in AI risk management should be aware of the nuances, synergies, and interplay between safety and security. We introduce a unified reference framework to clarify the differences and interplay between AI safety and AI security.
arXiv (2024-05-29T21:00:47Z)
Explainable AI for Safe and Trustworthy Autonomous Driving: A Systematic Review
We present the first systematic literature review of explainable methods for safe and trustworthy autonomous driving. We identify five key contributions of XAI for safe and trustworthy AI in AD, which are interpretable design, interpretable surrogate models, interpretable monitoring, auxiliary explanations, and interpretable validation. We propose a modular framework called SafeX to integrate these contributions, enabling explanation delivery to users while simultaneously ensuring the safety of AI models.
arXiv (2024-02-08T09:08:44Z)
Towards Safer Generative Language Models: A Survey on Safety Risks, Evaluations, and Improvements
This survey presents a framework for safety research pertaining to large models. We begin by introducing safety issues of wide concern, then delve into safety evaluation methods for large models. We explore the strategies for enhancing large model safety from training to deployment.
arXiv (2023-02-18T09:32:55Z)
Safe AI -- How is this Possible?
Traditional safety engineering is coming to a turning point, moving from deterministic, non-evolving systems operating in well-defined contexts to increasingly autonomous and learning-enabled AI systems acting in largely unpredictable operating contexts. We outline some of the underlying challenges of safe AI and suggest a rigorous engineering framework for minimizing uncertainty, thereby increasing confidence, up to tolerable levels, in the safe behavior of AI systems.
arXiv (2022-01-25T16:32:35Z)
AAAI FSS-19: Human-Centered AI: Trustworthiness of AI Models and Data Proceedings
It is crucial for predictive models to be uncertainty-aware and yield trustworthy predictions. The focus of this symposium was on AI systems to improve data quality and technical robustness and safety. Submissions from broadly defined areas also discussed approaches addressing requirements such as explainable models, human trust and ethical aspects of AI.
arXiv (2020-01-15T15:30:29Z)

This list is automatically generated from the titles and abstracts of the papers on this site.