Related papers: Affirmative safety: An approach to risk management for high-risk AI

Affirmative safety: An approach to risk management for high-risk AI

URL: http://arxiv.org/abs/2406.15371v1
Date: Sun, 14 Apr 2024 20:48:55 GMT
Title: Affirmative safety: An approach to risk management for high-risk AI
Authors: Akash R. Wasil, Joshua Clymer, David Krueger, Emily Dardaman, Simeon Campos, Evan R. Murphy,
Abstract summary: We argue that entities developing or deploying high-risk AI systems should be required to present evidence of affirmative safety. We propose a risk management approach for advanced AI in which model developers must provide evidence that their activities keep certain risks below regulator-set thresholds.
Score: 6.133009503054252
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Prominent AI experts have suggested that companies developing high-risk AI systems should be required to show that such systems are safe before they can be developed or deployed. The goal of this paper is to expand on this idea and explore its implications for risk management. We argue that entities developing or deploying high-risk AI systems should be required to present evidence of affirmative safety: a proactive case that their activities keep risks below acceptable thresholds. We begin the paper by highlighting global security risks from AI that have been acknowledged by AI experts and world governments. Next, we briefly describe principles of risk management from other high-risk fields (e.g., nuclear safety). Then, we propose a risk management approach for advanced AI in which model developers must provide evidence that their activities keep certain risks below regulator-set thresholds. As a first step toward understanding what affirmative safety cases should include, we illustrate how certain kinds of technical evidence and operational evidence can support an affirmative safety case. In the technical section, we discuss behavioral evidence (evidence about model outputs), cognitive evidence (evidence about model internals), and developmental evidence (evidence about the training process). In the operational section, we offer examples of organizational practices that could contribute to affirmative safety cases: information security practices, safety culture, and emergency response capacity. Finally, we briefly compare our approach to the NIST AI Risk Management Framework. Overall, we hope our work contributes to ongoing discussions about national and global security risks posed by AI and regulatory approaches to address these risks.

Related papers

Manipulation Attacks by Misaligned AI: Risk Analysis and Safety Case Framework [0.0]
Humans are often the weakest link in cybersecurity systems.<n>A misaligned AI system may seek to undermine human oversight by manipulating employees.<n>No systematic framework exists for assessing and mitigating these risks.<n>This paper provides the first systematic methodology for integrating manipulation risk into AI safety governance.
arXiv Detail & Related papers (2025-07-17T07:45:53Z)
Safety Co-Option and Compromised National Security: The Self-Fulfilling Prophecy of Weakened AI Risk Thresholds [0.0]
We show how "safety revisionism" has allowed AI technologists to engage in "safety revisionism" We explore how the current trajectory for AI risk determination and evaluation for foundation model use within national security is poised for a race to the bottom.
arXiv Detail & Related papers (2025-04-21T13:20:56Z)
SAIF: A Comprehensive Framework for Evaluating the Risks of Generative AI in the Public Sector [4.710921988115686]
We propose a Systematic dAta generatIon Framework for evaluating the risks of generative AI (SAIF) SAIF involves four key stages: breaking down risks, designing scenarios, applying jailbreak methods, and exploring prompt types. We believe that this study can play a crucial role in fostering the safe and responsible integration of generative AI into the public sector.
arXiv Detail & Related papers (2025-01-15T14:12:38Z)
OpenAI o1 System Card [274.83891368890977]
The o1 model series is trained with large-scale reinforcement learning to reason using chain of thought. This report outlines the safety work carried out for the OpenAI o1 and OpenAI o1-mini models, including safety evaluations, external red teaming, and Preparedness Framework evaluations.
arXiv Detail & Related papers (2024-12-21T18:04:31Z)
Position: A taxonomy for reporting and describing AI security incidents [57.98317583163334]
We argue that specific are required to describe and report security incidents of AI systems. Existing frameworks for either non-AI security or generic AI safety incident reporting are insufficient to capture the specific properties of AI security.
arXiv Detail & Related papers (2024-12-19T13:50:26Z)
Safety case template for frontier AI: A cyber inability argument [2.2628353000034065]
We propose a safety case template for offensive cyber capabilities. We identify a number of risk models, derive proxy tasks from the risk models, define evaluation settings for the proxy tasks, and connect those with evaluation results.
arXiv Detail & Related papers (2024-11-12T18:45:08Z)
Risk Sources and Risk Management Measures in Support of Standards for General-Purpose AI Systems [2.3266896180922187]
We compile an extensive catalog of risk sources and risk management measures for general-purpose AI systems. This work involves identifying technical, operational, and societal risks across model development, training, and deployment stages. The catalog is released under a public domain license for ease of direct use by stakeholders in AI governance and standards.
arXiv Detail & Related papers (2024-10-30T21:32:56Z)
Safeguarding AI Agents: Developing and Analyzing Safety Architectures [0.0]
This paper addresses the need for safety measures in AI systems that collaborate with human teams. We propose and evaluate three frameworks to enhance safety protocols in AI agent systems. We conclude that these frameworks can significantly strengthen the safety and security of AI agent systems.
arXiv Detail & Related papers (2024-09-03T10:14:51Z)
EAIRiskBench: Towards Evaluating Physical Risk Awareness for Task Planning of Foundation Model-based Embodied AI Agents [47.69642609574771]
Embodied artificial intelligence (EAI) integrates advanced AI models into physical entities for real-world interaction. Foundation models as the "brain" of EAI agents for high-level task planning have shown promising results. However, the deployment of these agents in physical environments presents significant safety challenges. This study introduces EAIRiskBench, a novel framework for automated physical risk assessment in EAI scenarios.
arXiv Detail & Related papers (2024-08-08T13:19:37Z)
Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress? [59.96471873997733]
We propose an empirical foundation for developing more meaningful safety metrics and define AI safety in a machine learning research context. We aim to provide a more rigorous framework for AI safety research, advancing the science of safety evaluations and clarifying the path towards measurable progress.
arXiv Detail & Related papers (2024-07-31T17:59:24Z)
AI Risk Categorization Decoded (AIR 2024): From Government Regulations to Corporate Policies [88.32153122712478]
We identify 314 unique risk categories organized into a four-tiered taxonomy. At the highest level, this taxonomy encompasses System & Operational Risks, Content Safety Risks, Societal Risks, and Legal & Rights Risks. We aim to advance AI safety through information sharing across sectors and the promotion of best practices in risk mitigation for generative AI models and systems.
arXiv Detail & Related papers (2024-06-25T18:13:05Z)
AI Risk Management Should Incorporate Both Safety and Security [185.68738503122114]
We argue that stakeholders in AI risk management should be aware of the nuances, synergies, and interplay between safety and security. We introduce a unified reference framework to clarify the differences and interplay between AI safety and AI security.
arXiv Detail & Related papers (2024-05-29T21:00:47Z)
Risks of AI Scientists: Prioritizing Safeguarding Over Autonomy [65.77763092833348]
This perspective examines vulnerabilities in AI scientists, shedding light on potential risks associated with their misuse.<n>We take into account user intent, the specific scientific domain, and their potential impact on the external environment.<n>We propose a triadic framework involving human regulation, agent alignment, and an understanding of environmental feedback.
arXiv Detail & Related papers (2024-02-06T18:54:07Z)
Control Risk for Potential Misuse of Artificial Intelligence in Science [85.91232985405554]
We aim to raise awareness of the dangers of AI misuse in science. We highlight real-world examples of misuse in chemical science. We propose a system called SciGuard to control misuse risks for AI models in science.
arXiv Detail & Related papers (2023-12-11T18:50:57Z)
Towards Safer Generative Language Models: A Survey on Safety Risks, Evaluations, and Improvements [76.80453043969209]
This survey presents a framework for safety research pertaining to large models. We begin by introducing safety issues of wide concern, then delve into safety evaluation methods for large models. We explore the strategies for enhancing large model safety from training to deployment.
arXiv Detail & Related papers (2023-02-18T09:32:55Z)
Quantitative AI Risk Assessments: Opportunities and Challenges [9.262092738841979]
AI-based systems are increasingly being leveraged to provide value to organizations, individuals, and society. Risks have led to proposed regulations, litigation, and general societal concerns. This paper explores the concept of a quantitative AI Risk Assessment.
arXiv Detail & Related papers (2022-09-13T21:47:25Z)
Actionable Guidance for High-Consequence AI Risk Management: Towards Standards Addressing AI Catastrophic Risks [12.927021288925099]
Artificial intelligence (AI) systems can present risks of events with very high or catastrophic consequences at societal scale. NIST is developing the NIST Artificial Intelligence Risk Management Framework (AI RMF) as voluntary guidance on AI risk assessment and management. We provide detailed actionable-guidance recommendations focused on identifying and managing risks of events with very high or catastrophic consequences.
arXiv Detail & Related papers (2022-06-17T18:40:41Z)

This list is automatically generated from the titles and abstracts of the papers in this site.