Intolerable Risk Threshold Recommendations for Artificial Intelligence
- URL: http://arxiv.org/abs/2503.05812v1
- Date: Tue, 04 Mar 2025 12:30:37 GMT
- Title: Intolerable Risk Threshold Recommendations for Artificial Intelligence
- Authors: Deepika Raman, Nada Madkour, Evan R. Murphy, Krystal Jackson, Jessica Newman
- Abstract summary: Frontier AI models may pose severe risks to public safety, human rights, economic stability, and societal value. Risks could arise from deliberate adversarial misuse, system failures, unintended cascading effects, or simultaneous failures across multiple models. At the AI Seoul Summit in May 2024, 16 global AI industry organizations signed the Frontier AI Safety Commitments, and 27 nations and the EU issued a declaration on their intent to define these thresholds.
- Score: 0.2383122657918106
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Frontier AI models -- highly capable foundation models at the cutting edge of AI development -- may pose severe risks to public safety, human rights, economic stability, and societal value in the coming years. These risks could arise from deliberate adversarial misuse, system failures, unintended cascading effects, or simultaneous failures across multiple models. In response to such risks, at the AI Seoul Summit in May 2024, 16 global AI industry organizations signed the Frontier AI Safety Commitments, and 27 nations and the EU issued a declaration on their intent to define these thresholds. To fulfill these commitments, organizations must determine and disclose "thresholds at which severe risks posed by a model or system, unless adequately mitigated, would be deemed intolerable." To assist in setting and operationalizing intolerable risk thresholds, we outline key principles and considerations; for example, to aim for "good, not perfect" thresholds in the face of limited data on rapidly advancing AI capabilities and consequently evolving risks. We also propose specific threshold recommendations, including some detailed case studies, for a subset of risks across eight risk categories: (1) Chemical, Biological, Radiological, and Nuclear (CBRN) Weapons, (2) Cyber Attacks, (3) Model Autonomy, (4) Persuasion and Manipulation, (5) Deception, (6) Toxicity, (7) Discrimination, and (8) Socioeconomic Disruption. Our goal is to serve as a starting point or supplementary resource for policymakers and industry leaders, encouraging proactive risk management that prioritizes preventing intolerable risks (ex ante) rather than merely mitigating them after they occur (ex post).
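As a rough illustration of what operationalizing such thresholds could look like in practice, the sketch below encodes the eight risk categories named in the abstract alongside hypothetical threshold definitions and an ex ante check. The category names come from the paper; the metric names, fields, and numeric values are assumed placeholders, not the authors' recommendations.

```python
# Hypothetical sketch: one way an organization might encode intolerable risk
# thresholds for the eight categories listed in the abstract. Category names
# come from the paper; metric names and numbers are illustrative placeholders.
from dataclasses import dataclass

RISK_CATEGORIES = [
    "CBRN Weapons",
    "Cyber Attacks",
    "Model Autonomy",
    "Persuasion and Manipulation",
    "Deception",
    "Toxicity",
    "Discrimination",
    "Socioeconomic Disruption",
]

@dataclass
class Threshold:
    category: str             # one of RISK_CATEGORIES
    metric: str               # hypothetical evaluation metric for the category
    intolerable_above: float  # score above which the risk is deemed intolerable

def is_intolerable(threshold: Threshold, measured_score: float) -> bool:
    """Ex ante check: flag a model before deployment if its evaluation score
    crosses the declared intolerable threshold for this category."""
    return measured_score > threshold.intolerable_above

# Hypothetical usage with a placeholder cyber-capability evaluation score.
cyber = Threshold("Cyber Attacks", "autonomous_exploit_success_rate", 0.20)
print(is_intolerable(cyber, measured_score=0.35))  # True -> mitigation required
```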
Related papers
- An Approach to Technical AGI Safety and Security [72.83728459135101]
We develop an approach to addressing risks consequential enough to significantly harm humanity.
We focus on technical approaches to misuse and misalignment.
We briefly outline how these ingredients could be combined to produce safety cases for AGI systems.
arXiv Detail & Related papers (2025-04-02T15:59:31Z)
- AI Risk Categorization Decoded (AIR 2024): From Government Regulations to Corporate Policies [88.32153122712478]
We identify 314 unique risk categories organized into a four-tiered taxonomy.
At the highest level, this taxonomy encompasses System & Operational Risks, Content Safety Risks, Societal Risks, and Legal & Rights Risks.
We aim to advance AI safety through information sharing across sectors and the promotion of best practices in risk mitigation for generative AI models and systems.
arXiv Detail & Related papers (2024-06-25T18:13:05Z)
- Risk thresholds for frontier AI [1.053373860696675]
One increasingly popular approach is to define capability thresholds.
Risk thresholds simply state how much risk would be too much.
The main downside is that they are more difficult to evaluate reliably (see the illustrative sketch after this list).
arXiv Detail & Related papers (2024-06-20T20:16:29Z)
- CRiskEval: A Chinese Multi-Level Risk Evaluation Benchmark Dataset for Large Language Models [46.93425758722059]
CRiskEval is a Chinese dataset meticulously designed for gauging the risk proclivities inherent in large language models (LLMs).
We define a new risk taxonomy with 7 types of frontier risks and 4 safety levels: extremely hazardous, moderately hazardous, neutral, and safe.
The dataset consists of 14,888 questions that simulate scenarios related to the 7 predefined types of frontier risks.
arXiv Detail & Related papers (2024-06-07T08:52:24Z)
- Risks and Opportunities of Open-Source Generative AI [64.86989162783648]
Applications of Generative AI (Gen AI) are expected to revolutionize a number of different areas, ranging from science & medicine to education.
The potential for these seismic changes has triggered a lively debate about the potential risks of the technology, and resulted in calls for tighter regulation.
This regulation is likely to put at risk the budding field of open-source generative AI.
arXiv Detail & Related papers (2024-05-14T13:37:36Z)
- Near to Mid-term Risks and Opportunities of Open-Source Generative AI [94.06233419171016]
Applications of Generative AI are expected to revolutionize a number of different areas, ranging from science & medicine to education.
The potential for these seismic changes has triggered a lively debate about potential risks and resulted in calls for tighter regulation.
This regulation is likely to put at risk the budding field of open-source Generative AI.
arXiv Detail & Related papers (2024-04-25T21:14:24Z)
- Taxonomy to Regulation: A (Geo)Political Taxonomy for AI Risks and Regulatory Measures in the EU AI Act [0.0]
This work proposes a taxonomy focusing on (geo)political risks associated with AI.
It identifies 12 risks in total, divided into four categories: (1) Geopolitical Pressures, (2) Malicious Usage, (3) Environmental, Social, and Ethical Risks, and (4) Privacy and Trust Violations.
arXiv Detail & Related papers (2024-04-17T15:32:56Z)
- Managing extreme AI risks amid rapid progress [171.05448842016125]
We describe risks that include large-scale social harms, malicious uses, and irreversible loss of human control over autonomous AI systems.
There is a lack of consensus about how exactly such risks arise, and how to manage them.
Present governance initiatives lack the mechanisms and institutions to prevent misuse and recklessness, and barely address autonomous systems.
arXiv Detail & Related papers (2023-10-26T17:59:06Z)
- Frontier AI Regulation: Managing Emerging Risks to Public Safety [15.85618115026625]
"Frontier AI" models could possess dangerous capabilities sufficient to pose severe risks to public safety.
Industry self-regulation is an important first step.
We propose an initial set of safety standards.
arXiv Detail & Related papers (2023-07-06T17:03:25Z)
- Three lines of defense against risks from AI [0.0]
It is not always clear who is responsible for AI risk management.
The Three Lines of Defense (3LoD) model is considered best practice in many industries.
I suggest ways in which AI companies could implement the model.
arXiv Detail & Related papers (2022-12-16T09:33:00Z)
- Actionable Guidance for High-Consequence AI Risk Management: Towards Standards Addressing AI Catastrophic Risks [12.927021288925099]
Artificial intelligence (AI) systems can present risks of events with very high or catastrophic consequences at societal scale.
NIST is developing the NIST Artificial Intelligence Risk Management Framework (AI RMF) as voluntary guidance on AI risk assessment and management.
We provide detailed actionable-guidance recommendations focused on identifying and managing risks of events with very high or catastrophic consequences.
arXiv Detail & Related papers (2022-06-17T18:40:41Z)
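The "Risk thresholds for frontier AI" entry above contrasts capability thresholds with risk thresholds. The minimal sketch below illustrates that distinction under assumed names and numbers; it is not drawn from the cited paper.

```python
# Hypothetical sketch of the distinction drawn in "Risk thresholds for frontier
# AI": capability thresholds are measurable proxies that are easier to evaluate,
# while risk thresholds state how much risk is too much but are harder to
# estimate reliably. All names and numbers below are illustrative assumptions.

MAX_TOLERABLE_RISK = 1e-4  # assumed maximum acceptable annual probability of a severe outcome

def exceeds_capability_threshold(benchmark_score: float, cutoff: float) -> bool:
    # Capability threshold: compares a measurable proxy (e.g., a benchmark score)
    # against a cutoff.
    return benchmark_score >= cutoff

def exceeds_risk_threshold(p_severe_outcome_per_year: float) -> bool:
    # Risk threshold: compares an estimated probability of severe harm against
    # the stated tolerance; producing that estimate reliably is the hard part.
    return p_severe_outcome_per_year > MAX_TOLERABLE_RISK

print(exceeds_capability_threshold(benchmark_score=0.72, cutoff=0.60))  # True
print(exceeds_risk_threshold(p_severe_outcome_per_year=5e-4))           # True
```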