Risk thresholds for frontier AI
- URL: http://arxiv.org/abs/2406.14713v1
- Date: Thu, 20 Jun 2024 20:16:29 GMT
- Title: Risk thresholds for frontier AI
- Authors: Leonie Koessler, Jonas Schuett, Markus Anderljung,
- Abstract summary: One increasingly popular approach is to define capability thresholds.
Risk thresholds simply state how much risk would be too much.
Main downside is that they are more difficult to evaluate reliably.
- Score: 1.053373860696675
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Frontier artificial intelligence (AI) systems could pose increasing risks to public safety and security. But what level of risk is acceptable? One increasingly popular approach is to define capability thresholds, which describe AI capabilities beyond which an AI system is deemed to pose too much risk. A more direct approach is to define risk thresholds that simply state how much risk would be too much. For instance, they might state that the likelihood of cybercriminals using an AI system to cause X amount of economic damage must not increase by more than Y percentage points. The main upside of risk thresholds is that they are more principled than capability thresholds, but the main downside is that they are more difficult to evaluate reliably. For this reason, we currently recommend that companies (1) define risk thresholds to provide a principled foundation for their decision-making, (2) use these risk thresholds to help set capability thresholds, and then (3) primarily rely on capability thresholds to make their decisions. Regulators should also explore the area because, ultimately, they are the most legitimate actors to define risk thresholds. If AI risk estimates become more reliable, risk thresholds should arguably play an increasingly direct role in decision-making.
Related papers
- Constrained Language Model Policy Optimization via Risk-aware Stepwise Alignment [49.2305683068875]
We propose Risk-aware Stepwise Alignment (RSA), a novel alignment method that incorporates risk awareness into the policy optimization process.<n> RSA mitigates risks induced by excessive model shift away from a reference policy, and it explicitly suppresses low-probability yet high-impact harmful behaviors.<n> Experimental results demonstrate that our method achieves high levels of helpfulness while ensuring strong safety.
arXiv Detail & Related papers (2025-12-30T14:38:02Z) - The Role of Risk Modeling in Advanced AI Risk Management [33.357295564462284]
Rapidly advancing artificial intelligence (AI) systems introduce novel, uncertain, and potentially catastrophic risks.<n>Managing these risks requires a mature risk-management infrastructure whose cornerstone is rigorous risk modeling.<n>We argue that advanced-AI governance should adopt a similar dual approach and that verifiable, provably-safe AI architectures are urgently needed.
arXiv Detail & Related papers (2025-12-09T15:37:33Z) - Can Risk-taking AI-Assistants suitably represent entities [0.0]
This study investigates the manipulability of risk aversion in language models (LMs)<n>It focuses on gender-specific attitudes, uncertainty, role-based decision-making, and the manipulability of risk aversion.<n>The results suggest directions for refining AI design to better align human and AI risk preferences.
arXiv Detail & Related papers (2025-10-09T11:55:31Z) - "We are not Future-ready": Understanding AI Privacy Risks and Existing Mitigation Strategies from the Perspective of AI Developers in Europe [56.1653658714305]
We interviewed 25 AI developers based in Europe to understand which privacy threats they believe pose the greatest risk to users, developers, and businesses.<n>We find that there is little consensus among AI developers on the relative ranking of privacy risks.<n>While AI developers are aware of proposed mitigation strategies for addressing these risks, they reported minimal real-world adoption.
arXiv Detail & Related papers (2025-10-01T13:51:33Z) - An Artificial Intelligence Value at Risk Approach: Metrics and Models [0.0]
The state of the art of artificial intelligence risk management seems to be highly immature due to upcoming AI regulations.<n>The purpose of this paper is to orientate AI stakeholders about the depths of AI risk management.
arXiv Detail & Related papers (2025-09-22T20:27:29Z) - A First-Principles Based Risk Assessment Framework and the IEEE P3396 Standard [0.0]
Generative Artificial Intelligence (AI) is enabling unprecedented automation in content creation and decision support.
This paper presents a first-principles risk assessment framework underlying the IEEE P3396 Recommended Practice for AI Risk, Safety, Trustworthiness, and Responsibility.
arXiv Detail & Related papers (2025-03-31T18:00:03Z) - Intolerable Risk Threshold Recommendations for Artificial Intelligence [0.2383122657918106]
Frontier AI models may pose severe risks to public safety, human rights, economic stability, and societal value.
Risks could arise from deliberate adversarial misuse, system failures, unintended cascading effects, or simultaneous failures across multiple models.
16 global AI industry organizations signed the Frontier AI Safety Commitments, and 27 nations and the EU issued a declaration on their intent to define these thresholds.
arXiv Detail & Related papers (2025-03-04T12:30:37Z) - Answer, Refuse, or Guess? Investigating Risk-Aware Decision Making in Language Models [63.559461750135334]
Language models (LMs) are increasingly used to build agents that can act autonomously to achieve goals.<n>We study this "answer-or-defer" problem with an evaluation framework that systematically varies human-specified risk structures.<n>We find that a simple skill-decomposition method, which isolates the independent skills required for answer-or-defer decision making, can consistently improve LMs' decision policies.
arXiv Detail & Related papers (2025-03-03T09:16:26Z) - Fully Autonomous AI Agents Should Not be Developed [58.88624302082713]
This paper argues that fully autonomous AI agents should not be developed.
In support of this position, we build from prior scientific literature and current product marketing to delineate different AI agent levels.
Our analysis reveals that risks to people increase with the autonomy of a system.
arXiv Detail & Related papers (2025-02-04T19:00:06Z) - Engineering Trustworthy AI: A Developer Guide for Empirical Risk Minimization [53.80919781981027]
Key requirements for trustworthy AI can be translated into design choices for the components of empirical risk minimization.
We hope to provide actionable guidance for building AI systems that meet emerging standards for trustworthiness of AI.
arXiv Detail & Related papers (2024-10-25T07:53:32Z) - Risk Alignment in Agentic AI Systems [0.0]
Agentic AIs capable of undertaking complex actions with little supervision raise new questions about how to safely create and align such systems with users, developers, and society.
Risk alignment will matter for user satisfaction and trust, but it will also have important ramifications for society more broadly.
We present three papers that bear on key normative and technical aspects of these questions.
arXiv Detail & Related papers (2024-10-02T18:21:08Z) - The AI Risk Repository: A Comprehensive Meta-Review, Database, and Taxonomy of Risks From Artificial Intelligence [35.77247656798871]
The risks posed by Artificial Intelligence (AI) are of considerable concern to academics, auditors, policymakers, AI companies, and the public.
A lack of shared understanding of AI risks can impede our ability to comprehensively discuss, research, and react to them.
This paper addresses this gap by creating an AI Risk Repository to serve as a common frame of reference.
arXiv Detail & Related papers (2024-08-14T10:32:06Z) - Risks and Opportunities of Open-Source Generative AI [64.86989162783648]
Applications of Generative AI (Gen AI) are expected to revolutionize a number of different areas, ranging from science & medicine to education.
The potential for these seismic changes has triggered a lively debate about the potential risks of the technology, and resulted in calls for tighter regulation.
This regulation is likely to put at risk the budding field of open-source generative AI.
arXiv Detail & Related papers (2024-05-14T13:37:36Z) - Near to Mid-term Risks and Opportunities of Open-Source Generative AI [94.06233419171016]
Applications of Generative AI are expected to revolutionize a number of different areas, ranging from science & medicine to education.
The potential for these seismic changes has triggered a lively debate about potential risks and resulted in calls for tighter regulation.
This regulation is likely to put at risk the budding field of open-source Generative AI.
arXiv Detail & Related papers (2024-04-25T21:14:24Z) - RiskQ: Risk-sensitive Multi-Agent Reinforcement Learning Value Factorization [49.26510528455664]
We introduce the Risk-sensitive Individual-Global-Max (RIGM) principle as a generalization of the Individual-Global-Max (IGM) and Distributional IGM (DIGM) principles.
We show that RiskQ can obtain promising performance through extensive experiments.
arXiv Detail & Related papers (2023-11-03T07:18:36Z) - Managing extreme AI risks amid rapid progress [171.05448842016125]
We describe risks that include large-scale social harms, malicious uses, and irreversible loss of human control over autonomous AI systems.
There is a lack of consensus about how exactly such risks arise, and how to manage them.
Present governance initiatives lack the mechanisms and institutions to prevent misuse and recklessness, and barely address autonomous systems.
arXiv Detail & Related papers (2023-10-26T17:59:06Z) - Frontier AI Regulation: Managing Emerging Risks to Public Safety [15.85618115026625]
"Frontier AI" models could possess dangerous capabilities sufficient to pose severe risks to public safety.
Industry self-regulation is an important first step.
We propose an initial set of safety standards.
arXiv Detail & Related papers (2023-07-06T17:03:25Z) - Three lines of defense against risks from AI [0.0]
It is not always clear who is responsible for AI risk management.
The Three Lines of Defense (3LoD) model is considered best practice in many industries.
I suggest ways in which AI companies could implement the model.
arXiv Detail & Related papers (2022-12-16T09:33:00Z) - Quantitative AI Risk Assessments: Opportunities and Challenges [9.262092738841979]
AI-based systems are increasingly being leveraged to provide value to organizations, individuals, and society.
Risks have led to proposed regulations, litigation, and general societal concerns.
This paper explores the concept of a quantitative AI Risk Assessment.
arXiv Detail & Related papers (2022-09-13T21:47:25Z) - Efficient Risk-Averse Reinforcement Learning [79.61412643761034]
In risk-averse reinforcement learning (RL), the goal is to optimize some risk measure of the returns.
We prove that under certain conditions this inevitably leads to a local-optimum barrier, and propose a soft risk mechanism to bypass it.
We demonstrate improved risk aversion in maze navigation, autonomous driving, and resource allocation benchmarks.
arXiv Detail & Related papers (2022-05-10T19:40:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.