Evaluating AI Companies' Frontier Safety Frameworks: Methodology and Results
- URL: http://arxiv.org/abs/2512.01166v1
- Date: Mon, 01 Dec 2025 00:55:18 GMT
- Title: Evaluating AI Companies' Frontier Safety Frameworks: Methodology and Results
- Authors: Lily Stelling, Malcolm Murray, Simeon Campos, Henry Papadatos,
- Abstract summary: 12 AI companies published frontier safety frameworks outlining approaches to managing catastrophic risks from advanced AI systems. We develop a 65-criteria assessment methodology grounded in established risk management principles from safety-critical industries. We evaluate the twelve frameworks across four dimensions: risk identification, risk analysis and evaluation, risk treatment, and risk governance.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Following the Seoul AI Safety Summit in 2024, twelve AI companies published frontier safety frameworks outlining their approaches to managing catastrophic risks from advanced AI systems. These frameworks now serve as a key mechanism for AI risk governance, utilized by regulations and governance instruments such as the EU AI Act's Code of Practice and California's Transparency in Frontier Artificial Intelligence Act. Given their centrality to AI risk management, assessments of such frameworks are warranted. Existing assessments evaluate them at a high level of abstraction and lack granularity on specific practices for companies to adopt. We address this gap by developing a 65-criteria assessment methodology grounded in established risk management principles from safety-critical industries. We evaluate the twelve frameworks across four dimensions: risk identification, risk analysis and evaluation, risk treatment, and risk governance. Companies' current scores are low, ranging from 8% to 35%. By adopting existing best practices already in use across the frameworks, companies could reach 52%. The most critical gaps are nearly universal: companies generally fail to (a) define quantitative risk tolerances, (b) specify capability thresholds for pausing development, and (c) systematically identify unknown risks. To guide improvement, we provide specific recommendations for each company and each criterion.
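To make the scoring concrete, here is a minimal sketch, assuming an unweighted mean over criteria scored in [0, 1] and a best-practice ceiling computed as the criterion-wise maximum across frameworks; the criterion names, partial-credit values, and aggregation rule are illustrative assumptions, not the paper's published rubric.

```python
from dataclasses import dataclass

# Hypothetical sketch of the 65-criteria scoring described above. Criterion
# names and partial-credit values are placeholders, not the actual rubric.

DIMENSIONS = (
    "risk identification",
    "risk analysis and evaluation",
    "risk treatment",
    "risk governance",
)

@dataclass
class Criterion:
    name: str
    dimension: str
    score: float = 0.0  # fraction of the criterion satisfied, in [0, 1]

def overall_score(criteria: list[Criterion]) -> float:
    """Unweighted mean over all criteria, reported as a percentage."""
    return 100.0 * sum(c.score for c in criteria) / len(criteria)

def best_practice_ceiling(frameworks: dict[str, list[Criterion]]) -> float:
    """Score reachable if every company adopted the best existing practice:
    the criterion-wise maximum across all evaluated frameworks."""
    n = len(next(iter(frameworks.values())))
    best = [max(fw[i].score for fw in frameworks.values()) for i in range(n)]
    return 100.0 * sum(best) / n

# Toy usage with 3 of the 65 criteria for two hypothetical companies:
names = ["quantitative risk tolerances", "pause thresholds", "unknown-risk ID"]
a = [Criterion(n, DIMENSIONS[3], s) for n, s in zip(names, (0.0, 0.5, 0.0))]
b = [Criterion(n, DIMENSIONS[3], s) for n, s in zip(names, (1.0, 0.0, 0.0))]
print(f"A: {overall_score(a):.0f}%, B: {overall_score(b):.0f}%, "
      f"ceiling: {best_practice_ceiling({'A': a, 'B': b}):.0f}%")
```

Under these assumptions, the criterion-wise maximum is exactly the "adopt existing best practices" counterfactual that the abstract reports as reaching 52%.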
Related papers
- ForesightSafety Bench: A Frontier Risk Evaluation and Governance Framework towards Safe AI [38.70363180741332]
"ForesightSafety Bench" is a safety evaluation framework for cutting-edge AI models.<n>The benchmark has accumulated tens of thousands of structured risk data points and assessment results.<n>Based on this benchmark, we conduct systematic evaluation and in-depth analysis of over twenty mainstream advanced large models.
arXiv Detail & Related papers (2026-02-15T13:12:44Z) - Frontier AI Auditing: Toward Rigorous Third-Party Assessment of Safety and Security Practices at Leading AI Companies [57.521647436515785]
We define frontier AI auditing as rigorous third-party verification of frontier AI developers' safety and security claims. We introduce AI Assurance Levels (AAL-1 to AAL-4), ranging from time-bounded system audits to continuous, deception-resilient verification.
arXiv Detail & Related papers (2026-01-16T18:44:09Z) - International AI Safety Report 2025: Second Key Update: Technical Safeguards and Risk Management [115.92752850425272]
The second update to the 2025 International AI Safety Report assesses new developments in general-purpose AI risk management over the past year. It examines how researchers, public institutions, and AI developers are approaching risk management for general-purpose AI.
arXiv Detail & Related papers (2025-11-25T03:12:56Z) - RADAR: A Risk-Aware Dynamic Multi-Agent Framework for LLM Safety Evaluation via Role-Specialized Collaboration [81.38705556267917]
Existing safety evaluation methods for large language models (LLMs) suffer from inherent limitations. We introduce a theoretical framework that reconstructs the underlying risk concept space. We propose RADAR, a multi-agent collaborative evaluation framework.
arXiv Detail & Related papers (2025-09-28T09:35:32Z) - Intolerable Risk Threshold Recommendations for Artificial Intelligence [0.2383122657918106]
Frontier AI models may pose severe risks to public safety, human rights, economic stability, and societal values. Risks could arise from deliberate adversarial misuse, system failures, unintended cascading effects, or simultaneous failures across multiple models. Sixteen global AI industry organizations signed the Frontier AI Safety Commitments, and 27 nations and the EU issued a declaration on their intent to define these thresholds.
arXiv Detail & Related papers (2025-03-04T12:30:37Z) - AILuminate: Introducing v1.0 of the AI Risk and Reliability Benchmark from MLCommons [62.374792825813394]
This paper introduces AILuminate v1.0, the first comprehensive industry-standard benchmark for assessing AI-product risk and reliability. The benchmark evaluates an AI system's resistance to prompts designed to elicit dangerous, illegal, or undesirable behavior in 12 hazard categories.
arXiv Detail & Related papers (2025-02-19T05:58:52Z) - Quantifying Security Vulnerabilities: A Metric-Driven Security Analysis of Gaps in Current AI Standards [5.388550452190688]
This paper audits and quantifies security risks in three major AI governance standards: NIST AI RMF 1.0, the UK's AI and Data Protection Risk Toolkit, and the EU's ALTAI. Using a novel risk assessment methodology, we develop four key metrics: Risk Severity Index (RSI), Attack Vector Potential Index (AVPI), Compliance-Security Gap Percentage (CSGP), and Root Cause Vulnerability Score (RCVS).
arXiv Detail & Related papers (2025-02-12T17:57:54Z) - A Frontier AI Risk Management Framework: Bridging the Gap Between Current AI Practices and Established Risk Management [0.0]
The recent development of powerful AI systems has highlighted the need for robust risk management frameworks. This paper presents a comprehensive risk management framework for the development of frontier AI.
arXiv Detail & Related papers (2025-02-10T16:47:00Z) - Effective Mitigations for Systemic Risks from General-Purpose AI [9.39718128736321]
We surveyed 76 experts whose expertise spans AI safety; critical infrastructure; democratic processes; chemical, biological, radiological, and nuclear (CBRN) risks; and discrimination and bias. Domain experts perceive a broad range of risk mitigation measures as both effective in reducing various systemic risks and technically feasible. Three mitigation measures stand out: safety incident reports and security information sharing, third-party pre-deployment model audits, and pre-deployment risk assessments.
arXiv Detail & Related papers (2024-11-14T22:39:25Z) - AIR-Bench 2024: A Safety Benchmark Based on Risk Categories from Regulations and Policies [80.90138009539004]
AIR-Bench 2024 is the first AI safety benchmark aligned with emerging government regulations and company policies.
It decomposes 8 government regulations and 16 company policies into a four-tiered safety taxonomy with granular risk categories in the lowest tier.
We evaluate leading language models on AIR-Bench 2024, uncovering insights into their alignment with specified safety concerns.
arXiv Detail & Related papers (2024-07-11T21:16:48Z) - AI Risk Categorization Decoded (AIR 2024): From Government Regulations to Corporate Policies [88.32153122712478]
We identify 314 unique risk categories organized into a four-tiered taxonomy.
At the highest level, this taxonomy encompasses System & Operational Risks, Content Safety Risks, Societal Risks, and Legal & Rights Risks; a minimal data-structure sketch of such a tiered taxonomy follows this list.
We aim to advance AI safety through information sharing across sectors and the promotion of best practices in risk mitigation for generative AI models and systems.
arXiv Detail & Related papers (2024-06-25T18:13:05Z)
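As a complement to the taxonomy descriptions in AIR-Bench 2024 and AIR 2024 above, here is a minimal sketch of one way a four-tiered risk taxonomy could be represented as a nested tree; only the four top-level group names come from the abstract, and every deeper category below is a hypothetical placeholder.

```python
from dataclasses import dataclass, field

@dataclass
class RiskCategory:
    name: str
    level: int  # 1 (top-level group) through 4 (granular category)
    children: list["RiskCategory"] = field(default_factory=list)

    def leaves(self) -> list["RiskCategory"]:
        """Return all leaf categories under this node."""
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]

# Top-level groups are from the AIR 2024 abstract; all nesting is illustrative.
taxonomy = [
    RiskCategory("System & Operational Risks", 1),
    RiskCategory("Content Safety Risks", 1),
    RiskCategory("Societal Risks", 1),
    RiskCategory("Legal & Rights Risks", 1, [
        RiskCategory("Privacy", 2, [
            RiskCategory("Unauthorized data use", 3, [
                RiskCategory("Training on personal data without consent", 4),
            ]),
        ]),
    ]),
]

total = sum(len(group.leaves()) for group in taxonomy)
print(f"leaf categories in this toy taxonomy: {total}")  # AIR 2024 reports 314
```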