International AI Safety Report 2025: Second Key Update: Technical Safeguards and Risk Management
- URL: http://arxiv.org/abs/2511.19863v1
- Date: Tue, 25 Nov 2025 03:12:56 GMT
- Title: International AI Safety Report 2025: Second Key Update: Technical Safeguards and Risk Management
- Authors: Yoshua Bengio, Stephen Clare, Carina Prunkl, Maksym Andriushchenko, Ben Bucknall, Philip Fox, Nestor Maslej, Conor McGlynn, Malcolm Murray, Shalaleh Rismani, Stephen Casper, Jessica Newman, Daniel Privitera, Sören Mindermann, Daron Acemoglu, Thomas G. Dietterich, Fredrik Heintz, Geoffrey Hinton, Nick Jennings, Susan Leavy, Teresa Ludermir, Vidushi Marda, Helen Margetts, John McDermid, Jane Munga, Arvind Narayanan, Alondra Nelson, Clara Neppel, Gopal Ramchurn, Stuart Russell, Marietje Schaake, Bernhard Schölkopf, Alvaro Soto, Lee Tiedrich, Gaël Varoquaux, Andrew Yao, Ya-Qin Zhang, Leandro Aguirre, Olubunmi Ajala, Fahad Albalawi, Noora AlMalek, Christian Busch, André Carvalho, Jonathan Collas, Amandeep Gill, Ahmet Hatip, Juha Heikkilä, Chris Johnson, Gill Jolly, Ziv Katzir, Mary Kerema, Hiroaki Kitano, Antonio Krüger, Aoife McLysaght, Oleksii Molchanovskyi, Andrea Monti, Kyoung Mu Lee, Mona Nemer, Nuria Oliver, Raquel Pezoa, Audrey Plonk, José Portillo, Balaraman Ravindran, Hammam Riza, Crystal Rugege, Haroon Sheikh, Denise Wong, Yi Zeng, Liming Zhu
- Abstract summary: Second update to the 2025 International AI Safety Report assesses new developments in general-purpose AI risk management over the past year. It examines how researchers, public institutions, and AI developers are approaching risk management for general-purpose AI.
- Score: 115.92752850425272
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This second update to the 2025 International AI Safety Report assesses new developments in general-purpose AI risk management over the past year. It examines how researchers, public institutions, and AI developers are approaching risk management for general-purpose AI. In recent months, for example, three leading AI developers applied enhanced safeguards to their new models, as their internal pre-deployment testing could not rule out the possibility that these models could be misused to help create biological weapons. Beyond specific precautionary measures, there has been a range of other advances in techniques for making AI models and systems more reliable and resistant to misuse. These include new approaches in adversarial training, data curation, and monitoring systems. In parallel, institutional frameworks that operationalise and formalise these technical capabilities are starting to emerge: the number of companies publishing Frontier AI Safety Frameworks more than doubled in 2025, and governments and international organisations have established a small number of governance frameworks for general-purpose AI, focusing largely on transparency and risk assessment.
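The abstract names adversarial training among the techniques for making models more resistant to misuse. As a rough illustration of that general technique (not a method described in the report itself), the sketch below shows a single FGSM-style adversarial training step; PyTorch is assumed, and `model`, `x`, `y`, and `optimizer` are hypothetical placeholders supplied by the caller.

```python
# Minimal sketch of one adversarial-training step (FGSM-style).
# Assumes PyTorch; `model`, `x`, `y`, and `optimizer` are placeholders.
import torch
import torch.nn.functional as F

def adversarial_training_step(model, x, y, optimizer, epsilon=0.03):
    """Train the model on inputs perturbed in the direction that raises the loss."""
    # 1. Compute the loss gradient with respect to the inputs.
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    input_grad = torch.autograd.grad(loss, x)[0]

    # 2. Craft adversarial examples by stepping along the sign of that gradient.
    x_adv = (x + epsilon * input_grad.sign()).detach()

    # 3. Update the model on the perturbed batch so it learns to resist the attack.
    optimizer.zero_grad()
    adv_loss = F.cross_entropy(model(x_adv), y)
    adv_loss.backward()
    optimizer.step()
    return adv_loss.item()
```

For general-purpose language models, the analogous practice typically trains against adversarial text inputs such as red-teamed jailbreak prompts rather than pixel perturbations; the loop above is only the classic image-classification form of the idea.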
Related papers
- AI TIPS 2.0: A Comprehensive Framework for Operationalizing AI Governance [0.0]
Organizations struggle with inadequate risk assessment at the use-case level. Existing frameworks like ISO 42001 and the NIST AI RMF remain at a high conceptual level and offer no systematic approach for embedding AI practices throughout the development lifecycle.
arXiv Detail & Related papers (2025-12-09T20:57:22Z) - International AI Safety Report 2025: First Key Update: Capabilities and Risk Implications [118.49965571969089]
This update examines how AI capabilities have improved since the first AI Safety Report. It focuses on key risk areas where substantial new evidence warrants updated assessments.
arXiv Detail & Related papers (2025-10-15T15:13:49Z) - The 2025 OpenAI Preparedness Framework does not guarantee any AI risk mitigation practices: a proof-of-concept for affordance analyses of AI safety policies [35.43144920451646]
We analyse the OpenAI 'Preparedness Framework Version 2' (April 2025) using the Mechanisms & Conditions model of affordances and the MIT AI Risk Repository. We find that this safety policy requests evaluation of only a small minority of AI risks and encourages deployment of systems with 'Medium' capabilities for unintentionally enabling 'severe harm'. These findings suggest that effective mitigation of AI risks requires more robust governance interventions beyond current industry self-regulation.
arXiv Detail & Related papers (2025-09-29T07:42:10Z) - Report on NSF Workshop on Science of Safe AI [75.96202715567088]
New advances in machine learning are creating opportunities to develop technology-based solutions to societal problems. To fulfill the promise of AI, we must address how to develop AI-based systems that are not only accurate and performant but also safe and trustworthy. This report is the result of the discussions in the working groups that addressed different aspects of safety at the workshop.
arXiv Detail & Related papers (2025-06-24T18:55:29Z) - Managing extreme AI risks amid rapid progress [171.05448842016125]
We describe risks that include large-scale social harms, malicious uses, and irreversible loss of human control over autonomous AI systems.
There is a lack of consensus about how exactly such risks arise, and how to manage them.
Present governance initiatives lack the mechanisms and institutions to prevent misuse and recklessness, and barely address autonomous systems.
arXiv Detail & Related papers (2023-10-26T17:59:06Z) - International Institutions for Advanced AI [47.449762587672986]
International institutions may have an important role to play in ensuring advanced AI systems benefit humanity.
This paper identifies a set of governance functions that could be performed at an international level to address these challenges.
It groups these functions into four institutional models that exhibit internal synergies and have precedents in existing organizations.
arXiv Detail & Related papers (2023-07-10T16:55:55Z) - Frontier AI Regulation: Managing Emerging Risks to Public Safety [15.85618115026625]
"Frontier AI" models could possess dangerous capabilities sufficient to pose severe risks to public safety.
Industry self-regulation is an important first step.
We propose an initial set of safety standards.
arXiv Detail & Related papers (2023-07-06T17:03:25Z) - Toward Trustworthy AI Development: Mechanisms for Supporting Verifiable Claims [59.64274607533249]
AI developers need to make verifiable claims to which they can be held accountable.
This report suggests various steps that different stakeholders can take to improve the verifiability of claims made about AI systems.
We analyze ten mechanisms for this purpose, spanning institutions, software, and hardware, and make recommendations aimed at implementing, exploring, or improving those mechanisms.
arXiv Detail & Related papers (2020-04-15T17:15:35Z)