Related papers: Frontier AI Risk Management Framework in Practice: A Risk Analysis Technical Report v1.5

Frontier AI Risk Management Framework in Practice: A Risk Analysis Technical Report v1.5

URL: http://arxiv.org/abs/2602.14457v1
Date: Mon, 16 Feb 2026 04:30:06 GMT
Title: Frontier AI Risk Management Framework in Practice: A Risk Analysis Technical Report v1.5
Authors: Dongrui Liu, Yi Yu, Jie Zhang, Guanxu Chen, Qihao Lin, Hanxi Zhu, Lige Huang, Yijin Zhou, Peng Wang, Shuai Shao, Boxuan Zhang, Zicheng Liu, Jingwei Sun, Yu Li, Yuejin Xie, Jiaxuan Guo, Jia Xu, Chaochao Lu, Bowen Zhou, Xia Hu, Jing Shao,
Abstract summary: This technical report presents an updated and granular assessment of five critical dimensions: cyber offense, persuasion and manipulation, strategic deception, uncontrolled AI R&D, and self-replication.<n>This work reflects our current understanding of AI frontier risks and urges collective action to mitigate these challenges.
Score: 61.787178868669265
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: To understand and identify the unprecedented risks posed by rapidly advancing artificial intelligence (AI) models, Frontier AI Risk Management Framework in Practice presents a comprehensive assessment of their frontier risks. As Large Language Models (LLMs) general capabilities rapidly evolve and the proliferation of agentic AI, this version of the risk analysis technical report presents an updated and granular assessment of five critical dimensions: cyber offense, persuasion and manipulation, strategic deception, uncontrolled AI R\&D, and self-replication. Specifically, we introduce more complex scenarios for cyber offense. For persuasion and manipulation, we evaluate the risk of LLM-to-LLM persuasion on newly released LLMs. For strategic deception and scheming, we add the new experiment with respect to emergent misalignment. For uncontrolled AI R\&D, we focus on the ``mis-evolution'' of agents as they autonomously expand their memory substrates and toolsets. Besides, we also monitor and evaluate the safety performance of OpenClaw during the interaction on the Moltbook. For self-replication, we introduce a new resource-constrained scenario. More importantly, we propose and validate a series of robust mitigation strategies to address these emerging threats, providing a preliminary technical and actionable pathway for the secure deployment of frontier AI. This work reflects our current understanding of AI frontier risks and urges collective action to mitigate these challenges.

Related papers

Toward Risk Thresholds for AI-Enabled Cyber Threats: Enhancing Decision-Making Under Uncertainty with Bayesian Networks [0.3151064009829256]
We propose a structured approach to developing and evaluating AI cyber risk thresholds.<n>First, we analyze existing industry cyber thresholds and identify common threshold elements.<n>Second, we propose the use of Bayesian networks as a tool for modeling AI-enabled cyber risk.
arXiv Detail & Related papers (2026-01-23T23:23:12Z)
Toward Quantitative Modeling of Cybersecurity Risks Due to AI Misuse [50.87630846876635]
We develop nine detailed cyber risk models.<n>Each model decomposes attacks into steps using the MITRE ATT&CK framework.<n>Individual estimates are aggregated through Monte Carlo simulation.
arXiv Detail & Related papers (2025-12-09T17:54:17Z)
"We are not Future-ready": Understanding AI Privacy Risks and Existing Mitigation Strategies from the Perspective of AI Developers in Europe [56.1653658714305]
We interviewed 25 AI developers based in Europe to understand which privacy threats they believe pose the greatest risk to users, developers, and businesses.<n>We find that there is little consensus among AI developers on the relative ranking of privacy risks.<n>While AI developers are aware of proposed mitigation strategies for addressing these risks, they reported minimal real-world adoption.
arXiv Detail & Related papers (2025-10-01T13:51:33Z)
Frontier AI Risk Management Framework in Practice: A Risk Analysis Technical Report [51.17413460785022]
This report presents a comprehensive assessment of their frontier risks.<n>We identify critical risks in seven areas: cyber offense, biological and chemical risks, persuasion and manipulation, uncontrolled autonomous AI R&D, strategic deception and scheming, self-replication, and collusion.
arXiv Detail & Related papers (2025-07-22T12:44:38Z)
Manipulation Attacks by Misaligned AI: Risk Analysis and Safety Case Framework [0.0]
Humans are often the weakest link in cybersecurity systems.<n>A misaligned AI system may seek to undermine human oversight by manipulating employees.<n>No systematic framework exists for assessing and mitigating these risks.<n>This paper provides the first systematic methodology for integrating manipulation risk into AI safety governance.
arXiv Detail & Related papers (2025-07-17T07:45:53Z)
Mitigating Cyber Risk in the Age of Open-Weight LLMs: Policy Gaps and Technical Realities [0.0]
Open-weight general-purpose AI (GPAI) models offer significant benefits but also introduce substantial cybersecurity risks.<n>This paper analyzes the specific threats -- including accelerated malware development and enhanced social engineering -- magnified by open-weight AI release.<n>We propose a path forward focusing on evaluating and controlling specific high-risk capabilities rather than entire models.
arXiv Detail & Related papers (2025-05-21T11:35:52Z)
A Proposal for Evaluating the Operational Risk for ChatBots based on Large Language Models [39.58317527488534]
We propose a novel, instrumented risk-assessment metric that simultaneously evaluates potential threats to three key stakeholders.<n>To validate our metric, we leverage Garak, an open-source framework for vulnerability testing.<n>Results underscore the importance of multi-dimensional risk assessments in operationalizing secure, reliable AI-driven conversational systems.
arXiv Detail & Related papers (2025-05-07T20:26:45Z)
OCCULT: Evaluating Large Language Models for Offensive Cyber Operation Capabilities [0.0]
We demonstrate a new approach to assessing AI's progress towards enabling and scaling real-world offensive cyber operations.<n>We detail OCCULT, a lightweight operational evaluation framework that allows cyber security experts to contribute to rigorous and repeatable measurement.<n>We find that there has been significant recent advancement in the risks of AI being used to scale realistic cyber threats.
arXiv Detail & Related papers (2025-02-18T19:33:14Z)
Attack Atlas: A Practitioner's Perspective on Challenges and Pitfalls in Red Teaming GenAI [52.138044013005]
generative AI, particularly large language models (LLMs), become increasingly integrated into production applications. New attack surfaces and vulnerabilities emerge and put a focus on adversarial threats in natural language and multi-modal systems. Red-teaming has gained importance in proactively identifying weaknesses in these systems, while blue-teaming works to protect against such adversarial attacks. This work aims to bridge the gap between academic insights and practical security measures for the protection of generative AI systems.
arXiv Detail & Related papers (2024-09-23T10:18:10Z)
EARBench: Towards Evaluating Physical Risk Awareness for Task Planning of Foundation Model-based Embodied AI Agents [53.717918131568936]
Embodied artificial intelligence (EAI) integrates advanced AI models into physical entities for real-world interaction.<n>Foundation models as the "brain" of EAI agents for high-level task planning have shown promising results.<n>However, the deployment of these agents in physical environments presents significant safety challenges.<n>This study introduces EARBench, a novel framework for automated physical risk assessment in EAI scenarios.
arXiv Detail & Related papers (2024-08-08T13:19:37Z)

This list is automatically generated from the titles and abstracts of the papers in this site.