Related papers: Nuclear Deployed: Analyzing Catastrophic Risks in Decision-making of Autonomous LLM Agents

Nuclear Deployed: Analyzing Catastrophic Risks in Decision-making of Autonomous LLM Agents

URL: http://arxiv.org/abs/2502.11355v3
Date: Sun, 23 Mar 2025 06:22:28 GMT
Title: Nuclear Deployed: Analyzing Catastrophic Risks in Decision-making of Autonomous LLM Agents
Authors: Rongwu Xu, Xiaojian Li, Shuo Chen, Wei Xu,
Abstract summary: Large language models (LLMs) are evolving into autonomous decision-makers, raising concerns about catastrophic risks in high-stakes scenarios.<n>Based on the insight that such risks can originate from trade-offs between the agent's Helpful, Harmlessness and Honest (HHH) goals, we build a novel three-stage evaluation framework.<n>We conduct 14,400 agentic simulations across 12 advanced LLMs, with extensive experiments and analysis.
Score: 10.565508277042564
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Large language models (LLMs) are evolving into autonomous decision-makers, raising concerns about catastrophic risks in high-stakes scenarios, particularly in Chemical, Biological, Radiological and Nuclear (CBRN) domains. Based on the insight that such risks can originate from trade-offs between the agent's Helpful, Harmlessness and Honest (HHH) goals, we build a novel three-stage evaluation framework, which is carefully constructed to effectively and naturally expose such risks. We conduct 14,400 agentic simulations across 12 advanced LLMs, with extensive experiments and analysis. Results reveal that LLM agents can autonomously engage in catastrophic behaviors and deception, without being deliberately induced. Furthermore, stronger reasoning abilities often increase, rather than mitigate, these risks. We also show that these agents can violate instructions and superior commands. On the whole, we empirically prove the existence of catastrophic risks in autonomous LLM agents. We release our code to foster further research.

Related papers

Frontier AI Risk Management Framework in Practice: A Risk Analysis Technical Report v1.5 [61.787178868669265]
This technical report presents an updated and granular assessment of five critical dimensions: cyber offense, persuasion and manipulation, strategic deception, uncontrolled AI R&D, and self-replication.<n>This work reflects our current understanding of AI frontier risks and urges collective action to mitigate these challenges.
arXiv Detail & Related papers (2026-02-16T04:30:06Z)
Toward Quantitative Modeling of Cybersecurity Risks Due to AI Misuse [50.87630846876635]
We develop nine detailed cyber risk models.<n>Each model decomposes attacks into steps using the MITRE ATT&CK framework.<n>Individual estimates are aggregated through Monte Carlo simulation.
arXiv Detail & Related papers (2025-12-09T17:54:17Z)
SHERLOCK: Towards Dynamic Knowledge Adaptation in LLM-enhanced E-commerce Risk Management [10.255396179168974]
E-commerce companies conduct risk investigations into suspicious cases to identify emerging fraud patterns.<n>The sheer volume of case analyses imposes a substantial workload on risk management analysts.<n>We propose the SHERLOCK framework, which leverages the reasoning capabilities of large language models (LLMs) to assist analysts in risk investigations.
arXiv Detail & Related papers (2025-10-10T02:57:58Z)
Dive into the Agent Matrix: A Realistic Evaluation of Self-Replication Risk in LLM Agents [30.378925170216835]
Self-replication risk of Large Language Model (LLM) agents driven by objective misalignment has drawn growing attention.<n>We present a comprehensive evaluation framework for quantifying self-replication risks.
arXiv Detail & Related papers (2025-09-29T17:49:50Z)
LM Agents May Fail to Act on Their Own Risk Knowledge [15.60032437959883]
Language model (LM) agents pose a diverse array of potential, severe risks in safety-critical scenarios.<n>While they often answer "Yes" to queries like "Is executing sudo rm -rf /*' dangerous?", they will likely fail to identify such risks in instantiated trajectories.
arXiv Detail & Related papers (2025-08-19T02:46:08Z)
Mapping AI Benchmark Data to Quantitative Risk Estimates Through Expert Elicitation [0.7889270818022226]
We show how existing AI benchmarks can be used to facilitate the creation of risk estimates. We describe the results of a pilot study in which experts use information from Cybench, an AI benchmark, to generate probability estimates.
arXiv Detail & Related papers (2025-03-06T10:39:47Z)
Answer, Refuse, or Guess? Investigating Risk-Aware Decision Making in Language Models [63.559461750135334]
Language models (LMs) are increasingly used to build agents that can act autonomously to achieve goals.<n>We study this "answer-or-defer" problem with an evaluation framework that systematically varies human-specified risk structures.<n>We find that a simple skill-decomposition method, which isolates the independent skills required for answer-or-defer decision making, can consistently improve LMs' decision policies.
arXiv Detail & Related papers (2025-03-03T09:16:26Z)
Multi-Agent Risks from Advanced AI [90.74347101431474]
Multi-agent systems of advanced AI pose novel and under-explored risks. We identify three key failure modes based on agents' incentives, as well as seven key risk factors. We highlight several important instances of each risk, as well as promising directions to help mitigate them.
arXiv Detail & Related papers (2025-02-19T23:03:21Z)
Fully Autonomous AI Agents Should Not be Developed [58.88624302082713]
This paper argues that fully autonomous AI agents should not be developed. In support of this position, we build from prior scientific literature and current product marketing to delineate different AI agent levels. Our analysis reveals that risks to people increase with the autonomy of a system.
arXiv Detail & Related papers (2025-02-04T19:00:06Z)
Risk-aware Classification via Uncertainty Quantification [9.641001762056876]
We introduce three foundational desiderata for developing real-world risk-aware classification systems.<n>We demonstrate the unity between these principles and Evidential Deep Learning's operational attributes.<n>We then augment EDL empowering autonomous agents to exercise discretion during structured decision-making when uncertainty and risks are inherent.
arXiv Detail & Related papers (2024-12-04T15:20:12Z)
Navigating the Risks: A Survey of Security, Privacy, and Ethics Threats in LLM-Based Agents [67.07177243654485]
This survey collects and analyzes the different threats faced by large language models-based agents. We identify six key features of LLM-based agents, based on which we summarize the current research progress. We select four representative agents as case studies to analyze the risks they may face in practical use.
arXiv Detail & Related papers (2024-11-14T15:40:04Z)
EARBench: Towards Evaluating Physical Risk Awareness for Task Planning of Foundation Model-based Embodied AI Agents [53.717918131568936]
Embodied artificial intelligence (EAI) integrates advanced AI models into physical entities for real-world interaction. Foundation models as the "brain" of EAI agents for high-level task planning have shown promising results. However, the deployment of these agents in physical environments presents significant safety challenges. This study introduces EARBench, a novel framework for automated physical risk assessment in EAI scenarios.
arXiv Detail & Related papers (2024-08-08T13:19:37Z)
Breaking Agents: Compromising Autonomous LLM Agents Through Malfunction Amplification [35.16099878559559]
Large language models (LLMs) have experienced significant development and are being deployed in real-world applications. We introduce a new type of attack that causes malfunctions by misleading the agent into executing repetitive or irrelevant actions. Our experiments reveal that these attacks can induce failure rates exceeding 80% in multiple scenarios.
arXiv Detail & Related papers (2024-07-30T14:35:31Z)
Unveiling the Misuse Potential of Base Large Language Models via In-Context Learning [61.2224355547598]
Open-sourcing of large language models (LLMs) accelerates application development, innovation, and scientific progress. Our investigation exposes a critical oversight in this belief. By deploying carefully designed demonstrations, our research demonstrates that base LLMs could effectively interpret and execute malicious instructions.
arXiv Detail & Related papers (2024-04-16T13:22:54Z)
GUARD-D-LLM: An LLM-Based Risk Assessment Engine for the Downstream uses of LLMs [0.0]
This paper explores risks emanating from downstream uses of large language models (LLMs) We introduce a novel LLM-based risk assessment engine (GUARD-D-LLM) designed to pinpoint and rank threats relevant to specific use cases derived from text-based user inputs. Integrating thirty intelligent agents, this innovative approach identifies bespoke risks, gauges their severity, offers targeted suggestions for mitigation, and facilitates risk-aware development.
arXiv Detail & Related papers (2024-04-02T05:25:17Z)
Prioritizing Safeguarding Over Autonomy: Risks of LLM Agents for Science [65.77763092833348]
Intelligent agents powered by large language models (LLMs) have demonstrated substantial promise in autonomously conducting experiments and facilitating scientific discoveries across various disciplines. While their capabilities are promising, these agents also introduce novel vulnerabilities that demand careful consideration for safety. This paper conducts a thorough examination of vulnerabilities in LLM-based agents within scientific domains, shedding light on potential risks associated with their misuse and emphasizing the need for safety measures.
arXiv Detail & Related papers (2024-02-06T18:54:07Z)
The Reasoning Under Uncertainty Trap: A Structural AI Risk [0.0]
Report provides an exposition of what makes RUU so challenging for both humans and machines. We detail how this misuse risk connects to a wider network of underlying structural risks.
arXiv Detail & Related papers (2024-01-29T17:16:57Z)

This list is automatically generated from the titles and abstracts of the papers in this site.