Related papers: Risks of AI Scientists: Prioritizing Safeguarding Over Autonomy

URL: http://arxiv.org/abs/2402.04247v5
Date: Mon, 21 Jul 2025 18:59:21 GMT
Title: Risks of AI Scientists: Prioritizing Safeguarding Over Autonomy
Authors: Xiangru Tang, Qiao Jin, Kunlun Zhu, Tongxin Yuan, Yichi Zhang, Wangchunshu Zhou, Meng Qu, Yilun Zhao, Jian Tang, Zhuosheng Zhang, Arman Cohan, Zhiyong Lu, Mark Gerstein
Abstract summary: This perspective examines vulnerabilities in AI scientists, shedding light on potential risks associated with their misuse. We take into account user intent, the specific scientific domain, and their potential impact on the external environment. We propose a triadic framework involving human regulation, agent alignment, and an understanding of environmental feedback.
Score: 65.77763092833348
License: http://creativecommons.org/licenses/by/4.0/
Abstract: AI scientists powered by large language models have demonstrated substantial promise in autonomously conducting experiments and facilitating scientific discoveries across various disciplines. While their capabilities are promising, these agents also introduce novel vulnerabilities that require careful consideration for safety. However, there has been limited comprehensive exploration of these vulnerabilities. This perspective examines vulnerabilities in AI scientists, shedding light on potential risks associated with their misuse, and emphasizing the need for safety measures. We begin by providing an overview of the potential risks inherent to AI scientists, taking into account user intent, the specific scientific domain, and their potential impact on the external environment. Then, we explore the underlying causes of these vulnerabilities and provide a scoping review of the limited existing works. Based on our analysis, we propose a triadic framework involving human regulation, agent alignment, and an understanding of environmental feedback (agent regulation) to mitigate these identified risks. Furthermore, we highlight the limitations and challenges associated with safeguarding AI scientists and advocate for the development of improved models, robust benchmarks, and comprehensive regulations.

Related papers

Frontier AI Risk Management Framework in Practice: A Risk Analysis Technical Report v1.5 [61.787178868669265]
This technical report presents an updated and granular assessment of five critical dimensions: cyber offense, persuasion and manipulation, strategic deception, uncontrolled AI R&D, and self-replication. This work reflects our current understanding of AI frontier risks and urges collective action to mitigate these challenges.
arXiv Detail & Related papers (2026-02-16T04:30:06Z)
SafeScientist: Toward Risk-Aware Scientific Discoveries by LLM Agents [11.817130554581436]
We introduce SafeScientist, an innovative AI scientist framework to enhance safety and ethical responsibility in AI-driven scientific exploration. SafeScientist proactively refuses ethically inappropriate or high-risk tasks and rigorously emphasizes safety throughout the research process. We propose SciSafetyBench, a novel benchmark specifically designed to evaluate AI safety in scientific contexts.
arXiv Detail & Related papers (2025-05-29T15:35:58Z)
Superintelligent Agents Pose Catastrophic Risks: Can Scientist AI Offer a Safer Path? [37.13209023718946]
Unchecked AI agency poses significant risks to public safety and security. We discuss how these risks arise from current AI training methods. We propose, as a core building block for further advances, the development of a non-agentic AI system.
arXiv Detail & Related papers (2025-02-21T18:28:36Z)
Nuclear Deployed: Analyzing Catastrophic Risks in Decision-making of Autonomous LLM Agents [10.565508277042564]
Large language models (LLMs) are evolving into autonomous decision-makers, raising concerns about catastrophic risks in high-stakes scenarios. Based on the insight that such risks can originate from trade-offs between the agent's Helpfulness, Harmlessness, and Honesty (HHH) goals, we build a novel three-stage evaluation framework. We conduct 14,400 agentic simulations across 12 advanced LLMs, with extensive experiments and analysis.
arXiv Detail & Related papers (2025-02-17T02:11:17Z)
AI Safety for Everyone [3.440579243843689]
Recent discussions and research in AI safety have increasingly emphasized the deep connection between AI safety and existential risk from advanced AI systems. This framing may exclude researchers and practitioners who are committed to AI safety but approach the field from different angles. We find a vast array of concrete safety work that addresses immediate and practical concerns with current AI systems.
arXiv Detail & Related papers (2025-02-13T13:04:59Z)
Open Problems in Machine Unlearning for AI Safety [61.43515658834902]
Machine unlearning -- the ability to selectively forget or suppress specific types of knowledge -- has shown promise for privacy and data removal tasks. In this paper, we identify key limitations that prevent unlearning from serving as a comprehensive solution for AI safety.
arXiv Detail & Related papers (2025-01-09T03:59:10Z)
Agent-SafetyBench: Evaluating the Safety of LLM Agents [72.92604341646691]
We introduce Agent-SafetyBench, a comprehensive benchmark to evaluate the safety of large language models (LLMs). Agent-SafetyBench encompasses 349 interaction environments and 2,000 test cases, evaluating 8 categories of safety risks and covering 10 common failure modes frequently encountered in unsafe interactions. Our evaluation of 16 popular LLM agents reveals a concerning result: none of the agents achieves a safety score above 60%.
arXiv Detail & Related papers (2024-12-19T02:35:15Z)
Navigating the Risks: A Survey of Security, Privacy, and Ethics Threats in LLM-Based Agents [67.07177243654485]
This survey collects and analyzes the different threats faced by large language model-based agents. We identify six key features of LLM-based agents, based on which we summarize the current research progress. We select four representative agents as case studies to analyze the risks they may face in practical use.
arXiv Detail & Related papers (2024-11-14T15:40:04Z)
LabSafety Bench: Benchmarking LLMs on Safety Issues in Scientific Labs [80.45174785447136]
Laboratory accidents pose significant risks to human life and property. Despite advancements in safety training, laboratory personnel may still unknowingly engage in unsafe practices. There is growing concern about relying on large language models (LLMs) for guidance in various fields.
arXiv Detail & Related papers (2024-10-18T05:21:05Z)
SciSafeEval: A Comprehensive Benchmark for Safety Alignment of Large Language Models in Scientific Tasks [36.99233361224705]
Large language models (LLMs) have had a transformative impact on a variety of scientific tasks across disciplines such as biology, chemistry, medicine, and physics. Existing benchmarks primarily focus on textual content and overlook key scientific representations such as molecular, protein, and genomic languages. We introduce SciSafeEval, a benchmark designed to evaluate the safety alignment of LLMs across a range of scientific tasks.
arXiv Detail & Related papers (2024-10-02T16:34:48Z)
EARBench: Towards Evaluating Physical Risk Awareness for Task Planning of Foundation Model-based Embodied AI Agents [53.717918131568936]
Embodied artificial intelligence (EAI) integrates advanced AI models into physical entities for real-world interaction. Foundation models as the "brain" of EAI agents for high-level task planning have shown promising results. However, the deployment of these agents in physical environments presents significant safety challenges. This study introduces EARBench, a novel framework for automated physical risk assessment in EAI scenarios.
arXiv Detail & Related papers (2024-08-08T13:19:37Z)
Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress? [59.96471873997733]
We propose an empirical foundation for developing more meaningful safety metrics and define AI safety in a machine learning research context. We aim to provide a more rigorous framework for AI safety research, advancing the science of safety evaluations and clarifying the path towards measurable progress.
arXiv Detail & Related papers (2024-07-31T17:59:24Z)
InferAct: Inferring Safe Actions for LLM-Based Agents Through Preemptive Evaluation and Human Feedback [70.54226917774933]
This paper introduces InferAct, a novel approach to proactively detect potential errors before risky actions are executed. InferAct acts as a human proxy, detecting unsafe actions and alerting users for intervention. Experiments on three widely-used tasks demonstrate the effectiveness of InferAct.
arXiv Detail & Related papers (2024-07-16T15:24:44Z)
Current state of LLM Risks and AI Guardrails [0.0]
Large language models (LLMs) have become increasingly sophisticated, leading to widespread deployment in sensitive applications where safety and reliability are paramount. These risks necessitate the development of "guardrails" to align LLMs with desired behaviors and mitigate potential harm. This work explores the risks associated with deploying LLMs and evaluates current approaches to implementing guardrails and model alignment techniques.
arXiv Detail & Related papers (2024-06-16T22:04:10Z)
Prioritizing High-Consequence Biological Capabilities in Evaluations of Artificial Intelligence Models [0.0]
We argue that AI model evaluations should prioritize addressing high-consequence risks. These risks could cause large-scale harm to the public, such as pandemics. Scientists' experience with identifying and mitigating dual-use biological risks can help inform new approaches to evaluating biological AI models.
arXiv Detail & Related papers (2024-05-25T16:29:17Z)
Unveiling the Misuse Potential of Base Large Language Models via In-Context Learning [61.2224355547598]
Open-sourcing of large language models (LLMs) accelerates application development, innovation, and scientific progress. Our investigation exposes a critical oversight in this belief. By deploying carefully designed demonstrations, our research demonstrates that base LLMs could effectively interpret and execute malicious instructions.
arXiv Detail & Related papers (2024-04-16T13:22:54Z)
GUARD-D-LLM: An LLM-Based Risk Assessment Engine for the Downstream uses of LLMs [0.0]
This paper explores risks emanating from downstream uses of large language models (LLMs). We introduce a novel LLM-based risk assessment engine (GUARD-D-LLM) designed to pinpoint and rank threats relevant to specific use cases derived from text-based user inputs. Integrating thirty intelligent agents, this innovative approach identifies bespoke risks, gauges their severity, offers targeted suggestions for mitigation, and facilitates risk-aware development.
arXiv Detail & Related papers (2024-04-02T05:25:17Z)
A Safe Harbor for AI Evaluation and Red Teaming [124.89885800509505]
Some researchers fear that conducting such research or releasing their findings will result in account suspensions or legal reprisal. We propose that major AI developers commit to providing a legal and technical safe harbor. We believe these commitments are a necessary step towards more inclusive and unimpeded community efforts to tackle the risks of generative AI.
arXiv Detail & Related papers (2024-03-07T20:55:08Z)
Highlighting the Safety Concerns of Deploying LLMs/VLMs in Robotics [54.57914943017522]
We highlight the critical issues of robustness and safety associated with integrating large language models (LLMs) and vision-language models (VLMs) into robotics applications.
arXiv Detail & Related papers (2024-02-15T22:01:45Z)
Control Risk for Potential Misuse of Artificial Intelligence in Science [85.91232985405554]
We aim to raise awareness of the dangers of AI misuse in science. We highlight real-world examples of misuse in chemical science. We propose a system called SciGuard to control misuse risks for AI models in science.
arXiv Detail & Related papers (2023-12-11T18:50:57Z)
Use of LLMs for Illicit Purposes: Threats, Prevention Measures, and Vulnerabilities [14.684194175806203]
Large language models (LLMs) can be misused for fraud, impersonation, and the generation of malware. We present a taxonomy describing the relationship between threats caused by the generative capabilities of LLMs, prevention measures intended to address such threats, and vulnerabilities arising from imperfect prevention measures.
arXiv Detail & Related papers (2023-08-24T14:45:50Z)

This list is automatically generated from the titles and abstracts of the papers on this site.