SafeScientist: Toward Risk-Aware Scientific Discoveries by LLM Agents
- URL: http://arxiv.org/abs/2505.23559v1
- Date: Thu, 29 May 2025 15:35:58 GMT
- Title: SafeScientist: Toward Risk-Aware Scientific Discoveries by LLM Agents
- Authors: Kunlun Zhu, Jiaxun Zhang, Ziheng Qi, Nuoxing Shang, Zijia Liu, Peixuan Han, Yue Su, Haofei Yu, Jiaxuan You
- Abstract summary: We introduce SafeScientist, an innovative AI scientist framework to enhance safety and ethical responsibility in AI-driven scientific exploration. SafeScientist proactively refuses ethically inappropriate or high-risk tasks and rigorously emphasizes safety throughout the research process. We propose SciSafetyBench, a novel benchmark specifically designed to evaluate AI safety in scientific contexts.
- Score: 11.817130554581436
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advancements in large language model (LLM) agents have significantly accelerated scientific discovery automation, yet concurrently raised critical ethical and safety concerns. To systematically address these challenges, we introduce \textbf{SafeScientist}, an innovative AI scientist framework explicitly designed to enhance safety and ethical responsibility in AI-driven scientific exploration. SafeScientist proactively refuses ethically inappropriate or high-risk tasks and rigorously emphasizes safety throughout the research process. To achieve comprehensive safety oversight, we integrate multiple defensive mechanisms, including prompt monitoring, agent-collaboration monitoring, tool-use monitoring, and an ethical reviewer component. Complementing SafeScientist, we propose \textbf{SciSafetyBench}, a novel benchmark specifically designed to evaluate AI safety in scientific contexts, comprising 240 high-risk scientific tasks across 6 domains, alongside 30 specially designed scientific tools and 120 tool-related risk tasks. Extensive experiments demonstrate that SafeScientist significantly improves safety performance by 35\% compared to traditional AI scientist frameworks, without compromising scientific output quality. Additionally, we rigorously validate the robustness of our safety pipeline against diverse adversarial attack methods, further confirming the effectiveness of our integrated approach. The code and data will be available at https://github.com/ulab-uiuc/SafeScientist. \textcolor{red}{Warning: this paper contains example data that may be offensive or harmful.}
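To make the layered oversight described in the abstract more concrete, below is a minimal Python sketch of how prompt monitoring, tool-use monitoring, and an ethical reviewer could be chained so that any layer can refuse a high-risk task. This is an illustrative assumption, not the SafeScientist implementation: the class names, keyword lists, and the `Verdict` structure are invented here for exposition.

```python
# Illustrative sketch of a layered safety pipeline in the spirit of the abstract
# (prompt monitoring, tool-use monitoring, ethical review). All class names,
# keyword lists, and the Verdict structure are assumptions, not SafeScientist's code.
from dataclasses import dataclass, field


@dataclass
class Verdict:
    allowed: bool
    reasons: list = field(default_factory=list)


class PromptMonitor:
    """Screens the incoming research request before any agent acts on it."""

    BLOCKLIST = ("synthesize a pathogen", "bypass biosafety", "weaponize")

    def review(self, prompt: str) -> Verdict:
        hits = [kw for kw in self.BLOCKLIST if kw in prompt.lower()]
        return Verdict(allowed=not hits, reasons=[f"blocked phrase: {h}" for h in hits])


class ToolUseMonitor:
    """Checks each proposed tool call against a per-tool risk policy."""

    HIGH_RISK_TOOLS = {"dna_synthesizer", "high_pressure_reactor"}

    def review(self, tool_name: str, args: dict) -> Verdict:
        if tool_name in self.HIGH_RISK_TOOLS and not args.get("safety_protocol"):
            return Verdict(False, [f"{tool_name} requires an explicit safety protocol"])
        return Verdict(True)


class EthicalReviewer:
    """Final gate on the drafted research plan (a stand-in for an LLM judge)."""

    def review(self, plan: str) -> Verdict:
        risky = "dual-use" in plan.lower() and "mitigation" not in plan.lower()
        return Verdict(not risky, ["dual-use concern without mitigation"] if risky else [])


def run_pipeline(prompt: str, plan: str, tool_calls: list) -> Verdict:
    """Run every monitor; refuse the task if any layer raises an objection."""
    reasons = []
    reasons += PromptMonitor().review(prompt).reasons
    for name, args in tool_calls:
        reasons += ToolUseMonitor().review(name, args).reasons
    reasons += EthicalReviewer().review(plan).reasons
    return Verdict(allowed=not reasons, reasons=reasons)


if __name__ == "__main__":
    verdict = run_pipeline(
        prompt="Design an experiment to characterize a novel catalyst",
        plan="Standard bench chemistry; no dual-use components.",
        tool_calls=[("high_pressure_reactor", {"safety_protocol": "SOP-12"})],
    )
    print(verdict)
```

The design choice sketched here mirrors the paper's framing: refusal is decided by the union of all monitors' objections, so a single permissive component cannot override a safety concern raised elsewhere in the pipeline.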
Related papers
- Report on NSF Workshop on Science of Safe AI [75.96202715567088]
New advances in machine learning are leading to new opportunities to develop technology-based solutions to societal problems. To fulfill the promise of AI, we must address how to develop AI-based systems that are accurate and performant but also safe and trustworthy. This report is the result of the discussions in the working groups that addressed different aspects of safety at the workshop.
arXiv Detail & Related papers (2025-06-24T18:55:29Z)
- SOSBENCH: Benchmarking Safety Alignment on Scientific Knowledge [11.63268709958876]
SOSBench is a regulation-grounded, hazard-focused benchmark for large language models. It covers six high-risk scientific domains: chemistry, biology, medicine, pharmacology, physics, and psychology. We evaluate frontier models within a unified evaluation framework using our SOSBench.
arXiv Detail & Related papers (2025-05-27T17:47:08Z)
- LabSafety Bench: Benchmarking LLMs on Safety Issues in Scientific Labs [78.99703366417661]
Large language models (LLMs) increasingly assist in tasks ranging from procedural guidance to autonomous experiment orchestration. Such overreliance is particularly dangerous in high-stakes laboratory settings, where failures in hazard identification or risk assessment can result in severe accidents. We propose the Laboratory Safety Benchmark (LabSafety Bench) to evaluate models on their ability to identify potential hazards, assess risks, and predict the consequences of unsafe actions in lab environments.
arXiv Detail & Related papers (2024-10-18T05:21:05Z)
- Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress? [59.96471873997733]
We propose an empirical foundation for developing more meaningful safety metrics and define AI safety in a machine learning research context. We aim to provide a more rigorous framework for AI safety research, advancing the science of safety evaluations and clarifying the path towards measurable progress.
arXiv Detail & Related papers (2024-07-31T17:59:24Z)
- Prioritizing Safeguarding Over Autonomy: Risks of LLM Agents for Science [65.77763092833348]
Intelligent agents powered by large language models (LLMs) have demonstrated substantial promise in autonomously conducting experiments and facilitating scientific discoveries across various disciplines.
While their capabilities are promising, these agents also introduce novel vulnerabilities that demand careful consideration for safety.
This paper conducts a thorough examination of vulnerabilities in LLM-based agents within scientific domains, shedding light on potential risks associated with their misuse and emphasizing the need for safety measures.
arXiv Detail & Related papers (2024-02-06T18:54:07Z)
- Control Risk for Potential Misuse of Artificial Intelligence in Science [85.91232985405554]
We aim to raise awareness of the dangers of AI misuse in science.
We highlight real-world examples of misuse in chemical science.
We propose a system called SciGuard to control misuse risks for AI models in science.
arXiv Detail & Related papers (2023-12-11T18:50:57Z)
- OmniSafe: An Infrastructure for Accelerating Safe Reinforcement Learning Research [3.0536277689386453]
We introduce a foundational framework designed to expedite SafeRL research endeavors.
Our framework encompasses an array of algorithms spanning different RL domains and places heavy emphasis on safety elements.
arXiv Detail & Related papers (2023-05-16T09:22:14Z)
- Towards Safer Generative Language Models: A Survey on Safety Risks, Evaluations, and Improvements [76.80453043969209]
This survey presents a framework for safety research pertaining to large models.
We begin by introducing safety issues of wide concern, then delve into safety evaluation methods for large models.
We explore the strategies for enhancing large model safety from training to deployment.
arXiv Detail & Related papers (2023-02-18T09:32:55Z)