Fugu-MT Paper Translation (Abstract): Toward Reliable, Safe, and Secure LLMs for Scientific Applications

Paper overview: Toward Reliable, Safe, and Secure LLMs for Scientific Applications

arxiv url: http://arxiv.org/abs/2603.18235v1
Date: Wed, 18 Mar 2026 19:43:38 GMT
Status: Translation complete
Last updated in system: 2026-03-20 17:19:05.829778
Title: Toward Reliable, Safe, and Secure LLMs for Scientific Applications
Authors: Saket Sanjeev Chaturvedi, Joshua Bergerson, Tanwi Mallick
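Abstract summary: Large language models (LLMs) are evolving into autonomous "AI scientists." Ensuring trustworthy deployment in science requires a new paradigm centered on reliability, safety, and security. This paper examines the unique security and safety landscape of LLM agents in science.
Reference score (attention level, computed independently by this site): 3.114973891723128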
License: http://creativecommons.org/licenses/by/4.0/
Abstract: As large language models (LLMs) evolve into autonomous "AI scientists," they promise transformative advances but introduce novel vulnerabilities, from potential "biosafety risks" to "dangerous explosions." Ensuring trustworthy deployment in science requires a new paradigm centered on reliability (ensuring factual accuracy and reproducibility), safety (preventing unintentional physical or biological harm), and security (preventing malicious misuse). Existing general-purpose safety benchmarks are poorly suited for this purpose, suffering from a fundamental domain mismatch, limited threat coverage of science-specific vectors, and benchmark overfitting, which create a critical gap in vulnerability evaluation for scientific applications. This paper examines the unique security and safety landscape of LLM agents in science. We begin by synthesizing a detailed taxonomy of LLM threats contextualized for scientific research, to better understand the unique risks associated with LLMs in science. Next, we conceptualize a mechanism to address the evaluation gap by utilizing dedicated multi-agent systems for the automated generation of domain-specific adversarial security benchmarks. Based on our analysis, we outline how existing safety methods can be brought together and integrated into a conceptual multilayered defense framework designed to combine a red-teaming exercise and external boundary controls with a proactive internal Safety LLM Agent. Together, these conceptual elements provide a necessary structure for defining, evaluating, and creating comprehensive defense strategies for trustworthy LLM agent deployment in scientific disciplines.

