Fugu-MT 論文翻訳(概要): Evaluating the Instruction-Following Robustness of Large Language Models to Prompt Injection

論文の概要: Evaluating the Instruction-Following Robustness of Large Language Models to Prompt Injection

arxiv url: http://arxiv.org/abs/2308.10819v3
Date: Sat, 25 Nov 2023 00:25:36 GMT
ステータス: 翻訳完了
システム内更新日: 2023-11-30 15:12:23.115103
Title: Evaluating the Instruction-Following Robustness of Large Language Models to Prompt Injection
Title（参考訳）: プロンプトインジェクションに対する大規模言語モデルの指示追従ロバスト性の評価
Authors: Zekun Li and Baolin Peng and Pengcheng He and Xifeng Yan
Abstract要約: LLM(Large Language Models)は、命令追従に非常に熟練した言語である。この能力は、迅速なインジェクション攻撃のリスクをもたらす。このような攻撃に対する命令追従LDMの堅牢性を評価する。
参考スコア（独自算出の注目度）: 70.28425745910711
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Language Models (LLMs) have demonstrated exceptional proficiency in instruction-following, becoming increasingly crucial across various applications. However, this capability brings with it the risk of prompt injection attacks, where attackers inject instructions into LLMs' input to elicit undesirable actions or content. Understanding the robustness of LLMs against such attacks is vital for their safe implementation. In this work, we establish a benchmark to evaluate the robustness of instruction-following LLMs against prompt injection attacks. Our objective is to determine the extent to which LLMs can be influenced by injected instructions and their ability to differentiate between these injected and original target instructions. Through extensive experiments with leading instruction-following LLMs, we uncover significant vulnerabilities in their robustness to such attacks. Our results indicate that some models are overly tuned to follow any embedded instructions in the prompt, overly focusing on the latter parts of the prompt without fully grasping the entire context. By contrast, models with a better grasp of the context and instruction-following capabilities will potentially be more susceptible to compromise by injected instructions. This underscores the need to shift the focus from merely enhancing LLMs' instruction-following capabilities to improving their overall comprehension of prompts and discernment of instructions that are appropriate to follow. We hope our in-depth analysis offers insights into the underlying causes of these vulnerabilities, aiding in the development of future solutions. Code and data are available at https://github.com/Leezekun/instruction-following-robustness-eval
Abstract（参考訳）: 大規模言語モデル (LLM) は命令追従に優れた能力を示しており、様々なアプリケーションでますます重要になっている。しかし、この能力は、攻撃者がLLMの入力に命令を注入して望ましくないアクションやコンテンツを誘発するインジェクション攻撃のリスクをもたらす。このような攻撃に対するLLMの堅牢性を理解することは、その安全な実装に不可欠である。本研究では,インジェクション攻撃に対する命令追従LDMの堅牢性を評価するためのベンチマークを確立する。本研究の目的は, インジェクション命令がllmに与える影響と, インジェクション命令と本来の目標命令を区別する能力について検討することである。先導的な命令追従LDMによる広範な実験を通じて、このような攻撃に対するロバスト性の重大な脆弱性を明らかにする。その結果、いくつかのモデルはプロンプト内の埋め込み命令に従うように過度に調整されており、コンテキスト全体を把握せずにプロンプトの後半部分に集中していることがわかった。対照的に、文脈や命令追従能力をよりよく把握したモデルでは、インジェクションによって妥協する可能性が高くなる。これは、単にllmsの命令追従能力の強化から、プロンプトの全体的な理解と従うのに適した指示の識別の改善へと焦点を移す必要性を強調する。当社の詳細な分析が,これらの脆弱性の根本原因に関する洞察を提供し,今後のソリューションの開発を支援することを願っています。コードとデータはhttps://github.com/leezekun/instruction-following-robustness-evalで入手できる。

関連論文リスト

Robustness via Referencing: Defending against Prompt Injection Attacks by Referencing the Executed Instruction [68.6543680065379]
大型言語モデル(LLM)はインジェクション攻撃に弱い。本研究では,LLMの命令追従能力を抑えるのではなく,新たな防御手法を提案する。
論文参考訳（メタデータ） (2025-04-29T07:13:53Z)
LLMs can be easily Confused by Instructional Distractions [16.060402139507644]
大規模言語モデルは、タスクに続く命令において例外的なスキルを示す。この強度は、モデルが特定の命令を無視しなければならない場合に脆弱性になる可能性がある。 DIM-Benchと呼ばれる新しいベンチマークを導入する。
論文参考訳（メタデータ） (2025-02-05T04:52:57Z)
Find the Intention of Instruction: Comprehensive Evaluation of Instruction Understanding for Large Language Models [8.020688053947547]
LLM(Large Language Models)の重要な強みの1つは、与えられた指示に対する適切な応答を生成することによって、人間と対話する能力である。この能力は命令追従能力として知られ、様々な分野におけるLSMの使用の基礎を確立している。我々は、LLMが命令形式文によって容易に気を散らすことができ、それによって命令理解スキルの監視に繋がる可能性があることを指摘した。
論文参考訳（メタデータ） (2024-12-27T04:37:39Z)
Attention Tracker: Detecting Prompt Injection Attacks in LLMs [62.247841717696765]
大型言語モデル (LLM) は様々なドメインに革命をもたらしたが、インジェクション攻撃に弱いままである。そこで本研究では,特定の注意点が本来の指示から注入指示へと焦点を移す,注意散逸効果の概念を紹介した。本研究では,アテンション・トラッカーを提案する。アテンション・トラッカーは,インジェクション・アタックを検出するために,インストラクション上の注意パターンを追跡する訓練不要な検出手法である。
論文参考訳（メタデータ） (2024-11-01T04:05:59Z)
Instructional Segment Embedding: Improving LLM Safety with Instruction Hierarchy [53.54777131440989]
LLM(Large Language Models)は、セキュリティや安全性の脅威を受けやすい言語である。これらの脆弱性の大きな原因の1つは、命令階層の欠如である。本稿では,BERTにインスパイアされた命令セグメント埋め込み(ISE)技法を,現代の大規模言語モデルに導入する。
論文参考訳（メタデータ） (2024-10-09T12:52:41Z)
Aligning LLMs to Be Robust Against Prompt Injection [55.07562650579068]
インジェクション攻撃に対してLCMをより堅牢にするための強力なツールとしてアライメントが有効であることを示す。私たちのメソッド -- SecAlign -- は、最初に、プロンプトインジェクション攻撃をシミュレートしてアライメントデータセットを構築します。実験の結果,SecAlign は LLM を大幅に強化し,モデルの実用性に悪影響を及ぼすことが示された。
論文参考訳（メタデータ） (2024-10-07T19:34:35Z)
PROMPTFUZZ: Harnessing Fuzzing Techniques for Robust Testing of Prompt Injection in LLMs [16.296171008281775]
大規模言語モデル(LLM)は、人間のようなテキストを生成する強力な能力のため、様々なアプリケーションで広く利用されている。プロンプトインジェクション攻撃は、モデルの最初の命令を悪意のあるプロンプトで上書きし、生成されたテキストを操作する。本稿では,ファジィ技術を利用した新規な試験フレームワークであるProMPTFUZZを提案する。
論文参考訳（メタデータ） (2024-09-23T06:08:32Z)
Unveiling the Misuse Potential of Base Large Language Models via In-Context Learning [61.2224355547598]
大規模言語モデル(LLM)のオープンソース化は、アプリケーション開発、イノベーション、科学的進歩を加速させる。我々の調査は、この信念に対する重大な監視を露呈している。我々の研究は、慎重に設計されたデモを配置することにより、ベースLSMが悪意のある命令を効果的に解釈し実行できることを実証する。
論文参考訳（メタデータ） (2024-04-16T13:22:54Z)
Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models [79.0183835295533]
我々は,このような脆弱性のリスクを評価するために,BIPIAと呼ばれる間接的インジェクション攻撃のための最初のベンチマークを導入した。我々の分析では、LLMが情報コンテキストと動作可能な命令を区別できないことと、外部コンテンツ内での命令の実行を回避できないことの2つの主要な要因を同定した。ブラックボックスとホワイトボックスという2つの新しい防御機構と、これらの脆弱性に対処するための明確なリマインダーを提案する。
論文参考訳（メタデータ） (2023-12-21T01:08:39Z)
Enhancing Large Language Models Against Inductive Instructions with Dual-critique Prompting [55.15697111170836]
本稿では,大規模言語モデル(LLM)のテクスト誘導的指示に対する行動を明らかにするとともに,その真しさと有用性を高める。広範囲な人的・自動的な評価の結果,帰納的命令処理において LLM に共通する脆弱性が発見された。異なる帰納的スタイルがモデルに同じエラーを識別する能力に影響を及ぼし、基礎となる仮定の複雑さがモデルの性能にも影響を及ぼす。
論文参考訳（メタデータ） (2023-05-23T06:38:20Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。