Fugu-MT 論文翻訳(概要): Enhancing Security in LLM Applications: A Performance Evaluation of Early Detection Systems

論文の概要: Enhancing Security in LLM Applications: A Performance Evaluation of Early Detection Systems

arxiv url: http://arxiv.org/abs/2506.19109v1
Date: Mon, 23 Jun 2025 20:39:43 GMT
ステータス: 翻訳完了
システム内更新日: 2025-06-25 19:48:23.379512
Title: Enhancing Security in LLM Applications: A Performance Evaluation of Early Detection Systems
Title（参考訳）: LLMアプリケーションのセキュリティ向上:早期検知システムの性能評価
Authors: Valerii Gakh, Hayretdin Bahsi,
Abstract要約: 迅速なインジェクション攻撃では、攻撃者はシステム命令を悪意を持って操作し、システムの機密性を侵害する。本研究では,早期インジェクション検出システムの性能について検討し,様々なオープンソースソリューションで実装された技術の検出性能に着目した。本研究は, 突発的漏洩検出手法の異なる解析方法と, それらの手法を実装した複数の検出方法の比較分析を行った。
参考スコア（独自算出の注目度）: 1.03590082373586
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Prompt injection threatens novel applications that emerge from adapting LLMs for various user tasks. The newly developed LLM-based software applications become more ubiquitous and diverse. However, the threat of prompt injection attacks undermines the security of these systems as the mitigation and defenses against them, proposed so far, are insufficient. We investigated the capabilities of early prompt injection detection systems, focusing specifically on the detection performance of techniques implemented in various open-source solutions. These solutions are supposed to detect certain types of prompt injection attacks, including the prompt leak. In prompt leakage attacks, an attacker maliciously manipulates the LLM into outputting its system instructions, violating the system's confidentiality. Our study presents analyzes of distinct prompt leakage detection techniques, and a comparative analysis of several detection solutions, which implement those techniques. We identify the strengths and weaknesses of these techniques and elaborate on their optimal configuration and usage in high-stake deployments. In one of the first studies on existing prompt leak detection solutions, we compared the performances of LLM Guard, Vigil, and Rebuff. We concluded that the implementations of canary word checks in Vigil and Rebuff were not effective at detecting prompt leak attacks, and we proposed improvements for them. We also found an evasion weakness in Rebuff's secondary model-based technique and proposed a mitigation. Then, the result of the comparison of LLM Guard, Vigil, and Rebuff at their peak performance revealed that Vigil is optimal for cases when minimal false positive rate is required, and Rebuff is the most optimal for average needs.
Abstract（参考訳）: プロンプトインジェクションは、様々なユーザタスクにLLMを適用することから現れる新しいアプリケーションを脅かす。新たに開発されたLLMベースのソフトウェアアプリケーションは、よりユビキタスで多様なものになる。しかし, 即時投射攻撃の脅威は, これまでの提案した対策・防御が不十分であるため, これらのシステムの安全性を損なうものとなる。本研究では,早期インジェクション検出システムの性能について検討し,様々なオープンソースソリューションで実装された技術の検出性能に着目した。これらのソリューションは、プロンプトリークを含むある種のプロンプトインジェクション攻撃を検出することを目的としている。迅速な漏洩攻撃では、攻撃者はLLMを悪意を持って操作してシステム命令を出力し、システムの機密性を侵害する。本研究は, 突発的漏洩検出手法の異なる解析方法と, それらの手法を実装した複数の検出方法の比較分析を行った。これらのテクニックの長所と短所を特定し、その最適構成と高レベルのデプロイメントにおける使用法について詳しく検討する。 LLMガード, Vigil および Rebuff の性能の比較を行った。我々は,Vigil と Rebuff におけるカナリア語チェックの実装は,迅速な漏洩攻撃の検出には有効ではないと結論し,それらの改善を提案した。また,Rebuffの二次モデルに基づく手法では回避の弱点がみられ,緩和法が提案された。そして,LLMガード,ビジル,レバフのピーク時の比較結果から,最小偽陽性率が要求される場合にビジルが最適であり,レバフが平均的ニーズに対して最も最適であることが明らかとなった。

関連論文リスト

System Prompt Extraction Attacks and Defenses in Large Language Models [2.6986500640871482]
大規模言語モデル(LLM)におけるシステムプロンプトは、モデルの振る舞いと応答生成を導く上で重要な役割を果たす。近年の研究では、LLMシステムプロンプトは、厳密に設計されたクエリによる攻撃の抽出に非常に敏感であることが示されている。脅威が増大しているにもかかわらず、システムによる攻撃と防御の促進に関する体系的な研究が欠如している。
論文参考訳（メタデータ） (2025-05-27T21:36:27Z)
DataSentinel: A Game-Theoretic Detection of Prompt Injection Attacks [101.52204404377039]
LLM統合されたアプリケーションとエージェントは、インジェクション攻撃に弱い。検出方法は、入力が注入プロンプトによって汚染されているかどうかを判定することを目的とする。本研究では,迅速なインジェクション攻撃を検出するゲーム理論手法であるDataSentinelを提案する。
論文参考訳（メタデータ） (2025-04-15T16:26:21Z)
MELON: Provable Defense Against Indirect Prompt Injection Attacks in AI Agents [60.30753230776882]
LLMエージェントは間接的プロンプトインジェクション(IPI)攻撃に対して脆弱であり、ツール検索情報に埋め込まれた悪意のあるタスクはエージェントをリダイレクトして不正なアクションを取ることができる。マスク機能によって修正されたマスク付きユーザでエージェントの軌道を再実行することで攻撃を検知する新しいIPIディフェンスであるMELONを提案する。
論文参考訳（メタデータ） (2025-02-07T18:57:49Z)
Joint Optimization of Prompt Security and System Performance in Edge-Cloud LLM Systems [15.058369477125893]
大規模言語モデル(LLM)は人間の生活を著しく促進し、迅速なエンジニアリングによりこれらのモデルの効率が向上した。近年、エンジニアリングを駆使した攻撃が急速に増加し、プライバシーの漏洩、レイテンシの増大、システムリソースの浪費といった問題が発生している。我々は,エッジクラウド LLM (EC-LLM) システムにおけるセキュリティ,サービスレイテンシ,システムリソースの最適化を,様々な攻撃の下で共同で検討する。
論文参考訳（メタデータ） (2025-01-30T14:33:49Z)
PrivAgent: Agentic-based Red-teaming for LLM Privacy Leakage [78.33839735526769]
LLMは、慎重に構築された敵のプロンプトの下で私的情報を出力することに騙される可能性がある。 PrivAgentは、プライバシー漏洩のための新しいブラックボックスレッドチームフレームワークである。
論文参考訳（メタデータ） (2024-12-07T20:09:01Z)
Attention Tracker: Detecting Prompt Injection Attacks in LLMs [62.247841717696765]
大型言語モデル (LLM) は様々なドメインに革命をもたらしたが、インジェクション攻撃に弱いままである。そこで本研究では,特定の注意点が本来の指示から注入指示へと焦点を移す,注意散逸効果の概念を紹介した。本研究では,アテンション・トラッカーを提案する。アテンション・トラッカーは,インジェクション・アタックを検出するために,インストラクション上の注意パターンを追跡する訓練不要な検出手法である。
論文参考訳（メタデータ） (2024-11-01T04:05:59Z)
Palisade -- Prompt Injection Detection Framework [0.9620910657090188]
大規模言語モデルは、悪意のあるインジェクション攻撃に対して脆弱である。本稿では,新しいNLPを用いたインジェクション検出手法を提案する。階層化された入力スクリーニングプロセスを通じて精度と最適化を強調する。
論文参考訳（メタデータ） (2024-10-28T15:47:03Z)
Optimization-based Prompt Injection Attack to LLM-as-a-Judge [78.20257854455562]
LLM-as-a-Judgeは、大きな言語モデル(LLM)を使用して、ある質問に対する候補セットから最適な応答を選択する。 LLM-as-a-Judgeに対する最適化に基づくプロンプトインジェクション攻撃であるJiceDeceiverを提案する。評価の結果,JiceDeceiveは既存のプロンプトインジェクション攻撃よりも効果的であることがわかった。
論文参考訳（メタデータ） (2024-03-26T13:58:00Z)
Token-Level Adversarial Prompt Detection Based on Perplexity Measures and Contextual Information [67.78183175605761]
大規模言語モデルは、敵の迅速な攻撃に影響を受けやすい。この脆弱性は、LLMの堅牢性と信頼性に関する重要な懸念を浮き彫りにしている。トークンレベルで敵のプロンプトを検出するための新しい手法を提案する。
論文参考訳（メタデータ） (2023-11-20T03:17:21Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。