Fugu-MT Paper Translation (Abstract): Through the Stealth Lens: Rethinking Attacks and Defenses in RAG

Paper overview: Through the Stealth Lens: Rethinking Attacks and Defenses in RAG

arxiv url: http://arxiv.org/abs/2506.04390v1
Date: Wed, 04 Jun 2025 19:15:09 GMT
Status: translation complete
Last updated in system: 2025-06-06 21:53:49.392514
Title: Through the Stealth Lens: Rethinking Attacks and Defenses in RAG
Authors: Sarthak Choudhary, Nils Palumbo, Ashish Hooda, Krishnamurthy Dj Dvijotham, Somesh Jha
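Abstract summary: RAG systems are vulnerable to the injection of poisoned passages, even at low corruption rates. We show that existing attacks are not designed to be stealthy, even at low rates, which allows reliable detection and mitigation.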
Reference score (site-computed attention score): 21.420202472493425
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Retrieval-augmented generation (RAG) systems are vulnerable to attacks that inject poisoned passages into the retrieved set, even at low corruption rates. We show that existing attacks are not designed to be stealthy, allowing reliable detection and mitigation. We formalize stealth using a distinguishability-based security game. If a few poisoned passages are designed to control the response, they must differentiate themselves from benign ones, inherently compromising stealth. This motivates the need for attackers to rigorously analyze intermediate signals involved in generation, such as attention patterns or next-token probability distributions, to avoid easily detectable traces of manipulation. Leveraging attention patterns, we propose a passage-level score, the Normalized Passage Attention Score, used by our Attention-Variance Filter algorithm to identify and filter potentially poisoned passages. This method mitigates existing attacks, improving accuracy by up to approximately 20% over baseline defenses. To probe the limits of attention-based defenses, we craft stealthier adaptive attacks that obscure such traces, achieving up to 35% attack success rate, and highlight the challenges in improving stealth.
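The abstract names a passage-level attention score and a variance-based filter but does not define either, so the following Python sketch is only an illustration of the general idea, not the authors' algorithm. It assumes that a per-token attention weight and a passage index for each context token are already available. The function names, the normalization (per-passage attention mass divided by total mass), the greedy removal rule, and the `variance_threshold` and `max_removed` parameters are all hypothetical choices made for this example.

```python
from statistics import pvariance


def normalized_passage_scores(token_attention, passage_ids, num_passages):
    """Sum attention mass per passage and normalize so the scores sum to 1.

    Assumed definition for illustration; the paper may normalize differently.
    """
    totals = [0.0] * num_passages
    for weight, pid in zip(token_attention, passage_ids):
        totals[pid] += weight
    mass = sum(totals)
    if mass == 0:
        # No attention on any passage: fall back to a uniform score.
        return [1.0 / num_passages] * num_passages
    return [t / mass for t in totals]


def attention_variance_filter(scores, variance_threshold, max_removed):
    """Greedily drop the highest-scoring passage while the variance of the
    renormalized scores of the remaining passages exceeds the threshold.

    Returns (kept_indices, removed_indices). Hypothetical procedure.
    """
    kept = list(range(len(scores)))
    removed = []
    while len(kept) > 1 and len(removed) < max_removed:
        current = [scores[i] for i in kept]
        total = sum(current)
        if total == 0:
            break
        renormalized = [s / total for s in current]
        if pvariance(renormalized) <= variance_threshold:
            break  # attention is spread evenly enough; stop filtering
        top = max(kept, key=lambda i: scores[i])
        kept.remove(top)
        removed.append(top)
    return kept, removed


if __name__ == "__main__":
    # Toy input: passage 1 attracts most of the attention mass.
    attention = [0.05, 0.05, 0.6, 0.1, 0.1, 0.1]
    passages = [0, 0, 1, 2, 2, 3]
    scores = normalized_passage_scores(attention, passages, num_passages=4)
    kept, removed = attention_variance_filter(scores, 0.02, max_removed=2)
    print("scores:", scores)
    print("kept:", kept, "removed:", removed)
```

The intuition this sketch tries to capture follows the abstract's argument: a few passages that control the response must stand out from benign ones, and a passage that absorbs a disproportionate share of attention is one such visible trace. How the real method obtains and aggregates attention (which layers, heads, and tokens) is not stated in the abstract and is left open here.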

Related papers

When Safety Detectors Aren't Enough: A Stealthy and Effective Jailbreak Attack on LLMs via Steganographic Techniques [5.2431999629987]
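Jailbreak attacks pose a serious threat to large language models (LLMs). This paper presents a systematic survey of jailbreak methods from the novel perspective of stealth. We propose StegoAttack, which uses steganography to hide harmful queries within benign, semantically coherent text.
Reference translation (metadata) (2025-05-22T15:07:34Z)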
Mind the Gap: Detecting Black-box Adversarial Attacks in the Making through Query Update Analysis [3.795071937009966]
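Adversarial attacks can compromise the integrity of machine learning (ML) models. This paper proposes a framework that detects whether an adversarial noise instance is being generated. We evaluate the approach against eight state-of-the-art attacks, including adaptive attacks.
Reference translation (metadata) (2025-03-04T20:25:12Z)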
Poisoned Forgery Face: Towards Backdoor Attacks on Face Forgery Detection [62.595450266262645]
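This paper introduces a new threat to face forgery detection posed by backdoor attacks. By embedding a backdoor into the model, an attacker can deceive the detector into making incorrect predictions on forged faces. We propose the Poisoned Forgery Face framework, which enables clean-label backdoor attacks on face forgery detectors.
Reference translation (metadata) (2024-02-18T06:31:05Z)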
Towards Stealthy Backdoor Attacks against Speech Recognition via Elements of Sound [9.24846124692153]
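Deep neural networks (DNNs) have been widely adopted and deployed in various speech recognition applications. This paper revisits poison-only backdoor attacks against speech recognition. We exploit elements of sound (e.g., pitch and timbre) to design stealthier and more effective poison-only backdoor attacks.
Reference translation (metadata) (2023-07-17T02:58:25Z)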
Backdoor Attack with Sparse and Invisible Trigger [57.41876708712008]
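Deep neural networks (DNNs) are vulnerable to backdoor attacks. Backdoor attacks are a threat that arises at the training stage. We propose a sparse and invisible backdoor attack (SIBA).
Reference translation (metadata) (2023-05-11T10:05:57Z)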
Illusory Attacks: Information-Theoretic Detectability Matters in Adversarial Attacks [76.35478518372692]
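The epsilon-illusory attack is a new form of adversarial attack on sequential decision-makers. Compared with existing attacks, epsilon-illusory attacks are significantly harder to detect automatically. The results suggest the need for better anomaly detectors and for effective hardware- and system-level defenses.
Reference translation (metadata) (2022-07-20T19:49:09Z)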
Zero-Query Transfer Attacks on Context-Aware Object Detectors [95.18656036716972]
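Adversarial attacks perturb images so that deep neural networks produce incorrect classification results. A promising approach to defending against adversarial attacks on natural multi-object scenes is to impose context-consistency checks. This paper presents the first approach for generating context-consistent attacks that can evade context-consistency checks.
Reference translation (metadata) (2022-03-29T04:33:06Z)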
Hidden Backdoor Attack against Semantic Segmentation Models [60.0327238844584]
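Backdoor attacks aim to embed hidden backdoors in deep neural networks (DNNs) by poisoning the training data. We propose a new attack paradigm, the fine-grained attack, which treats target labels at the object level rather than the image level. Experiments show that the proposed method successfully attacks semantic segmentation models by poisoning only a small fraction of the training data.
Reference translation (metadata) (2021-03-06T05:50:29Z)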
RayS: A Ray Searching Method for Hard-label Adversarial Attack [99.72117609513589]
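We propose the Ray Searching attack (RayS), which greatly improves the effectiveness and efficiency of hard-label attacks. It can also be used as a sanity check for models.
Reference translation (metadata) (2020-06-23T07:01:50Z)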

The related papers list is generated automatically from the titles and abstracts of papers on this site.
