Fugu-MT Paper Translation (Abstract): Beyond Explicit Refusals: Soft-Failure Attacks on Retrieval-Augmented Generation

Paper overview: Beyond Explicit Refusals: Soft-Failure Attacks on Retrieval-Augmented Generation

arxiv url: http://arxiv.org/abs/2604.18663v1
Date: Mon, 20 Apr 2026 12:33:52 GMT
Status: Translation complete
Last updated in system: 2026-04-22 22:41:49.393616
Title: Beyond Explicit Refusals: Soft-Failure Attacks on Retrieval-Augmented Generation
Authors: Wentao Zhang, Yan Zhuang, ZhuHang Zheng, Mingfei Zhang, Jiawen Deng, Fuji Ren
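Abstract summary: Existing jamming attacks on Retrieval-Augmented Generation (RAG) systems induce explicit refusals or denial-of-service behaviors. The authors propose Deceptive Evolutionary Jamming Attack (DEJA), an automated black-box attack framework that generates adversarial documents to trigger soft failures, that is, fluent and coherent yet non-informative responses. Experiments show that DEJA consistently drives responses toward low-utility soft failures, achieving SASR above 79% while keeping hard-failure rates below 15%.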
Reference score (independently computed attention score): 25.27360087818357
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Existing jamming attacks on Retrieval-Augmented Generation (RAG) systems typically induce explicit refusals or denial-of-service behaviors, which are conspicuous and easy to detect. In this work, we formalize a subtler availability threat, termed soft failure, which degrades system utility by inducing fluent and coherent yet non-informative responses rather than overt failures. We propose Deceptive Evolutionary Jamming Attack (DEJA), an automated black-box attack framework that generates adversarial documents to trigger such soft failures by exploiting safety-aligned behaviors of large language models. DEJA employs an evolutionary optimization process guided by a fine-grained Answer Utility Score (AUS), computed via an LLM-based evaluator, to systematically degrade the certainty of answers while maintaining high retrieval success. Extensive experiments across multiple RAG configurations and benchmark datasets show that DEJA consistently drives responses toward low-utility soft failures, achieving SASR above 79% while keeping hard-failure rates below 15%, significantly outperforming prior attacks. The resulting adversarial documents exhibit high stealth, evading perplexity-based detection and resisting query paraphrasing, and transfer across model families to proprietary systems without retargeting.

Related papers