Fugu-MT paper translation (abstract): AIP: Subverting Retrieval-Augmented Generation via Adversarial Instructional Prompt

Paper overview: AIP: Subverting Retrieval-Augmented Generation via Adversarial Instructional Prompt

arxiv url: http://arxiv.org/abs/2509.15159v1
Date: Thu, 18 Sep 2025 17:06:53 GMT
Status: translation complete
Last updated in system: 2025-09-19 17:26:53.357769
Title: AIP: Subverting Retrieval-Augmented Generation via Adversarial Instructional Prompt
Authors: Saket S. Chaturvedi, Gaurav Bagwe, Lan Zhang, Xiaoyong Yuan
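Abstract summary: The paper proposes a novel attack that uses adversarial instructional prompts to manipulate RAG outputs. AIP shows how trusted yet seemingly benign interface components can be weaponized to degrade system integrity. The paper also proposes a diverse query generation strategy that simulates realistic linguistic variation in user queries.
Reference score (the site's own attention metric): 7.3105371206711185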
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Retrieval-Augmented Generation (RAG) enhances large language models (LLMs) by retrieving relevant documents from external sources to improve factual accuracy and verifiability. However, this reliance introduces new attack surfaces within the retrieval pipeline, beyond the LLM itself. While prior RAG attacks have exposed such vulnerabilities, they largely rely on manipulating user queries, which is often infeasible in practice due to fixed or protected user inputs. This narrow focus overlooks a more realistic and stealthy vector: instructional prompts, which are widely reused, publicly shared, and rarely audited. Their implicit trust makes them a compelling target for adversaries to manipulate RAG behavior covertly. We introduce a novel attack for Adversarial Instructional Prompt (AIP) that exploits adversarial instructional prompts to manipulate RAG outputs by subtly altering retrieval behavior. By shifting the attack surface to the instructional prompts, AIP reveals how trusted yet seemingly benign interface components can be weaponized to degrade system integrity. The attack is crafted to achieve three goals: (1) naturalness, to evade user detection; (2) utility, to encourage use of prompts; and (3) robustness, to remain effective across diverse query variations. We propose a diverse query generation strategy that simulates realistic linguistic variation in user queries, enabling the discovery of prompts that generalize across paraphrases and rephrasings. Building on this, a genetic algorithm-based joint optimization is developed to evolve adversarial prompts by balancing attack success, clean-task utility, and stealthiness. Experimental results show that AIP achieves up to 95.23% ASR while preserving benign functionality. These findings uncover a critical and previously overlooked vulnerability in RAG systems, emphasizing the need to reassess the shared instructional prompts.
