Fugu-MT paper translation (abstract): Bias Amplification in RAG: Poisoning Knowledge Retrieval to Steer LLMs

Paper overview: Bias Amplification in RAG: Poisoning Knowledge Retrieval to Steer LLMs

arxiv url: http://arxiv.org/abs/2506.11415v1
Date: Fri, 13 Jun 2025 02:28:46 GMT
Status: translation complete
Last updated in system: 2025-06-16 17:50:49.629859
Title: Bias Amplification in RAG: Poisoning Knowledge Retrieval to Steer LLMs
Title (reference translation): Bias Amplification in RAG: Poisoning Knowledge Retrieval to Steer LLMs
Authors: Linlin Wang, Tianqing Zhu, Laiqiao Qin, Longxiang Gao, Wanlei Zhou
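Abstract summary: Retrieval-Augmented Generation (RAG) systems can significantly enhance the performance of large language models by integrating external knowledge. Existing research focuses mainly on how poisoning attacks in RAG systems affect model output quality, overlooking their potential to amplify model biases. This paper proposes a Bias Retrieval and Reward Attack (BRRA) framework, which systematically investigates attack pathways that amplify language model biases.
Reference score (independently computed attention score): 17.364495894862902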
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Retrieval-Augmented Generation (RAG) systems can significantly enhance the performance of large language models by integrating external knowledge. However, RAG also introduces new security risks. Existing research focuses mainly on how poisoning attacks in RAG systems affect model output quality, overlooking their potential to amplify model biases. For example, when queried about domestic violence victims, a compromised RAG system might preferentially retrieve documents depicting women as victims, causing the model to generate outputs that perpetuate gender stereotypes even when the original query is gender neutral. To demonstrate this impact, this paper proposes a Bias Retrieval and Reward Attack (BRRA) framework, which systematically investigates attack pathways that amplify language model biases through manipulation of the RAG system. We design an adversarial document generation method based on multi-objective reward functions, employ subspace projection techniques to manipulate retrieval results, and construct a cyclic feedback mechanism for continuous bias amplification. Experiments on multiple mainstream large language models demonstrate that BRRA attacks can significantly amplify model biases across multiple dimensions. In addition, we explore a dual-stage defense mechanism to effectively mitigate the impact of the attack. This study reveals that poisoning attacks in RAG systems directly amplify model output biases and clarifies the relationship between RAG system security and model fairness. This novel attack indicates that the fairness of RAG systems deserves close attention.
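Abstract (reference translation): Retrieval-augmented generation (RAG) systems can greatly improve the performance of large language models by integrating external knowledge. However, RAG also introduces new security risks. Existing research focuses mainly on how poisoning attacks in RAG systems affect the quality of model output, and overlooks their potential to amplify model biases. For example, when asked about victims of domestic violence, a compromised RAG system may preferentially retrieve documents that depict women as victims, so that the model generates outputs that perpetuate gender stereotypes even when the original query is gender neutral. This paper proposes the Bias Retrieval and Reward Attack (BRRA) framework, which systematically examines attack pathways that amplify language model biases through manipulation of the RAG system. We design an adversarial document generation method based on multi-objective reward functions, use subspace projection techniques to manipulate retrieval results, and build a cyclic feedback mechanism for continuous bias amplification. Experiments on multiple mainstream large language models show that BRRA attacks can significantly amplify model biases across multiple dimensions. We also explore a dual-stage defense mechanism that effectively mitigates the impact of the attack. This study shows that poisoning attacks in RAG systems directly amplify model output biases, and it clarifies the relationship between RAG system security and model fairness. This new attack indicates that the fairness of RAG systems deserves close attention.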
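
Illustrative note: the abstract's example of a compromised retriever can be pictured with a toy top-k similarity search. The sketch below is not the BRRA method (it uses no reward functions, subspace projection, or feedback loop); it only shows, under assumed conditions, how a single injected document whose embedding lies close to the query can displace a neutral document from the retrieved set. The three-dimensional vectors, the document names, and the `retrieve` helper are all hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_vec, corpus, k=2):
    """Return the names of the k documents most similar to the query."""
    ranked = sorted(corpus, key=lambda name: cosine(query_vec, corpus[name]),
                    reverse=True)
    return ranked[:k]

# Hypothetical embeddings; a real system would obtain these from an encoder.
query = [1.0, 0.0, 0.2]
clean_corpus = {
    "neutral_report_a": [0.9, 0.1, 0.3],
    "neutral_report_b": [0.8, 0.3, 0.1],
    "unrelated_doc": [0.0, 1.0, 0.0],
}

# Assumption: the attacker can add one document placed very near the query.
poisoned_corpus = dict(clean_corpus)
poisoned_corpus["injected_biased_doc"] = [1.0, 0.0, 0.21]

clean_top = retrieve(query, clean_corpus)
poisoned_top = retrieve(query, poisoned_corpus)
print("clean:", clean_top)
print("poisoned:", poisoned_top)
```

In this toy setup the injected document takes the first rank and pushes one neutral report out of the top two, so the context passed to the language model changes even though the query itself does not.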

Related papers