Fugu-MT Paper Translation (Abstract): TopicAttack: An Indirect Prompt Injection Attack via Topic Transition

Paper overview: TopicAttack: An Indirect Prompt Injection Attack via Topic Transition

arxiv url: http://arxiv.org/abs/2507.13686v1
Date: Fri, 18 Jul 2025 06:23:31 GMT
Status: Translation complete
System update date: 2025-07-21 20:43:26.199678
Title: TopicAttack: An Indirect Prompt Injection Attack via Topic Transition
Authors: Yulin Chen, Haoran Li, Yuexin Li, Yue Liu, Yangqiu Song, Bryan Hooi
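Abstract summary: Large language models (LLMs) are vulnerable to indirect prompt injection attacks. The proposed TopicAttack prompts the LLM to generate a transition prompt that gradually shifts the topic toward the injected instruction. The analysis finds that a higher injected-to-original attention ratio leads to a greater success probability, and that the proposed method achieves a much higher ratio than the baseline methods.
Reference score (independently computed notability score): 71.81906608221038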
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models (LLMs) have shown remarkable performance across a range of NLP tasks. However, their strong instruction-following capabilities and inability to distinguish instructions from data content make them vulnerable to indirect prompt injection attacks. In such attacks, instructions with malicious purposes are injected into external data sources, such as web documents. When LLMs retrieve this injected data through tools such as a search engine and execute the injected instructions, they provide misled responses. Recent attack methods have demonstrated potential, but their abrupt instruction injection often undermines their effectiveness. Motivated by the limitations of existing attack methods, we propose TopicAttack, which prompts the LLM to generate a fabricated conversational transition prompt that gradually shifts the topic toward the injected instruction, making the injection smoother and enhancing the plausibility and success of the attack. Through comprehensive experiments, TopicAttack achieves state-of-the-art performance, with an attack success rate (ASR) over 90% in most cases, even when various defense methods are applied. We further analyze its effectiveness by examining attention scores. We find that a higher injected-to-original attention ratio leads to a greater success probability, and our method achieves a much higher ratio than the baseline methods.
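
The abstract does not define the injected-to-original attention ratio precisely. As a minimal sketch, assuming it is the attention mass placed on the injected instruction's tokens divided by the mass placed on the original instruction's tokens, it could be computed as below. The function name, the half-open span convention, and the toy weights are all hypothetical and are not taken from the paper.

```python
def attention_ratio(attn, injected_span, original_span):
    """Return the injected-to-original attention ratio for one query position.

    attn          -- list of attention weights over the context tokens
    injected_span -- (start, end) half-open token range of the injected instruction
    original_span -- (start, end) half-open token range of the original instruction

    Assumption: the ratio is a simple quotient of summed attention weights.
    The paper may aggregate over layers and heads differently.
    """
    injected_mass = sum(attn[injected_span[0]:injected_span[1]])
    original_mass = sum(attn[original_span[0]:original_span[1]])
    if original_mass == 0:
        raise ValueError("original instruction receives no attention")
    return injected_mass / original_mass


# Toy attention weights over 7 context tokens (made-up values that sum to 1).
attn = [0.0625, 0.125, 0.0625, 0.125, 0.125, 0.25, 0.25]
original_span = (0, 3)   # tokens 0..2 hold the original instruction
injected_span = (4, 7)   # tokens 4..6 hold the injected instruction

ratio = attention_ratio(attn, injected_span, original_span)
print(ratio)  # 2.5
```

Under this assumed definition, a ratio above 1 means the model attends more to the injected instruction than to the original one, which is the regime the abstract associates with a greater success probability.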

