Fugu-MT Paper Translation (Abstract): Know Thy Enemy: Securing LLMs Against Prompt Injection via Diverse Data Synthesis and Instruction-Level Chain-of-Thought Learning

Paper overview: Know Thy Enemy: Securing LLMs Against Prompt Injection via Diverse Data Synthesis and Instruction-Level Chain-of-Thought Learning

arxiv url: http://arxiv.org/abs/2601.04666v1
Date: Thu, 08 Jan 2026 07:25:27 GMT
Status: Translation complete
Last updated in system: 2026-01-09 17:01:53.076017
Title: Know Thy Enemy: Securing LLMs Against Prompt Injection via Diverse Data Synthesis and Instruction-Level Chain-of-Thought Learning
Title (reference translation): Know Thy Enemy: Securing LLMs Against Prompt Injection via Diverse Data Synthesis and Instruction-Level Chain-of-Thought Learning
Authors: Zhiyuan Chang, Mingyang Li, Yuekai Huang, Ziyou Jiang, Xiaojun Jia, Qian Xiong, Junjie Wang, Zhaoyang Li, Qing Wang
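Abstract summary: Applications that integrate large language models (LLMs) are increasingly prevalent, yet they face critical security vulnerabilities from prompt injection (PI) attacks. InstruCoT is a model enhancement method for PI defense that synthesizes diverse training data and applies instruction-level chain-of-thought fine-tuning.
Reference score (independently computed attention score): 31.790490397086856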
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language model (LLM)-integrated applications have become increasingly prevalent, yet face critical security vulnerabilities from prompt injection (PI) attacks. Defending against PI attacks faces two major issues: malicious instructions can be injected through diverse vectors, and injected instructions often lack clear semantic boundaries from the surrounding context, making them difficult to identify. To address these issues, we propose InstruCoT, a model enhancement method for PI defense that synthesizes diverse training data and employs instruction-level chain-of-thought fine-tuning, enabling LLMs to effectively identify and reject malicious instructions regardless of their source or position in the context. We evaluate InstruCoT across three critical dimensions: Behavior Deviation, Privacy Leakage, and Harmful Output. Experimental results across four LLMs demonstrate that InstruCoT significantly outperforms baselines in all dimensions while maintaining utility performance without degradation.
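Abstract (reference translation): Applications that integrate large language models (LLMs) are increasingly widespread, but they face serious security vulnerabilities from prompt injection (PI) attacks. Malicious instructions can be injected through a variety of vectors, and injected instructions often lack a clear semantic boundary from the surrounding context, which makes them hard to identify. To address these problems, we propose InstruCoT, a model enhancement method for PI defense that synthesizes diverse training data and uses instruction-level chain-of-thought fine-tuning. InstruCoT is evaluated along three key dimensions: behavior deviation, privacy leakage, and harmful output. Experimental results on four LLMs show that InstruCoT significantly outperforms the baselines in every dimension while maintaining utility performance without degradation.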

List of related papers