Fugu-MT 論文翻訳(概要): SecurityLingua: Efficient Defense of LLM Jailbreak Attacks via Security-Aware Prompt Compression

論文の概要: SecurityLingua: Efficient Defense of LLM Jailbreak Attacks via Security-Aware Prompt Compression

arxiv url: http://arxiv.org/abs/2506.12707v1
Date: Sun, 15 Jun 2025 03:39:13 GMT
ステータス: 翻訳完了
システム内更新日: 2025-06-17 17:28:46.712208
Title: SecurityLingua: Efficient Defense of LLM Jailbreak Attacks via Security-Aware Prompt Compression
Title（参考訳）: SecurityLingua: セキュリティを意識したプロンプト圧縮によるLDMのジェイルブレイク攻撃の効果的な防御
Authors: Yucheng Li, Surin Ahn, Huiqiang Jiang, Amir H. Abdi, Yuqing Yang, Lili Qiu,
Abstract要約: 大規模言語モデル(LLM)は、安全アライメント後も悪意のある攻撃に対して脆弱である。我々は,LLMをジェイルブレイク攻撃から守るための効果的かつ効率的なアプローチであるSecurityLinguaを提案する。迅速な圧縮により、SecurityLinguaは既存のすべての防御方法と比較して、無視できるオーバーヘッドと余分なトークンコストしか発生しない。
参考スコア（独自算出の注目度）: 11.839827036296649
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models (LLMs) have achieved widespread adoption across numerous applications. However, many LLMs are vulnerable to malicious attacks even after safety alignment. These attacks typically bypass LLMs' safety guardrails by wrapping the original malicious instructions inside adversarial jailbreaks prompts. Previous research has proposed methods such as adversarial training and prompt rephrasing to mitigate these safety vulnerabilities, but these methods often reduce the utility of LLMs or lead to significant computational overhead and online latency. In this paper, we propose SecurityLingua, an effective and efficient approach to defend LLMs against jailbreak attacks via security-oriented prompt compression. Specifically, we train a prompt compressor designed to discern the "true intention" of the input prompt, with a particular focus on detecting the malicious intentions of adversarial prompts. Then, in addition to the original prompt, the intention is passed via the system prompt to the target LLM to help it identify the true intention of the request. SecurityLingua ensures a consistent user experience by leaving the original input prompt intact while revealing the user's potentially malicious intention and stimulating the built-in safety guardrails of the LLM. Moreover, thanks to prompt compression, SecurityLingua incurs only a negligible overhead and extra token cost compared to all existing defense methods, making it an especially practical solution for LLM defense. Experimental results demonstrate that SecurityLingua can effectively defend against malicious attacks and maintain utility of the LLM with negligible compute and latency overhead. Our code is available at https://aka.ms/SecurityLingua.
Abstract（参考訳）: 大規模言語モデル(LLM)は多くのアプリケーションで広く採用されている。しかし、多くのLSMは安全アライメント後も悪意のある攻撃に対して脆弱である。これらの攻撃は、通常、敵のジェイルブレイクプロンプトの中に元の悪意のある命令をラップすることで、LLMの安全ガードレールをバイパスする。従来の研究では、このような安全性の脆弱性を軽減するために、敵の訓練や迅速な言い直しなどの手法が提案されてきたが、これらの手法はしばしばLLMの有用性を減らしたり、計算オーバーヘッドやオンラインの遅延を著しく減らしたりしている。本稿では,セキュリティ指向のプロンプト圧縮によるジェイルブレイク攻撃に対して,LLMを効果的かつ効率的に防御するSecurityLinguaを提案する。具体的には、入力プロンプトの「真の意図」を識別するために設計されたプロンプト圧縮機を訓練し、特に敵対的プロンプトの悪意のある意図を検出することに焦点を当てる。そして、元のプロンプトに加えて、その意図がシステムを介してターゲットLLMに送信され、要求の真の意図を特定するのに役立つ。 SecurityLinguaは、元の入力プロンプトをそのままにして、潜在的に悪意のある意図を明らかにし、LLMの組み込み安全ガードレールを刺激することによって、一貫したユーザエクスペリエンスを保証する。さらに、迅速な圧縮により、SecurityLinguaは既存のすべての防御方法と比較して、無視できるオーバーヘッドと余分なトークンコストしか発生しないため、特にLCM防御の実用的なソリューションである。実験の結果、SecurityLinguaは悪意のある攻撃に対して効果的に防御でき、計算と遅延のオーバーヘッドを無視してLLMのユーティリティを維持できることがわかった。私たちのコードはhttps://aka.ms/SecurityLingua.comで利用可能です。

論文の概要: SecurityLingua: Efficient Defense of LLM Jailbreak Attacks via Security-Aware Prompt Compression

関連論文リスト