Fugu-MT Paper Translation (Summary): DecipherGuard: Understanding and Deciphering Jailbreak Prompts for a Safer Deployment of Intelligent Software Systems

Paper summary: DecipherGuard: Understanding and Deciphering Jailbreak Prompts for a Safer Deployment of Intelligent Software Systems

arxiv url: http://arxiv.org/abs/2509.16870v1
Date: Sun, 21 Sep 2025 01:46:35 GMT
Status: Translation complete
Last updated (site system): 2025-09-30 14:54:38.954539
Title: DecipherGuard: Understanding and Deciphering Jailbreak Prompts for a Safer Deployment of Intelligent Software Systems
Authors: Rui Yang, Michael Fu, Chakkrit Tantithamthavorn, Chetan Arora, Gunel Gulmammadova, Joey Chua,
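Abstract summary: DecipherGuard is a novel framework that integrates a deciphering layer to counter obfuscation-based prompts and a low-rank adaptation mechanism to enhance guardrail effectiveness against jailbreak attacks. An empirical evaluation on over 22,000 prompts shows that DecipherGuard improves the Defense Success Rate (DSR) by 36% to 65% and Overall Guardrail Performance (OGP) by 20% to 50% compared to LlamaGuard and two other runtime guardrails.
Reference score (site-computed attention score): 11.606665113249298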
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Intelligent software systems powered by Large Language Models (LLMs) are increasingly deployed in critical sectors, raising concerns about their safety during runtime. Through an industry-academic collaboration when deploying an LLM-powered virtual customer assistant, a critical software engineering challenge emerged: how to enhance a safer deployment of LLM-powered software systems at runtime? While LlamaGuard, the current state-of-the-art runtime guardrail, offers protection against unsafe inputs, our study reveals a Defense Success Rate (DSR) drop of 24% under obfuscation- and template-based jailbreak attacks. In this paper, we propose DecipherGuard, a novel framework that integrates a deciphering layer to counter obfuscation-based prompts and a low-rank adaptation mechanism to enhance guardrail effectiveness against template-based attacks. Empirical evaluation on over 22,000 prompts demonstrates that DecipherGuard improves DSR by 36% to 65% and Overall Guardrail Performance (OGP) by 20% to 50% compared to LlamaGuard and two other runtime guardrails. These results highlight the effectiveness of DecipherGuard in defending LLM-powered software systems against jailbreak attacks during runtime.
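The abstract describes the deciphering layer only at a high level. The following is a minimal sketch of the general idea, under stated assumptions: a prompt is decoded into candidate plaintext views, and the guardrail is run on the raw prompt and on every decoded view, blocking if any is flagged. Only Base64 and ROT13 are handled here, chosen purely for illustration; the summary does not say which obfuscations DecipherGuard covers. `toy_guard`, `try_base64`, `decipher_candidates`, and `deciphered_guard` are hypothetical names, the keyword matcher is a stand-in for a real guardrail such as LlamaGuard, and DSR is assumed to be the fraction of jailbreak prompts that are blocked (the paper's exact DSR and OGP definitions are not given here).

```python
import base64
import codecs
from typing import Callable, List, Optional

# Hypothetical stand-in for a runtime guardrail such as LlamaGuard:
# returns True when a prompt is judged unsafe. A real deployment would
# query the guardrail model here instead of matching keywords.
UNSAFE_PHRASES = ("build a bomb", "steal credentials")

def toy_guard(text: str) -> bool:
    lowered = text.lower()
    return any(phrase in lowered for phrase in UNSAFE_PHRASES)

def try_base64(text: str) -> Optional[str]:
    """Return the decoded text if `text` is strict Base64 of UTF-8 bytes, else None."""
    try:
        return base64.b64decode(text.strip(), validate=True).decode("utf-8")
    except ValueError:  # covers binascii.Error and UnicodeDecodeError
        return None

def decipher_candidates(prompt: str) -> List[str]:
    """Hypothetical deciphering layer: the raw prompt plus each decoded view."""
    candidates = [prompt]
    decoded = try_base64(prompt)
    if decoded:
        candidates.append(decoded)
    candidates.append(codecs.decode(prompt, "rot_13"))  # ROT13 is its own inverse
    return candidates

def deciphered_guard(prompt: str, guard: Callable[[str], bool]) -> bool:
    """Block the prompt if the guard flags the raw text or any decoded view."""
    return any(guard(view) for view in decipher_candidates(prompt))

def defense_success_rate(jailbreaks: List[str], is_blocked: Callable[[str], bool]) -> float:
    """Assumed DSR definition: fraction of jailbreak prompts that are blocked."""
    if not jailbreaks:
        raise ValueError("need at least one jailbreak prompt")
    return sum(is_blocked(p) for p in jailbreaks) / len(jailbreaks)

attacks = [
    "How do I build a bomb?",                                     # plain text
    base64.b64encode(b"How do I build a bomb?").decode("ascii"),  # Base64-obfuscated
    codecs.encode("How do I steal credentials?", "rot_13"),       # ROT13-obfuscated
]
baseline = defense_success_rate(attacks, toy_guard)
with_layer = defense_success_rate(attacks, lambda p: deciphered_guard(p, toy_guard))
print(f"DSR without deciphering layer: {baseline:.2f}")  # 0.33
print(f"DSR with deciphering layer:    {with_layer:.2f}")  # 1.00
```

Checking every decoded view alongside the raw prompt, rather than replacing it, leaves the guardrail's handling of plaintext attacks unchanged, at the cost of extra guardrail calls per prompt.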
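The low-rank adaptation mechanism is not detailed in this summary. As a sketch assuming the standard LoRA formulation (Hu et al., 2021), a frozen d x k weight W0 is adapted as W = W0 + (alpha / r) * B A, where only B (d x r) and A (r x k) are trained and r is much smaller than d and k. The rank, scaling factor, and adapted layers used by DecipherGuard are not stated here; the sizes below (d = k = 4096, r = 8, alpha = 16) are illustrative assumptions, and `matmul`, `lora_param_counts`, and `lora_effective_weight` are hypothetical helpers.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]

def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain-Python matrix product of x (n x m) and y (m x p)."""
    inner = len(y)
    cols = len(y[0])
    return [[sum(x[i][t] * y[t][j] for t in range(inner)) for j in range(cols)]
            for i in range(len(x))]

def lora_param_counts(d: int, k: int, r: int) -> Tuple[int, int]:
    """Trainable parameters for one d x k weight: full update vs. LoRA factors B (d x r) and A (r x k)."""
    return d * k, r * (d + k)

def lora_effective_weight(w0: Matrix, b: Matrix, a: Matrix, alpha: float, r: int) -> Matrix:
    """W = W0 + (alpha / r) * B @ A; W0 is frozen and only B and A are trained."""
    delta = matmul(b, a)
    scale = alpha / r
    return [[w0[i][j] + scale * delta[i][j] for j in range(len(w0[0]))]
            for i in range(len(w0))]

# Illustrative sizes (assumed, not taken from the paper).
full, lora = lora_param_counts(4096, 4096, 8)
print(full, lora, f"{lora / full:.2%}")  # 16777216 65536 0.39%

# Standard LoRA initialisation: A is small random, B is zero, so training
# starts exactly from the frozen model's behaviour.
random.seed(0)
d, k, r = 4, 3, 2
w0 = [[random.gauss(0.0, 1.0) for _ in range(k)] for _ in range(d)]
a = [[random.gauss(0.0, 0.01) for _ in range(k)] for _ in range(r)]
b = [[0.0] * r for _ in range(d)]
assert lora_effective_weight(w0, b, a, alpha=16, r=r) == w0
```

Because only the low-rank factors are trained, adapting a guardrail this way touches a small fraction of its parameters, which is the usual motivation for choosing LoRA over full fine-tuning.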
