Fugu-MT Paper Translation (Summary): Evolve the Method, Not the Prompts: Evolutionary Synthesis of Jailbreak Attacks on LLMs

Paper summary: Evolve the Method, Not the Prompts: Evolutionary Synthesis of Jailbreak Attacks on LLMs

arxiv url: http://arxiv.org/abs/2511.12710v1
Date: Sun, 16 Nov 2025 17:52:07 GMT
Status: Translation complete
Last updated (in system): 2025-11-18 14:36:24.496465
Title: Evolve the Method, Not the Prompts: Evolutionary Synthesis of Jailbreak Attacks on LLMs
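Title (reference translation): Evolutionary Synthesis of Jailbreak Attacks on LLMs
Authors: Yunhao Chen, Xin Wang, Juncheng Li, Yixu Wang, Jie Li, Yan Teng, Yingchun Wang, Xingjun Ma
Abstract summary: EvoSynth is an autonomous framework that shifts the paradigm from attack planning to the evolutionary synthesis of jailbreak methods. It uses a multi-agent system to autonomously design, evolve, and execute novel code-based attack algorithms.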
Reference score (in-house attention metric): 39.861409656970075
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Automated red-teaming frameworks for Large Language Models (LLMs) have become increasingly sophisticated, yet they share a fundamental limitation: their jailbreak logic is confined to selecting, combining, or refining pre-existing attack strategies. This binds their creativity and leaves them unable to autonomously invent entirely new attack mechanisms. To overcome this gap, we introduce EvoSynth, an autonomous framework that shifts the paradigm from attack planning to the evolutionary synthesis of jailbreak methods. Instead of refining prompts, EvoSynth employs a multi-agent system to autonomously engineer, evolve, and execute novel, code-based attack algorithms. Crucially, it features a code-level self-correction loop, allowing it to iteratively rewrite its own attack logic in response to failure. Through extensive experiments, we demonstrate that EvoSynth not only establishes a new state of the art by achieving an 85.5% Attack Success Rate (ASR) against highly robust models like Claude-Sonnet-4.5, but also generates attacks that are significantly more diverse than those from existing methods. We release our framework to facilitate future research in this new direction of evolutionary synthesis of jailbreak methods. Code is available at: https://github.com/dongdongunique/EvoSynth.
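Abstract (reference translation): Automated red-teaming frameworks for large language models (LLMs) have become increasingly sophisticated, but they share a fundamental limitation: their jailbreak logic is limited to selecting, combining, or refining existing attack strategies. This constrains their creativity and leaves them unable to autonomously invent entirely new attack mechanisms. To close this gap, we introduce EvoSynth, an autonomous framework that shifts the paradigm from attack planning to the evolutionary synthesis of jailbreak methods. Rather than rewriting prompts, EvoSynth uses a multi-agent system to autonomously design, evolve, and execute novel code-based attack algorithms. Importantly, it includes a code-level self-correction loop that lets it iteratively rewrite its own attack logic in response to failures. Through extensive experiments, we show that EvoSynth not only establishes a new state of the art by achieving an 85.5% attack success rate (ASR) against highly robust models such as Claude-Sonnet-4.5, but also generates attacks that are far more diverse than those of existing methods. We release the framework to promote future research in this new direction of evolutionary synthesis of jailbreak methods. The code is available at https://github.com/dongdongunique/EvoSynth.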

Related papers