Fugu-MT Paper Translation (Abstract): Adjacent Words, Divergent Intents: Jailbreaking Large Language Models via Task Concurrency

Paper summary: Adjacent Words, Divergent Intents: Jailbreaking Large Language Models via Task Concurrency

arxiv url: http://arxiv.org/abs/2510.21189v1
Date: Fri, 24 Oct 2025 06:39:08 GMT
Status: Translation complete
Last updated in system: 2025-10-28 06:57:23.386293
Title: Adjacent Words, Divergent Intents: Jailbreaking Large Language Models via Task Concurrency
Authors: Yukun Jiang, Mingjie Li, Michael Backes, Yang Zhang
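Abstract summary: Existing jailbreak attacks mainly follow sequential logic, where large language models (LLMs) understand and answer each task one by one. The paper introduces $\texttt{JAIL-CON}$, an iterative attack framework that jailbreaks LLMs via task $\underline{\text{CON}}$currency. When a guardrail is applied as a defense, the concurrent answers of $\texttt{JAIL-CON}$ are stealthier than the sequential answers generated by previous attacks.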
Reference score (independently computed attention score): 22.04568330005493
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Despite their superior performance on a wide range of domains, large language models (LLMs) remain vulnerable to misuse for generating harmful content, a risk that has been further amplified by various jailbreak attacks. Existing jailbreak attacks mainly follow sequential logic, where LLMs understand and answer each given task one by one. However, concurrency, a natural extension of the sequential scenario, has been largely overlooked. In this work, we first propose a word-level method to enable task concurrency in LLMs, where adjacent words encode divergent intents. Although LLMs maintain strong utility in answering concurrent tasks, which is demonstrated by our evaluations on mathematical and general question-answering benchmarks, we notably observe that combining a harmful task with a benign one significantly reduces the probability of it being filtered by the guardrail, showing the potential risks associated with concurrency in LLMs. Based on these findings, we introduce $\texttt{JAIL-CON}$, an iterative attack framework that $\underline{\text{JAIL}}$breaks LLMs via task $\underline{\text{CON}}$currency. Experiments on widely-used LLMs demonstrate the strong jailbreak capabilities of $\texttt{JAIL-CON}$ compared to existing attacks. Furthermore, when the guardrail is applied as a defense, compared to the sequential answers generated by previous attacks, the concurrent answers in our $\texttt{JAIL-CON}$ exhibit greater stealthiness and are less detectable by the guardrail, highlighting the unique feature of task concurrency in jailbreaking LLMs.

