Fugu-MT Paper Translation (Summary): Practical Reasoning Interruption Attacks on Reasoning Large Language Models

arxiv url: http://arxiv.org/abs/2505.06643v1
Date: Sat, 10 May 2025 13:36:01 GMT
Status: Translation complete
Last updated in system: 2025-05-13 20:21:48.955945
Title: Practical Reasoning Interruption Attacks on Reasoning Large Language Models
Authors: Yu Cui, Cong Zuo
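Abstract summary: Reasoning large language models (RLLMs) have demonstrated outstanding performance across a variety of tasks, yet they also expose numerous security vulnerabilities. Recent work has identified a distinct "thinking-stopped" vulnerability in DeepSeek-R1 under adversarial prompts. Building on this vulnerability, researchers developed a novel prompt injection attack and analyzed its root cause.
Reference score (site-computed attention metric): 0.24963930962128378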
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Reasoning large language models (RLLMs) have demonstrated outstanding performance across a variety of tasks, yet they also expose numerous security vulnerabilities. Most of these vulnerabilities have centered on the generation of unsafe content. However, recent work has identified a distinct "thinking-stopped" vulnerability in DeepSeek-R1: under adversarial prompts, the model's reasoning process ceases at the system level and produces an empty final answer. Building upon this vulnerability, researchers developed a novel prompt injection attack, termed the reasoning interruption attack, and also offered an initial analysis of its root cause. Through extensive experiments, we verify the previous analyses, correct key errors based on three experimental findings, and present a more rigorous explanation of the fundamental causes driving the vulnerability. Moreover, existing attacks typically require over 2,000 tokens, impose significant overhead, reduce practicality, and are easily detected. To overcome these limitations, we propose the first practical reasoning interruption attack. It succeeds with just 109 tokens by exploiting our newly uncovered "reasoning token overflow" (RTO) effect to overwrite the model's final answer, forcing it to return an invalid response. Experimental results demonstrate that our proposed attack is highly effective. Furthermore, we discover that the method for triggering RTO differs between the official DeepSeek-R1 release and common unofficial deployments. As a broadened application of RTO, we also construct a novel jailbreak attack that enables the transfer of unsafe content within the reasoning tokens into the final answer, thereby exposing it to the user. Our work carries significant implications for enhancing the security of RLLMs.

Related papers

Paper Summary Attack: Jailbreaking LLMs through LLM Safety Papers [61.57691030102618]
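We propose a new jailbreak method, the Paper Summary Attack (PSA). It synthesizes content from attack-focused LLM safety papers to construct adversarial prompt templates. Experiments show significant vulnerabilities not only in base LLMs but also in state-of-the-art reasoning models such as Deepseek-R1.
Reference translation of paper (metadata) (2025-07-17T18:33:50Z)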
ARMOR: Aligning Secure and Safe Large Language Models via Meticulous Reasoning [49.47193675702453]
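Large language models (LLMs) have demonstrated remarkable generative capabilities. However, LLMs remain vulnerable to malicious instructions that can bypass safety constraints. We propose ARMOR, a reasoning-based safety alignment framework.
Reference translation of paper (metadata) (2025-07-14T09:05:54Z)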
Cannot See the Forest for the Trees: Invoking Heuristics and Biases to Elicit Irrational Choices of LLMs [83.11815479874447]
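We propose a novel jailbreak attack framework inspired by cognitive decomposition and biases in human cognition. We use cognitive decomposition to reduce the complexity of malicious prompts and relevance bias to reorganize the prompts. We also introduce a ranking-based harmfulness evaluation metric that goes beyond the traditional binary success-or-failure paradigm.
Reference translation of paper (metadata) (2025-05-03T05:28:11Z)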
Token-Efficient Prompt Injection Attack: Provoking Cessation in LLM Reasoning via Adaptive Token Compression [12.215295420714787]
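The Reasoning Interruption Attack is a prompt injection attack based on adaptive token compression. We develop a systematic approach for efficiently collecting attack prompts, together with an adaptive token compression framework. Experiments show that our compression framework substantially reduces prompt length while preserving effective attack capability.
Reference translation of paper (metadata) (2025-04-29T07:34:22Z)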
To Think or Not to Think: Exploring the Unthinking Vulnerability in Large Reasoning Models [56.19026073319406]
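Large reasoning models (LRMs) are designed to solve complex tasks by generating explicit reasoning traces before producing a final answer. We reveal a critical vulnerability in LRMs, termed Unthinking, in which the thinking process can be bypassed by manipulating special tokens. We examine this vulnerability from both malicious and beneficial perspectives.
Reference translation of paper (metadata) (2025-02-16T10:45:56Z)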
Reasoning-Augmented Conversation for Multi-Turn Jailbreak Attacks on Large Language Models [53.580928907886324]
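Reasoning-Augmented Conversation (RACE) is a novel multi-turn jailbreak framework. It reformulates harmful queries into benign reasoning tasks. RACE achieves state-of-the-art attack effectiveness in complex conversational scenarios.
Reference translation of paper (metadata) (2025-02-16T09:27:44Z)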
Turning Logic Against Itself : Probing Model Defenses Through Contrastive Questions [51.51850981481236]
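We introduce POATE, a novel jailbreak technique that uses contrastive reasoning to elicit unethical responses. POATE crafts semantically opposing intents and integrates them with adversarial templates, steering models toward harmful outputs with remarkable subtlety. To counter this, we propose Intent-Aware CoT and Reverse Thinking CoT, which decompose queries to detect malicious intent and reason in reverse to evaluate and reject harmful responses.
Reference translation of paper (metadata) (2025-01-03T15:40:03Z)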
You Know What I'm Saying: Jailbreak Attack via Implicit Reference [22.520950422702757]
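This study identifies a previously overlooked vulnerability, which we call Attack via Implicit Reference (AIR). AIR decomposes a malicious objective into permissible objectives and links them through implicit references within the context. Our experiments show that AIR is effective against state-of-the-art LLMs, achieving an attack success rate (ASR) exceeding 90% on most models.
Reference translation of paper (metadata) (2024-10-04T18:42:57Z)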

The related papers list is generated automatically from the titles and abstracts of papers on this site.
