Fugu-MT Paper Translation (Abstract): Sockpuppetting: Jailbreaking LLMs Without Optimization Through Output Prefix Injection

Paper abstract: Sockpuppetting: Jailbreaking LLMs Without Optimization Through Output Prefix Injection

arxiv url: http://arxiv.org/abs/2601.13359v1
Date: Mon, 19 Jan 2026 19:53:48 GMT
Status: Translation complete
Last updated in system: 2026-01-21 22:47:23.042197
Title: Sockpuppetting: Jailbreaking LLMs Without Optimization Through Output Prefix Injection
Title (reference translation): Sockpuppetting: Jailbreaking LLMs Without Optimization via Output Prefix Injection
Authors: Asen Dotsinski, Panagiotis Eustratiadis
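Abstract summary: "Sockpuppetting" is a simple method for jailbreaking open-weight language models. It achieves up to 80% higher attack success rate (ASR) than GCG on Qwen3-8B in per-prompt comparisons.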
Reference score (independently computed attention metric): 2.8329969194317
License: http://creativecommons.org/licenses/by/4.0/
Abstract: As open-weight large language models (LLMs) increase in capabilities, safeguarding them against malicious prompts and understanding possible attack vectors becomes ever more important. While automated jailbreaking methods like GCG [Zou et al., 2023] remain effective, they often require substantial computational resources and specific expertise. We introduce "sockpuppetting", a simple method for jailbreaking open-weight LLMs by inserting an acceptance sequence (e.g., "Sure, here is how to...") at the start of a model's output and allowing it to complete the response. Requiring only a single line of code and no optimization, sockpuppetting achieves up to 80% higher attack success rate (ASR) than GCG on Qwen3-8B in per-prompt comparisons. We also explore a hybrid approach that optimizes the adversarial suffix within the assistant message block rather than the user prompt, increasing ASR by 64% over GCG on Llama-3.1-8B in a prompt-agnostic setting. The results establish sockpuppetting as an effective low-cost attack accessible to unsophisticated adversaries, highlighting the need for defences against output-prefix injection in open-weight models.
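Abstract (reference translation): As open-weight large language models (LLMs) grow more capable, protecting them against malicious prompts and understanding possible attack vectors becomes increasingly important. Automated jailbreaking methods such as GCG [Zou et al., 2023] remain effective, but they often require substantial computational resources and specific expertise. We introduce "sockpuppetting", a simple method for jailbreaking open-weight LLMs by inserting an acceptance sequence (e.g., "Sure, here is how to...") at the start of the model's output and letting it complete the response. Requiring only a single line of code and no optimization, sockpuppetting achieves up to 80% higher attack success rate (ASR) than GCG on Qwen3-8B in per-prompt comparisons. We also explore a hybrid approach that optimizes the adversarial suffix inside the assistant message block rather than the user prompt, increasing ASR by 64% over GCG on Llama-3.1-8B in a prompt-agnostic setting. The results establish sockpuppetting as an effective low-cost attack available to unsophisticated adversaries and underline the need for defenses against output-prefix injection in open-weight models.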

