Fugu-MT paper translation (abstract): SafeSpec: Fast and Safe LLM via Dynamic Reflective Sampling

Paper summary: SafeSpec: Fast and Safe LLM via Dynamic Reflective Sampling

arxiv url: http://arxiv.org/abs/2606.19755v1
Date: Thu, 18 Jun 2026 03:35:32 GMT
Status: Translation complete
Last updated in system: 2026-06-19 18:23:39.636975
Title: SafeSpec: Fast and Safe LLM via Dynamic Reflective Sampling
Authors: Haotian Xu, Zeyang Zhang, Linbao Li, Huadi Zheng, Yu Li, Cheng Zhuo
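Abstract summary: The paper proposes SafeSpec, a speculative inference framework that integrates risk estimation directly into the verification process. Across multiple models and adversarial benchmarks, SafeSpec substantially improves the safety-efficiency trade-off. On Qwen3-32B, SafeSpec reduces attack success rates by 15% while preserving a 2.06x inference speedup on benign workloads.
Reference score (attention score computed by this site): 12.768157540795707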
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Speculative inference accelerates large language model (LLM) decoding but provides no inherent safety guarantees. Existing safety defenses are largely incompatible with speculative inference: they either introduce additional computation or disrupt the draft-verify mechanism, negating acceleration benefits. This reveals a fundamental incompatibility between current safety methods and speculative decoding. We propose SafeSpec, a safety-aware speculative inference framework that integrates risk estimation directly into the verification process. SafeSpec attaches a lightweight latent safety head to the target model to jointly evaluate semantic validity and safety in a single forward pass. When unsafe generations are detected, SafeSpec applies rollback and safety-guided reflective multi-sampling to recover safe continuations rather than terminating generation. We model jailbreak attacks as distributional shifts over generative trajectories, where adversarial prompts increase the probability of harmful continuations without eliminating safe ones. Under this model, SafeSpec performs risk-aware trajectory recovery within the speculative decoding process. Across multiple models and adversarial benchmarks, SafeSpec achieves a substantially improved safety-efficiency trade-off. On Qwen3-32B, SafeSpec reduces attack success rates by 15% while preserving a 2.06x inference speedup on benign workloads, demonstrating that speculative acceleration and inference-time safety can be jointly optimized.
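The draft-verify loop with rollback and reflective multi-sampling described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: `safespec_step`, `target_accepts`, `risk`, and `sample_candidates` are hypothetical names, the `risk` callable merely stands in for the latent safety head, and ordinary speculative rejection is simplified to stopping at the first rejected token.

```python
from typing import Callable, List, Sequence

Token = str


def safespec_step(
    prefix: List[Token],
    draft: Sequence[Token],
    target_accepts: Callable[[List[Token], Token], bool],
    risk: Callable[[List[Token]], float],
    sample_candidates: Callable[[List[Token], int], List[List[Token]]],
    threshold: float = 0.5,
    k: int = 4,
) -> List[Token]:
    """One draft-verify step with a safety check folded into verification.

    All callables are hypothetical stand-ins: `target_accepts` for the target
    model's acceptance test, `risk` for a safety head, `sample_candidates`
    for drawing k alternative continuations from the target model.
    """
    out = list(prefix)
    for tok in draft:
        if not target_accepts(out, tok):
            break  # simplified speculative rejection: stop at first mismatch
        if risk(out + [tok]) > threshold:
            # Unsafe token: roll back (the token is never appended), then
            # sample k alternatives and keep the lowest-risk continuation.
            candidates = sample_candidates(out, k)
            best = min(candidates, key=lambda c: risk(out + c))
            if risk(out + best) <= threshold:
                out.extend(best)
            return out
        out.append(tok)
    return out


# Toy stand-ins, chosen only to make the example deterministic.
UNSAFE = {"exploit"}


def toy_risk(tokens: List[Token]) -> float:
    return 1.0 if any(t in UNSAFE for t in tokens) else 0.0


def toy_accepts(prefix: List[Token], tok: Token) -> bool:
    return True  # assume the target model agrees with every draft token


def toy_candidates(prefix: List[Token], k: int) -> List[List[Token]]:
    pool = [["exploit"], ["cannot"], ["help"], ["sorry"]]
    return pool[:k]


result = safespec_step(
    ["I"], ["will", "exploit", "it"], toy_accepts, toy_risk, toy_candidates
)
print(result)  # ['I', 'will', 'cannot']
```

In this sketch a benign draft passes through untouched, so the speculative speedup is only interrupted when the risk score crosses the threshold; the threshold of 0.5 and `k = 4` are arbitrary illustrative values.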

Related papers