Fugu-MT Paper Translation (Abstract): Confidence-Modulated Speculative Decoding for Large Language Models

Paper overview: Confidence-Modulated Speculative Decoding for Large Language Models

arxiv url: http://arxiv.org/abs/2508.15371v1
Date: Thu, 21 Aug 2025 09:06:31 GMT
Status: Translation complete
Last updated in system: 2025-08-22 16:26:46.251119
Title: Confidence-Modulated Speculative Decoding for Large Language Models
Title (reference translation): Confidence-Controlled Speculative Decoding for Large Language Models
Authors: Jaydip Sen, Subhasis Dasgupta, Hetvi Waghela
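Abstract summary: This paper proposes an information-theoretic framework for speculative decoding based on confidence-modulated drafting. Experiments on machine translation and summarization tasks show significant speedups over standard speculative decoding.
Reference score (independently computed attention metric): 0.0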
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Speculative decoding has emerged as an effective approach for accelerating autoregressive inference by parallelizing token generation through a draft-then-verify paradigm. However, existing methods rely on static drafting lengths and rigid verification criteria, limiting their adaptability across varying model uncertainties and input complexities. This paper proposes an information-theoretic framework for speculative decoding based on confidence-modulated drafting. By leveraging entropy and margin-based uncertainty measures over the drafter's output distribution, the proposed method dynamically adjusts the number of speculatively generated tokens at each iteration. This adaptive mechanism reduces rollback frequency, improves resource utilization, and maintains output fidelity. Additionally, the verification process is modulated using the same confidence signals, enabling more flexible acceptance of drafted tokens without sacrificing generation quality. Experiments on machine translation and summarization tasks demonstrate significant speedups over standard speculative decoding while preserving or improving BLEU and ROUGE scores. The proposed approach offers a principled, plug-in method for efficient and robust decoding in large language models under varying conditions of uncertainty.
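Abstract (reference translation): Speculative decoding has emerged as an effective technique for accelerating autoregressive inference by parallelizing token generation. However, existing methods rely on static drafting lengths and strict verification criteria, which limits their adaptability to varying model uncertainty and input complexity. This paper proposes an information-theoretic framework for speculative decoding based on confidence-modulated drafting. Using entropy and margin-based uncertainty measures over the drafter's output distribution, the proposed method dynamically adjusts the number of speculatively generated tokens at each iteration. This adaptive mechanism reduces rollback frequency, improves resource utilization, and maintains output fidelity. In addition, the verification process is modulated using the same confidence signals, enabling more flexible acceptance of drafted tokens without sacrificing generation quality. Experiments on machine translation and summarization tasks show significant speedups over standard speculative decoding while preserving or improving BLEU and ROUGE scores. The proposed approach offers a principled, plug-in method for efficient and robust decoding in large language models under varying conditions of uncertainty.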
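
The abstract describes choosing the draft length from entropy and margin-based uncertainty over the drafter's output distribution, but gives no formula. The following is a minimal sketch of that idea, not the authors' method: the blending weight `alpha`, the bounds `k_min` and `k_max`, the linear mapping from confidence to length, and all function names are assumptions introduced here for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def margin(probs):
    """Gap between the two largest probabilities (1.0 for a single outcome)."""
    top = sorted(probs, reverse=True)
    return top[0] - top[1] if len(top) > 1 else 1.0


def confidence(probs, alpha=0.5):
    """Hypothetical confidence score in [0, 1].

    Blends (1 - normalized entropy) with the top-2 margin. The blend and
    the weight `alpha` are assumptions, not taken from the paper.
    """
    n = len(probs)
    norm_h = entropy(probs) / math.log(n) if n > 1 else 0.0
    score = alpha * (1.0 - norm_h) + (1.0 - alpha) * margin(probs)
    return min(1.0, max(0.0, score))  # guard against floating-point drift


def draft_length(probs, k_min=1, k_max=8, alpha=0.5):
    """Map drafter confidence to a number of tokens to draft this iteration.

    Assumed linear mapping: low confidence -> k_min, high confidence -> k_max.
    """
    c = confidence(probs, alpha)
    return k_min + round(c * (k_max - k_min))


if __name__ == "__main__":
    uncertain = [0.25, 0.25, 0.25, 0.25]   # flat drafter distribution
    confident = [0.97, 0.01, 0.01, 0.01]   # sharply peaked distribution
    print("uncertain:", draft_length(uncertain))
    print("confident:", draft_length(confident))
```

Under these assumptions, a flat drafter distribution yields the shortest draft and a sharply peaked one yields a draft close to `k_max`, which is how an adaptive scheme of this kind could reduce rollbacks when the drafter is unsure.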

Related papers