Fugu-MT Paper Translation (Summary): Reject Only Critical Tokens: Pivot-Aware Speculative Decoding

Paper Overview: Reject Only Critical Tokens: Pivot-Aware Speculative Decoding

arxiv url: http://arxiv.org/abs/2511.00351v1
Date: Sat, 01 Nov 2025 01:35:10 GMT
Status: Translation complete
Last updated in system: 2025-11-05 16:37:26.734627
Title: Reject Only Critical Tokens: Pivot-Aware Speculative Decoding
Authors: Amir Ziashahabi, Yavuz Faruk Bakman, Duygu Nur Yaldiz, Mostafa El-Khamy, Sai Praneeth Karimireddy, Salman Avestimehr
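Abstract summary: Speculative Decoding (SD) guarantees that the output exactly matches the target model's distribution. The proposed Pivot-Aware Speculative Decoding rejects only those tokens that would lead to a drop in the utility of the final output. The method is evaluated across various datasets and achieves up to 2.5× speedup with comparable utility.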
Reference score (attention score computed by the site): 31.22793593647334
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Speculative Decoding (SD) ensures that the output matches the target model's distribution exactly. However, we argue that this distribution matching requirement is too stringent and results in unnecessarily low acceptance rates, limiting potential speedups. Instead, we advocate a reformulation of the decoding objective: the proposed decoding strategy should match the expected utility, i.e., the task-specific performance, of the target model. This perspective also aligns better with real-world use cases of LLMs, where utility (e.g., code correctness, factual accuracy) is often more important than sampling distribution. Based on this reformulation, we propose a novel decoding strategy: Pivot-Aware Speculative Decoding, which rejects only those tokens that would lead to a utility drop in the final output. We refer to these critical tokens as pivot tokens. We propose a method for labeling tokens as pivotal or non-pivotal and train a lightweight classifier to detect them. This method can be viewed as a relaxed version of standard SD, which offers much higher acceptance while preserving utility. We evaluate our method across various datasets, demonstrating that we can achieve up to 2.5× speedup with comparable utility. Source code is available at https://github.com/amir-zsh/PAD.
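The abstract describes the relaxation only at a high level, so the following is a rough illustration rather than the authors' method. The Python sketch contrasts the standard SD verification rule (accept a draft token with probability min(1, p/q), otherwise resample from the normalized residual max(0, p - q) and stop) with one possible pivot-aware relaxation, under the assumption that the relaxation works by overriding rejections of tokens the classifier does not flag as pivotal. The functions `verify`, `residual_sample` and `toy_is_pivot` are hypothetical names; `toy_is_pivot` is a placeholder rule (numbers count as pivotal), not the lightweight classifier trained in the paper, and none of this is taken from the PAD repository.

```python
import random
from typing import Callable, Dict, List, Optional, Sequence

Dist = Dict[str, float]  # next-token distribution: token -> probability
# Hypothetical classifier interface: (accepted prefix, draft token) -> is it a pivot?
PivotFn = Callable[[List[str], str], bool]


def residual_sample(p: Dist, q: Dist, rng: random.Random) -> str:
    """Sample from norm(max(0, p - q)), the standard SD correction distribution."""
    residual = {t: max(0.0, pt - q.get(t, 0.0)) for t, pt in p.items()}
    total = sum(residual.values())
    if total == 0.0:  # only possible when p == q; sample from p directly
        residual, total = dict(p), sum(p.values())
    r, acc, last = rng.random() * total, 0.0, None
    for tok, w in residual.items():
        acc += w
        last = tok
        if r < acc:
            return tok
    return last  # guards against floating-point round-off at the boundary


def verify(draft: Sequence[str], p_list: Sequence[Dist], q_list: Sequence[Dist],
           rng: random.Random, is_pivot: Optional[PivotFn] = None) -> List[str]:
    """Verify one block of draft tokens against the target model.

    is_pivot=None: standard SD (accept with prob. min(1, p/q), else resample
    from the residual and stop). With is_pivot set (assumed relaxation): a
    token the standard test rejects is still kept unless flagged as a pivot.
    The bonus token sampled after a fully accepted block is omitted here.
    """
    out: List[str] = []
    for tok, p, q in zip(draft, p_list, q_list):
        if rng.random() < min(1.0, p.get(tok, 0.0) / q[tok]):
            out.append(tok)  # accepted by the exact distribution-matching test
        elif is_pivot is not None and not is_pivot(out, tok):
            out.append(tok)  # relaxed: mismatch tolerated for a non-pivot token
        else:
            out.append(residual_sample(p, q, rng))  # reject and correct
            break
    return out


def toy_is_pivot(prefix: List[str], tok: str) -> bool:
    # Hypothetical stand-in for the trained classifier: treat numbers as pivotal.
    return tok.isdigit()


draft = ["The", "answer", "is", "4"]
q_list = [{"The": 1.0}, {"answer": 1.0}, {"is": 1.0}, {"4": 1.0}]  # draft model
p_list = [{"The": 1.0}, {"result": 1.0}, {"is": 1.0}, {"5": 1.0}]  # target model

print(verify(draft, p_list, q_list, random.Random(0)))                # ['The', 'result']
print(verify(draft, p_list, q_list, random.Random(0), toy_is_pivot))  # ['The', 'answer', 'is', '5']
```

With these toy distributions, standard SD stops at the first mismatch ("answer" vs. "result"), while the relaxed rule keeps the harmless wording change and rejects only the changed number, so more draft tokens survive each verification step. Because this relaxed rule accepts every token the standard test would accept, its acceptance rate can only be higher, at the cost of no longer matching the target distribution exactly.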

Related Papers