Fugu-MT Paper Translation (Summary): Speculative Verification: Exploiting Information Gain to Refine Speculative Decoding

Paper summary: Speculative Verification: Exploiting Information Gain to Refine Speculative Decoding

arxiv url: http://arxiv.org/abs/2509.24328v1
Date: Mon, 29 Sep 2025 06:25:54 GMT
Status: Translation complete
Last updated in system: 2025-09-30 22:32:19.782861
Title: Speculative Verification: Exploiting Information Gain to Refine Speculative Decoding
Authors: Sungkyun Kim, Jaemin Kim, Dogyung Yoon, Jiho Shin, Junyeol Lee, Jiwon Seo
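Abstract summary: Speculative Verification dynamically predicts speculation accuracy and adapts the verification length to maximize throughput. It improves SD performance by up to 2$\times$, with an average speedup of 1.4$\times$ in large-batch settings.
Reference score (site-computed attention score): 8.36763119650407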
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: LLMs have low GPU efficiency and high latency due to autoregressive decoding. Speculative decoding (SD) mitigates this using a small draft model to speculatively generate multiple tokens, which are then verified in parallel by a target model. However, when speculation accuracy is low, the overhead from rejected tokens can offset the benefits, limiting SD's effectiveness, especially at large batch sizes. To address this, we propose Speculative Verification (SV), an efficient augmentation to SD that dynamically predicts speculation accuracy and adapts the verification length to maximize throughput. SV introduces a companion model - a small auxiliary model similar in size to the draft model - to estimate the alignment between draft and target model distributions. By maximizing the information gain from quantifying this alignment, SV refines verification decisions, reducing wasted computation on rejected tokens and improving decoding efficiency. Moreover, SV requires no modifications to the draft or target models and is compatible with existing SD variants. We extensively evaluated SV on publicly available LLMs across three NLP tasks using nine combinations of draft, companion, and target models, including 13B-72B target models and three types of variations: base (no finetuning), instruction-tuned, and task fine-tuned. Across all experiments and batch sizes (4-80), SV consistently outperforms both SD and standard decoding with the target model. It improves SD performance by up to 2$\times$, with an average speedup of 1.4$\times$ in large-batch settings (batch sizes 32-80). These results demonstrate SV's robustness, scalability, and practical utility for efficient LLM inference.
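For readers unfamiliar with the baseline that SV augments, the sketch below shows the standard speculative-sampling verification rule commonly used in SD: each drafted token is accepted with probability min(1, p/q), where p and q are the target and draft probabilities of that token; on the first rejection a replacement is drawn from the normalized residual max(0, p - q); if every draft token is accepted, one bonus token is sampled from the target. This is a minimal sketch of plain SD verification, not SV itself. The function name `verify_draft` and the list-of-probabilities inputs are illustrative assumptions, and a real system would compute the target distributions for all positions in one parallel forward pass.

```python
import random
from typing import List, Sequence


def verify_draft(
    draft_tokens: Sequence[int],
    draft_probs: Sequence[Sequence[float]],   # q_i over the vocab, one per drafted position
    target_probs: Sequence[Sequence[float]],  # p_i over the vocab, len(draft_tokens) + 1 entries
    rng: random.Random,
) -> List[int]:
    """Standard speculative-sampling verification (baseline SD, not SV).

    Returns the accepted prefix of draft_tokens followed by exactly one token
    sampled on the target side: a correction after a rejection, or a bonus
    token when every draft token is accepted.
    """
    accepted: List[int] = []
    for i, tok in enumerate(draft_tokens):
        p, q = target_probs[i][tok], draft_probs[i][tok]
        # Accept the drafted token with probability min(1, p / q).
        if q > 0 and rng.random() < min(1.0, p / q):
            accepted.append(tok)
            continue
        # Rejected: resample from the normalized residual max(0, p - q).
        residual = [max(0.0, pt - qt) for pt, qt in zip(target_probs[i], draft_probs[i])]
        weights = residual if sum(residual) > 0 else list(target_probs[i])
        accepted.append(rng.choices(range(len(weights)), weights=weights)[0])
        return accepted  # later draft tokens are discarded (wasted work)
    # Every draft token was accepted: sample one bonus token from the target.
    last = target_probs[len(draft_tokens)]
    accepted.append(rng.choices(range(len(last)), weights=last)[0])
    return accepted
```

Everything after the first rejected position is thrown away, which is the wasted computation the abstract says grows costly when speculation accuracy is low and batches are large.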
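The abstract says SV predicts speculation accuracy and adapts the verification length to maximize throughput. The sketch below illustrates that general idea under loudly stated assumptions; it is not the paper's information-gain criterion or its companion-model estimator. It assumes per-position acceptance probabilities are already available (in SV these would come from the companion model's alignment estimate; here they are simply inputs), treats acceptances as independent, and uses a hypothetical linear verification-cost model. Under those assumptions, verifying the first k draft tokens yields an expected 1 + sum over j of the product of the first j acceptance probabilities, and the chosen k maximizes expected tokens per unit cost. The names `expected_tokens`, `choose_verify_length`, `fixed_cost`, and `per_token_cost` are all hypothetical.

```python
from itertools import accumulate
from operator import mul
from typing import Sequence, Tuple


def expected_tokens(accept_probs: Sequence[float], k: int) -> float:
    """Expected tokens emitted when the first k draft tokens are verified.

    Assumes (hypothetically) independent per-position acceptance probabilities.
    Token j survives only if tokens 1..j are all accepted, hence the prefix
    products; the +1 is the target's correction or bonus token.
    """
    prefix_products = list(accumulate(accept_probs[:k], mul))
    return 1.0 + sum(prefix_products)


def choose_verify_length(
    accept_probs: Sequence[float],
    fixed_cost: float = 1.0,      # hypothetical per-step overhead of one target pass
    per_token_cost: float = 0.1,  # hypothetical marginal cost per verified token
) -> Tuple[int, float]:
    """Pick k in [0, K] that maximizes expected tokens per unit verification cost."""
    best_k, best_rate = 0, expected_tokens(accept_probs, 0) / fixed_cost
    for k in range(1, len(accept_probs) + 1):
        rate = expected_tokens(accept_probs, k) / (fixed_cost + per_token_cost * k)
        if rate > best_rate:
            best_k, best_rate = k, rate
    return best_k, best_rate


if __name__ == "__main__":
    # Confident head, unconfident tail: the tail is not worth verifying.
    print(choose_verify_length([0.9, 0.8, 0.1, 0.1])[0])  # 2
    # Uniformly confident draft: verify everything.
    print(choose_verify_length([0.95] * 4)[0])            # 4
```

With these toy numbers, truncating verification after the low-confidence positions raises expected throughput, which is the kind of decision the abstract attributes to SV. The real benefit depends on how verification cost scales with batch size, which this linear model ignores.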

