Fugu-MT Paper Translation (Abstract): DIVERSED: Relaxed Speculative Decoding via Dynamic Ensemble Verification

Paper abstract: DIVERSED: Relaxed Speculative Decoding via Dynamic Ensemble Verification

arxiv url: http://arxiv.org/abs/2604.07622v1
Date: Wed, 08 Apr 2026 21:52:32 GMT
Status: Translation complete
System update date: 2026-04-10 18:34:05.579087
Title: DIVERSED: Relaxed Speculative Decoding via Dynamic Ensemble Verification
Authors: Ziyi Wang, Siva Rajesh Kasa, Ankith M S, Santhosh Kumar Kasa, Jiaru Zou, Sumit Negi, Ruqi Zhang, Nan Jiang, Qifan Song
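Abstract summary: Speculative decoding is an effective technique for accelerating large language model inference by drafting multiple tokens in parallel. We propose Dynamic Verification Relaxed Speculative Decoding (DIVERSED), a relaxed verification framework that improves time efficiency while preserving generation quality. We show that DIVERSED achieves substantially higher inference efficiency than standard speculative decoding methods.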
Reference score (independently calculated attention score): 29.426184837710952
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Speculative decoding is an effective technique for accelerating large language model inference by drafting multiple tokens in parallel. In practice, its speedup is often bottlenecked by a rigid verification step that strictly enforces the accepted token distribution to exactly match the target model. This constraint leads to the rejection of many plausible tokens, lowering the acceptance rate and limiting overall time speedup. To overcome this limitation, we propose Dynamic Verification Relaxed Speculative Decoding (DIVERSED), a relaxed verification framework that improves time efficiency while preserving generation quality. DIVERSED learns an ensemble-based verifier that blends the draft and target model distributions with a task-dependent and context-dependent weight. We provide theoretical justification for our approach and demonstrate empirically that DIVERSED achieves substantially higher inference efficiency compared to standard speculative decoding methods. Code is available at: https://github.com/comeusr/diversed.
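The contrast between strict and relaxed verification described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper or its repository: the function names, the toy distributions, the use of a fixed scalar weight `w` (the abstract describes a learned, task-dependent and context-dependent weight), the direction of the blend, and the acceptance rule min(1, verifier(x) / draft(x)), which is the usual speculative sampling rule with the target replaced by the blended verifier. The resampling step that follows a rejection is omitted.

```python
import random

def blend(p_target, q_draft, w):
    """Hypothetical ensemble verifier: w * draft + (1 - w) * target.
    w = 0 recovers strict verification against the target model."""
    return [w * q + (1.0 - w) * p for p, q in zip(p_target, q_draft)]

def acceptance_rate(verifier, q_draft):
    """Expected probability that a token sampled from the draft is accepted
    when the acceptance probability is min(1, verifier(x) / q(x))."""
    return sum(min(v, q) for v, q in zip(verifier, q_draft))

def accept_token(token, verifier, q_draft, rng):
    """Accept or reject one drafted token id.
    q_draft[token] > 0 is assumed, since the token was sampled from the draft."""
    ratio = verifier[token] / q_draft[token]
    return rng.random() < min(1.0, ratio)

# Toy next-token distributions over a 3-token vocabulary (made-up numbers).
p_target = [0.6, 0.3, 0.1]
q_draft = [0.3, 0.5, 0.2]

strict = acceptance_rate(p_target, q_draft)                        # w = 0
relaxed = acceptance_rate(blend(p_target, q_draft, 0.5), q_draft)  # w = 0.5

print(f"strict acceptance rate:  {strict:.2f}")   # 0.70
print(f"relaxed acceptance rate: {relaxed:.2f}")  # 0.85
```

Under these assumptions the expected acceptance rate equals 1 - (1 - w) * TV(p, q), where TV is the total variation distance between target and draft. A larger `w` therefore accepts more drafted tokens, at the cost of moving the output distribution away from the target model.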

Related papers