Fugu-MT paper translation (abstract): Training-free Dropout Sampling for Semantic Token Acceptance in Speculative Decoding

Paper overview: Training-free Dropout Sampling for Semantic Token Acceptance in Speculative Decoding

arxiv url: http://arxiv.org/abs/2603.03333v1
Date: Wed, 11 Feb 2026 04:53:33 GMT
Status: translation complete
Last updated in system: 2026-03-09 01:20:08.16146
Title: Training-free Dropout Sampling for Semantic Token Acceptance in Speculative Decoding
Authors: Jeongtae Lee, Minjung Jo, Hyunjoon Jeong, Gunho Park, Sunghyeon Woo, Joonghoon Kim, Se Jung Kwon, Dongsoo Lee
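Abstract summary: Speculative decoding accelerates large language model inference by proposing tokens with a lightweight draft model. This work introduces DropMatch, a novel approach that matches draft tokens to the predictive distribution of the target model. Experiments across multiple benchmarks show that the approach increases acceptance length while maintaining competitive task performance.
Reference score (independently computed attention measure): 13.249778063956917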
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Speculative decoding accelerates large language model inference by proposing tokens with a lightweight draft model and selectively accepting them using a target model. This work introduces DropMatch, a novel approach that matches draft tokens to the predictive distribution of the target model via Monte Carlo dropout applied exclusively to the LM head, enabling sampling-based acceptance decisions. By generating multiple decoding paths, our method forms an empirical token distribution against which draft tokens are evaluated for consistency. This acceptance mechanism enables the model to adaptively control the size of decoding paths under an appropriate dropout probability, preventing substantial distortion of the target model predictive distribution. The proposed method operates in a training-free, data-free, and calibration-free manner, requires no architectural modification to pretrained models, and can be orthogonally integrated with a wide range of existing speculative decoding and inference acceleration techniques. Experiments across multiple benchmarks demonstrate that our approach increases acceptance length while maintaining competitive task performance, yielding inference speedups ranging from 1.09x to 1.33x over the standard baseline, and up to an additional 1.09x speedup when applied on top of EAGLE3.
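The acceptance idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The abstract does not specify where in the LM head dropout is applied, how each path picks its token, or the exact acceptance rule. The sketch assumes inverted dropout on the hidden state entering the LM head, greedy (argmax) token selection per path, and a hypothetical rule that accepts a draft token when its share of the sampled paths reaches `min_fraction`. All function and parameter names are illustrative.

```python
import numpy as np


def mc_dropout_token_counts(hidden, lm_head_weight, num_paths, dropout_p, rng):
    """Count the greedy token chosen by each of `num_paths` dropout samples.

    hidden:         (d,) final hidden state of the target model at one position
    lm_head_weight: (vocab, d) LM-head projection matrix
    Assumption: dropout is applied to the LM-head input only, so the
    transformer body runs once and only this projection is repeated.
    """
    d = hidden.shape[0]
    vocab = lm_head_weight.shape[0]
    # Independent keep-masks per path; inverted dropout rescales kept features.
    keep = rng.random((num_paths, d)) >= dropout_p
    dropped = hidden[None, :] * keep / (1.0 - dropout_p)
    logits = dropped @ lm_head_weight.T          # (num_paths, vocab)
    tokens = logits.argmax(axis=1)               # greedy token per path
    return np.bincount(tokens, minlength=vocab)  # empirical token histogram


def accept_draft_token(draft_token, counts, min_fraction):
    """Hypothetical rule: accept if enough sampled paths agree with the draft."""
    return bool(counts[draft_token] / counts.sum() >= min_fraction)


# Toy usage with random stand-ins for the hidden state and LM head.
rng = np.random.default_rng(0)
hidden = rng.standard_normal(16)
lm_head_weight = rng.standard_normal((8, 16))
counts = mc_dropout_token_counts(hidden, lm_head_weight,
                                 num_paths=32, dropout_p=0.1, rng=rng)
draft_token = int((lm_head_weight @ hidden).argmax())
accepted = accept_draft_token(draft_token, counts, min_fraction=0.25)
```

With `dropout_p=0.0` every path reproduces the plain greedy token, so the sketch reduces to exact-match acceptance. A larger dropout probability spreads the histogram over more tokens, which is one way to read the abstract's remark that an appropriate dropout probability controls how far the acceptance set departs from the target model's own prediction.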

Related papers