Fugu-MT Paper Translation (Abstract): Acceptance Dynamics Across Cognitive Domains in Speculative Decoding

Paper summary: Acceptance Dynamics Across Cognitive Domains in Speculative Decoding

arxiv url: http://arxiv.org/abs/2604.14682v1
Date: Thu, 16 Apr 2026 06:38:44 GMT
Status: Translation complete
Last updated in system: 2026-04-17 21:29:31.759768
Title: Acceptance Dynamics Across Cognitive Domains in Speculative Decoding
Authors: Saif Mahmoud,
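Abstract summary: This paper presents an empirical study of the acceptance dynamics of tree-based speculative decoding. The study spans four NLP benchmark domains. Task type is found to be a stronger predictor of acceptance than tree depth.
Reference score (independently computed attention score): 0.0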
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Speculative decoding accelerates large language model (LLM) inference. It uses a small draft model to propose a tree of future tokens. A larger target model then verifies these tokens in a single batched forward pass. Despite the growing body of work on speculative methods, the degree to which the cognitive characteristics of a task affect acceptance probability remains largely unexplored. We present an empirical study of tree-based speculative decoding acceptance dynamics. Our study spans four well-established NLP benchmark domains: code generation, mathematical reasoning, logical reasoning, and open-ended chat. For this, we use TinyLlama-1.1B as the draft model against Llama-2-7B-Chat-GPTQ as the target. Over 99,768 speculative nodes collected from 200 prompts, we derive per-domain acceptance rates, expected accepted lengths, depth-acceptance profiles, and entropy-acceptance correlations. We find that task type is a stronger predictor of acceptance than tree depth. Furthermore, only the chat domain consistently yields an expected accepted length exceeding 1.0 token per step. We also show that the entropy-acceptance correlation is consistently negative but weak across all domains (rho in [-0.20, -0.15]). Counterintuitively, chat produces the highest entropy yet the highest acceptance rate. We attribute this divergence to the lexical predictability of RLHF-aligned register. These findings have direct implications for domain-aware speculation budgets and draft-model selection strategies.
Index Terms: speculative decoding, large language model inference, tree attention, draft model, acceptance probability, LLM efficiency
