Fugu-MT Paper Translation (Summary): SpecBlock: Block-Iterative Speculative Decoding with Dynamic Tree Drafting

Paper summary: SpecBlock: Block-Iterative Speculative Decoding with Dynamic Tree Drafting

arxiv url: http://arxiv.org/abs/2605.07243v1
Date: Fri, 08 May 2026 04:59:48 GMT
Status: Translation complete
System update date: 2026-05-11 19:43:38.807263
Title: SpecBlock: Block-Iterative Speculative Decoding with Dynamic Tree Drafting
Authors: Weijie Shi, Qiang Xu, Fan Deng, Yaguang Wu, Jiarun Liu, Yehong Xu, Hao Chen, Jia Zhu, Jiajie Xu, Xiangjun Huang, Jian Yang, Xiaofang Zhou
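Abstract summary: SpecBlock is a block-iterative drafter that combines path dependence with cheap drafting. At deployment, a cost-aware bandit uses free verifier feedback to update the drafter selectively.
Reference score (independently computed attention score): 25.273024580844346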
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Speculative decoding accelerates LLM inference by drafting a tree of candidate continuations and verifying it in one target forward. Existing drafters fall into two camps with opposite weaknesses. Autoregressive drafters such as EAGLE-3 preserve dependence along each draft path but call the drafter once per tree depth, making drafting a non-trivial share of per-iteration latency. Parallel drafters cut drafter calls by predicting multiple future positions in one forward, but each position is predicted without seeing the others, producing paths the verifier rejects. In this paper, we propose SpecBlock, a block-iterative drafter that combines path dependence with cheap drafting. Each drafter forward produces K dependent positions and we call this a block. The draft tree grows through repeated block expansions. Two mechanisms explicitly carry path dependence to keep later draft positions accurate. Within each block, a layer-wise shift carries the previous position's hidden state into every decoder layer. Across blocks, each new block can start from any position of the previous block, inheriting its hidden state to extend the path. To spend verifier budget where acceptance is likely, a co-trained rank head replaces the fixed top-k tree by allocating per-position branching during drafting. To avoid training the drafter on prefixes it never produces at inference, a valid-prefix mask drops the loss at later positions once an earlier one is wrong. Beyond static drafting, a cost-aware bandit at deployment uses free verifier feedback to update the drafter selectively, only when the expected throughput gain exceeds the update cost. Experiments show that SpecBlock improves mean speedup by 8-13% over EAGLE-3 at 44-52% of its drafting cost, and cost-aware adaptation extends this lead to 11-19%.
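Two of the mechanisms named in the abstract can be illustrated with a small sketch. The abstract does not give formulas, so everything here is an assumption made for illustration: the function names (`valid_prefix_mask`, `masked_mean_loss`, `should_update`), the reading that the first wrong position still keeps its loss because its own prefix is valid, and the idea that gain and cost are expressed in the same unit. It is not the authors' implementation.

```python
from typing import List


def valid_prefix_mask(correct: List[bool]) -> List[int]:
    """Return 1 where the training loss is kept and 0 where it is dropped.

    Assumed reading of the abstract: a position is kept only if every
    earlier position in the block was predicted correctly, so the loss
    is dropped at all positions after the first wrong one.
    """
    mask = []
    prefix_ok = True  # True while all earlier positions were correct
    for is_correct in correct:
        mask.append(1 if prefix_ok else 0)
        prefix_ok = prefix_ok and is_correct
    return mask


def masked_mean_loss(losses: List[float], mask: List[int]) -> float:
    """Average the per-position losses over the kept positions only."""
    kept = [loss for loss, keep in zip(losses, mask) if keep]
    return sum(kept) / len(kept) if kept else 0.0


def should_update(expected_gain: float, update_cost: float) -> bool:
    """Hypothetical update gate: update the drafter only when the expected
    throughput gain exceeds the update cost (both in the same unit)."""
    return expected_gain > update_cost


if __name__ == "__main__":
    # Block of K = 4 draft positions; the third prediction is wrong.
    mask = valid_prefix_mask([True, True, False, True])
    print(mask)
    print(masked_mean_loss([1.0, 2.0, 3.0, 4.0], mask))
    print(should_update(expected_gain=0.5, update_cost=0.2))
```

In this sketch the fourth position contributes nothing to the loss, because at inference the drafter would never continue from a prefix that the verifier has already rejected.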

Related papers