Fugu-MT paper translation (abstract): Goose: Anisotropic Speculation Trees for Training-Free Speculative Decoding

Paper summary: Goose: Anisotropic Speculation Trees for Training-Free Speculative Decoding

arxiv url: http://arxiv.org/abs/2604.02047v1
Date: Thu, 02 Apr 2026 13:48:42 GMT
Status: Translation complete
Last updated in system: 2026-04-03 14:21:10.829327
Title: Goose: Anisotropic Speculation Trees for Training-Free Speculative Decoding
Title (reference translation): Goose: Anisotropic Speculation Trees for Training-Free Speculative Decoding
Authors: Tao Jin, Phuong Minh Nguyen, Naoya Inoue
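Abstract summary: Speculative decoding accelerates large language model inference by drafting multiple candidate tokens and verifying them in a single forward pass. Existing training-free methods draft from a single token source and shape their trees without distinguishing candidate quality across origins. The authors observe that two common training-free token sources, n-gram matches copied from the input context and statistical predictions from prior forward passes, differ dramatically in acceptance rate. For the proposed GOOSE framework, they prove that the number of tokens accepted per step is at least as large as that of either source used alone.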
Reference score (independently calculated attention score): 13.709230136542594
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Speculative decoding accelerates large language model inference by drafting multiple candidate tokens and verifying them in a single forward pass. Candidates are organized as a tree: deeper trees accept more tokens per step, but adding depth requires sacrificing breadth (fallback options) under a fixed verification budget. Existing training-free methods draft from a single token source and shape their trees without distinguishing candidate quality across origins. We observe that two common training-free token sources - n-gram matches copied from the input context, and statistical predictions from prior forward passes - differ dramatically in acceptance rate (~6x median gap, range 2-18x across five models and five benchmarks). We prove that when such a quality gap exists, the optimal tree is anisotropic (asymmetric): reliable tokens should form a deep chain while unreliable tokens spread as wide branches, breaking through the depth limit of balanced trees. We realize this structure in GOOSE, a training-free framework that builds an adaptive spine tree - a deep chain of high-acceptance context-matched tokens with wide branches of low-acceptance alternatives at each node. We prove that the number of tokens accepted per step is at least as large as that of either source used alone. On five LLMs (7B-33B) and five benchmarks, GOOSE achieves 1.9-4.3x lossless speedup, outperforming balanced-tree baselines by 12-33% under the same budget.
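Abstract (reference translation): Speculative decoding speeds up large language model inference by drafting several candidate tokens and verifying them in one forward pass. The candidates are arranged as a tree: a deeper tree accepts more tokens per step, but under a fixed verification budget any added depth must be paid for with breadth (fallback options). Existing training-free methods draft from a single token source and shape their trees without distinguishing candidate quality by origin. Two common training-free token sources, n-gram matches copied from the input context and statistical predictions from prior forward passes, differ dramatically in acceptance rate (about 6x median gap, ranging from 2x to 18x across five models and five benchmarks). When such a quality gap exists, the optimal tree is proven to be anisotropic (asymmetric): reliable tokens should form a deep chain while unreliable tokens spread out as wide branches, which breaks through the depth limit of balanced trees. This structure is realized in GOOSE, a training-free framework that builds an adaptive spine tree, that is, a deep chain of high-acceptance context-matched tokens with wide branches of low-acceptance alternatives at each node. The number of tokens accepted per step is proven to be at least as large as that of either source used alone. On five LLMs (7B-33B) and five benchmarks, GOOSE achieves a 1.9-4.3x lossless speedup, outperforming balanced-tree baselines by 12-33% under the same budget.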
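
The spine-tree shape described in the abstract can be illustrated with a toy calculation. The sketch below is not the GOOSE implementation: the names `Node`, `expected_accepted`, and `build_spine_tree` are hypothetical, the acceptance probabilities 0.6 and 0.1 are made-up values rather than results from the paper, and the model assumes that each token's acceptance probability depends only on its source and that at most one sibling can be accepted at any position.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    # Assumed P(this draft token is accepted | its parent was accepted).
    accept_prob: float
    children: List["Node"] = field(default_factory=list)


def expected_accepted(children: List[Node]) -> float:
    """Expected number of accepted draft tokens among `children` and below.

    Sibling acceptances are treated as mutually exclusive, so the
    expectation is the sum over nodes of the probability that the
    whole path down to that node is accepted.
    """
    return sum(
        c.accept_prob * (1.0 + expected_accepted(c.children)) for c in children
    )


def build_spine_tree(depth: int, width: int, p_hi: float, p_lo: float) -> List[Node]:
    """Build a toy spine tree and return its first-level candidates.

    Each level holds one high-acceptance spine token (which continues
    the chain) plus `width` low-acceptance leaf alternatives.
    """
    if depth == 0:
        return []
    spine = Node(p_hi, build_spine_tree(depth - 1, width, p_hi, p_lo))
    branches = [Node(p_lo) for _ in range(width)]
    return [spine] + branches


if __name__ == "__main__":
    # Illustrative values only (p_hi + width * p_lo must not exceed 1).
    spine_tree = build_spine_tree(depth=3, width=2, p_hi=0.6, p_lo=0.1)
    chain_only = build_spine_tree(depth=3, width=0, p_hi=0.6, p_lo=0.1)
    print("spine tree:", expected_accepted(spine_tree))
    print("chain only:", expected_accepted(chain_only))
```

In this toy setting the low-acceptance branches add expected tokens at every spine position without shortening the chain, which is the intuition behind pairing a deep reliable chain with wide unreliable fallbacks. The sketch does not compare trees under an equal node budget, so it should not be read as reproducing the paper's optimality result.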

Related papers