Fugu-MT Paper Translation (Abstract): Direct Multi-Token Decoding

Paper overview: Direct Multi-Token Decoding

arxiv url: http://arxiv.org/abs/2510.11958v1
Date: Mon, 13 Oct 2025 21:42:37 GMT
Status: Translation complete
Last updated in system: 2025-10-15 19:02:32.100923
Title: Direct Multi-Token Decoding
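Title (reference translation): Direct Multi-Token Decoding
Authors: Xuan Luo, Weizhi Wang, Xifeng Yan
Abstract summary: We introduce Direct Multi-Token Decoding (DMTD) as an inference paradigm for large language models (LLMs). Unlike speculative decoding, the proposed method introduces no additional parameters, auxiliary routines, or post-generation verification. A fine-tuned DMTD Qwen3-4B model has already shown promising results, achieving up to a 2x speedup.
Reference score (independently computed attention metric): 24.347862297812977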
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Decoder-only transformers have become the standard architecture for large language models (LLMs) due to their strong performance. Recent studies suggest that, in pre-trained LLMs, early, middle, and late layers may serve distinct roles: Early layers focus on understanding the input context, middle layers handle task-specific processing, and late layers convert abstract representations into output tokens. We hypothesize that once representations have been processed by the early and middle layers, the resulting hidden states may encapsulate sufficient information to support the generation of multiple tokens using only the late layers, eliminating the need to repeatedly traverse the early and middle layers. We refer to this inference paradigm as Direct Multi-Token Decoding (DMTD). Unlike speculative decoding, our method introduces no additional parameters, auxiliary routines, or post-generation verification. Despite being trained on a limited dataset, a fine-tuned DMTD Qwen3-4B model has already demonstrated promising results, achieving up to a 2x speedup with only minor performance loss. Moreover, as shown in our scaling analysis, its performance is expected to further improve with larger training datasets.
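Abstract (reference translation): Decoder-only transformers have become the standard architecture for large language models (LLMs) because of their strong performance. Early layers focus on understanding the input context, middle layers handle task-specific processing, and late layers convert abstract representations into output tokens. We hypothesize that once representations have been processed by the early and middle layers, the resulting hidden states may encapsulate enough information to support generating multiple tokens using only the late layers, removing the need to traverse the early and middle layers repeatedly. We call this inference paradigm Direct Multi-Token Decoding (DMTD). Unlike speculative decoding, the proposed method introduces no additional parameters, auxiliary routines, or post-generation verification. Despite being trained on a limited dataset, a fine-tuned DMTD Qwen3-4B model has already shown promising results, achieving up to a 2x speedup with only minor performance loss. Moreover, as the scaling analysis shows, its performance is expected to improve further with larger training datasets.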
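
The abstract's core idea (run the early and middle layers once, then reuse the resulting hidden states to emit several tokens through the late layers only) can be illustrated with a rough count of layer evaluations. The sketch below is not the authors' implementation: the function names, the layer split, and the number of tokens per cycle are all hypothetical values chosen for illustration, and the count ignores attention cost, cache handling, and memory bandwidth, so it should not be read as a prediction of wall-clock speedup.

```python
def layer_passes_standard(num_layers: int, num_tokens: int) -> int:
    """Layer evaluations when every token traverses the full stack."""
    return num_layers * num_tokens


def layer_passes_dmtd(
    num_layers: int,
    num_late_layers: int,
    num_tokens: int,
    tokens_per_cycle: int,
) -> int:
    """Idealized layer evaluations for a DMTD-style schedule.

    Assumption (not from the paper): each cycle runs the early and
    middle layers once, then runs only the late layers once per token
    generated in that cycle.
    """
    if not 0 < num_late_layers <= num_layers:
        raise ValueError("num_late_layers must be in 1..num_layers")
    if tokens_per_cycle < 1:
        raise ValueError("tokens_per_cycle must be at least 1")

    early_and_middle = num_layers - num_late_layers
    # Ceiling division: a final partial cycle still needs one full pass.
    num_cycles = -(-num_tokens // tokens_per_cycle)
    return num_cycles * early_and_middle + num_tokens * num_late_layers


if __name__ == "__main__":
    # Hypothetical configuration, chosen only to make the arithmetic concrete.
    layers, late, tokens, per_cycle = 36, 8, 8, 4
    standard = layer_passes_standard(layers, tokens)
    dmtd = layer_passes_dmtd(layers, late, tokens, per_cycle)
    print(f"standard: {standard} layer passes")
    print(f"DMTD-style: {dmtd} layer passes")
    print(f"ratio: {standard / dmtd:.2f}")
```

With one token per cycle the two counts coincide, which matches the intuition that the saving comes entirely from skipping repeated traversals of the early and middle layers.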

Related papers