Fugu-MT Paper Translation (Summary): Fast Thinking for Large Language Models

Paper summary: Fast Thinking for Large Language Models

arxiv url: http://arxiv.org/abs/2509.23633v1
Date: Sun, 28 Sep 2025 04:19:48 GMT
Status: Translation complete
Last updated in system: 2025-09-30 22:32:19.340763
Title: Fast Thinking for Large Language Models
Title (reference translation): Fast Thinking for Large Language Models
Authors: Haoyu Zheng, Zhuonan Wang, Yuqian Yuan, Tianwei Lin, Wenqiao Zhang, Zheqi Lv, Juncheng Li, Siliang Tang, Yueting Zhuang, Hongyang He
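Abstract summary: We introduce Latent Codebooks for Fast Thinking, a framework that uses concise CoT sketches only during training to learn a codebook of discrete strategy priors. At inference, the model conditions in a single pass on a handful of continuous thinking vectors distilled from the codebook, enabling strategy-level guidance without generating explicit reasoning tokens.
Reference score (attention score computed in-house): 67.7238685892317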
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Reasoning-oriented Large Language Models (LLMs) often rely on generating explicit tokens step by step, and their effectiveness typically hinges on large-scale supervised fine-tuning or reinforcement learning. While Chain-of-Thought (CoT) techniques substantially enhance performance on complex reasoning tasks, they remain inefficient, requiring long reasoning traces that increase latency and token usage. In this work, we introduce Latent Codebooks for Fast Thinking, a framework that uses concise CoT sketches only during training to learn a codebook of discrete strategy priors. At inference, the model conditions on a handful of continuous thinking vectors distilled from the codebook in a single pass, enabling strategy-level guidance without producing explicit reasoning tokens. To complement this design, we propose GainRouter, a lightweight routing mechanism that adaptively switches between fast codebook guided inference and slow explicit reasoning, thereby suppressing overthinking and reducing unnecessary token generation. Experiments across multiple reasoning benchmarks show that our approach achieves competitive or superior accuracy while substantially lowering inference cost, offering a practical path toward efficient and controllable reasoning in large language models.
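The abstract does not say how the continuous thinking vectors are distilled from the discrete codebook. As a hedged illustration only, the sketch below assumes a plain scaled dot-product attention step: a few query vectors (hypothetical, e.g. derived from the prompt) each attend over the learned strategy entries, and the resulting continuous vectors are prepended to the prompt embeddings so that a single forward pass is conditioned on them. The function names, dimensions and toy values are assumptions, not the paper's implementation.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max score before exponentiating for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def distill_thinking_vectors(queries: List[Vector], codebook: List[Vector]) -> List[Vector]:
    """Hypothetical distillation step: each query attends over the discrete
    codebook entries (scaled dot-product attention) and yields one continuous
    thinking vector as the attention-weighted mix of those entries."""
    dim = len(codebook[0])
    scale = 1.0 / math.sqrt(dim)
    thinking: List[Vector] = []
    for q in queries:
        weights = softmax([dot(q, entry) * scale for entry in codebook])
        mixed = [sum(w * entry[j] for w, entry in zip(weights, codebook)) for j in range(dim)]
        thinking.append(mixed)
    return thinking


def build_conditioned_input(thinking: List[Vector], token_embeddings: List[Vector]) -> List[Vector]:
    """Prepend the thinking vectors to the prompt's token embeddings so that a
    single forward pass is conditioned on them (no reasoning tokens decoded)."""
    return thinking + token_embeddings


# Toy usage: three 2-D strategy entries and two thinking slots.
codebook = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
queries = [[20.0, 0.0], [0.0, 20.0]]
thinking = distill_thinking_vectors(queries, codebook)
# 2 thinking vectors + 2 token embeddings = 4 conditioned inputs.
conditioned = build_conditioned_input(thinking, [[0.3, 0.7], [0.5, 0.5]])
```

Because each thinking vector is a convex combination of codebook entries, it stays inside the span of the learned strategies while still being continuous, which is one plausible reading of "continuous thinking vectors distilled from the codebook".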
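Abstract (reference translation): Reasoning-oriented large language models (LLMs) often rely on generating explicit tokens step by step, and their effectiveness typically depends on large-scale supervised fine-tuning or reinforcement learning. Chain-of-Thought (CoT) techniques substantially improve performance on complex reasoning tasks, but they are inefficient, requiring long reasoning traces that increase latency and token usage. In this work, we introduce Latent Codebooks for Fast Thinking, a framework that uses concise CoT sketches only during training to learn a codebook of discrete strategy priors. At inference, the model conditions in a single pass on a small number of continuous thinking vectors distilled from the codebook, enabling strategy-level guidance without generating explicit reasoning tokens. To complement this design, we propose GainRouter, a lightweight routing mechanism that adaptively switches between fast codebook-guided inference and slow explicit reasoning, thereby suppressing overthinking and reducing unnecessary token generation. Experiments on multiple reasoning benchmarks show that the proposed method achieves competitive or superior accuracy while substantially reducing inference cost, offering a practical path toward efficient and controllable reasoning in large language models.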

Related papers