Fugu-MT Paper Translation (Abstract): Draft-Thinking: Learning Efficient Reasoning in Long Chain-of-Thought LLMs

Paper overview: Draft-Thinking: Learning Efficient Reasoning in Long Chain-of-Thought LLMs

arxiv url: http://arxiv.org/abs/2603.00578v1
Date: Sat, 28 Feb 2026 09:57:52 GMT
Status: Translation complete
Last updated in system: 2026-03-03 19:50:56.273019
Title: Draft-Thinking: Learning Efficient Reasoning in Long Chain-of-Thought LLMs
Title (reference translation): Draft-Thinking: Learning Efficient Reasoning in Long Chain-of-Thought LLMs
Authors: Jie Cao, Tianwei Lin, Zhenxuan Fan, Bo Yuan, Ziyuan Zhao, Rolan Yan, Wenqiao Zhang, Siliang Tang
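Abstract summary: We propose Draft-Thinking, which guides models to first learn a concise draft-style reasoning structure that retains only the critical reasoning steps. Experiments show that Draft-Thinking substantially reduces reasoning budget while largely preserving reasoning performance.
Reference score (independently computed attention score): 46.272771457924186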
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Long chain-of-thought (CoT) has become a dominant paradigm for enhancing the reasoning capability of large reasoning models (LRMs); however, the performance gains often come with a substantial increase in reasoning budget. Recent studies show that existing CoT paradigms tend to induce systematic overthinking, unnecessarily coupling reasoning capability with reasoning cost. Most prior approaches reduce token usage through post hoc techniques such as token compression, truncation, or length penalties, without explicitly addressing the core mechanisms of reasoning. We propose Draft-Thinking, which guides models to first learn a concise draft-style reasoning structure that retains only the critical reasoning steps. Through a progressive curriculum learning, the model stably internalizes this efficient reasoning pattern as its capability scales. Moreover, Draft-Thinking introduces adaptive prompting, which elevates reasoning depth to a flexible, model-selectable behavior. Extensive experiments demonstrate that Draft-Thinking substantially reduces reasoning budget while largely preserving reasoning performance; for example, on MATH500, it achieves an 82.6% reduction in reasoning budget at the cost of only a 2.6% performance drop.
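Abstract (reference translation): Long chain-of-thought (CoT) has become the main paradigm for enhancing the reasoning capability of large reasoning models (LRMs), but the performance gains often bring a substantial increase in reasoning budget. Recent studies show that existing CoT paradigms tend to overthink systematically, unnecessarily coupling reasoning capability with reasoning cost. Prior approaches have reduced token usage through post hoc techniques such as token compression, truncation, or length penalties, without explicitly addressing the core mechanisms of reasoning. This work proposes Draft-Thinking, which guides the model to first learn a concise draft-style reasoning structure that retains only the critical reasoning steps. Through progressive curriculum learning, the model stably internalizes this efficient reasoning pattern as its capability scales. In addition, Draft-Thinking introduces adaptive prompting, which turns reasoning depth into a flexible, model-selectable behavior. Extensive experiments show that Draft-Thinking substantially reduces reasoning budget while largely preserving reasoning performance; for example, on MATH500 it achieves an 82.6% reduction in reasoning budget at the cost of only a 2.6% performance drop.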

Related papers