Fugu-MT Paper Translation (Abstract): Reasoning Steps as Curriculum: Using Depth of Thought as a Difficulty Signal for Tuning LLMs

Paper summary: Reasoning Steps as Curriculum: Using Depth of Thought as a Difficulty Signal for Tuning LLMs

arxiv url: http://arxiv.org/abs/2508.18279v1
Date: Wed, 13 Aug 2025 11:31:28 GMT
Status: Translation complete
Last updated in system: 2025-08-31 21:54:20.607436
Title: Reasoning Steps as Curriculum: Using Depth of Thought as a Difficulty Signal for Tuning LLMs
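Title (reference translation): A Curriculum of Reasoning Steps Using Depth of Thought as a Difficulty Signal for Tuning LLMs
Authors: Jeesu Jung, Sangkeun Jung
Abstract summary: We define difficulty as depth of thought (DoT) and operationalize it by counting the discrete steps in a teacher model's reasoning trace. We then train with a shallow-to-deep curriculum ordered by this DoT and outline how to derive, validate, and schedule it at scale.
Reference score (independently computed attention level): 5.8153681798663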
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Curriculum learning for training LLMs requires a difficulty signal that aligns with reasoning while remaining scalable and interpretable. We propose a simple premise: tasks that demand greater depth of thought from humans should also be harder for models. Accordingly, we define difficulty as depth of thought (DoT) and operationalize it by counting the discrete steps in a teacher model's reasoning trace (e.g., Chain-of-Thought). We then train with a shallow-to-deep curriculum ordered by this DoT and outline how to derive, validate, and schedule it at scale. Our position yields three testable hypotheses: (i) DoT correlates with conventional difficulty on reasoning benchmarks, (ii) DoT-ordered curricula outperform length- or judge-scored curricula under matched budgets, and (iii) the difficulty signal is robust across teacher models given light formatting controls. We propose an evaluation framework and discuss threats to validity (teacher style, length confounds) alongside practical mitigations. Taken together, we aim to move toward cognitively grounded, interpretable curricula for reasoning-centric training.
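The DoT definition in the abstract (count the discrete steps in a teacher trace, then order examples from shallow to deep) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how steps are delimited, so the `STEP_PATTERN` regex, the `Example` dataclass, and the function names are all assumptions introduced here.

```python
import re
from dataclasses import dataclass

# ASSUMPTION: a step begins on a line that starts with "Step N:" or with a
# numbered-list marker such as "1." or "2)". The paper may delimit steps
# differently; swap in whatever convention the teacher traces follow.
STEP_PATTERN = re.compile(r"^\s*(?:step\s+\d+\s*[:.)]|\d+[.)]\s+)", re.IGNORECASE)


@dataclass
class Example:
    """Hypothetical container for one training item."""
    prompt: str
    teacher_trace: str


def depth_of_thought(trace: str) -> int:
    """Return the number of lines in the trace that open a new reasoning step."""
    return sum(1 for line in trace.splitlines() if STEP_PATTERN.match(line))


def shallow_to_deep(examples: list[Example]) -> list[Example]:
    """Order examples by ascending DoT.

    sorted() is stable, so examples with equal DoT keep their input order.
    """
    return sorted(examples, key=lambda ex: depth_of_thought(ex.teacher_trace))
```

A step counter of this kind depends on how the teacher formats its output, which is the "teacher style" threat the abstract names; normalizing trace formatting before counting is one way to apply the "light formatting controls" it mentions.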
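Abstract (reference translation): Curriculum learning for training LLMs requires a difficulty signal that is consistent with reasoning while staying scalable and interpretable. Our premise is simple: tasks that require deeper thought from humans should also be harder for models. We therefore define difficulty as depth of thought (DoT) and operationalize it by counting the discrete steps in a teacher model's reasoning trace (for example, Chain-of-Thought). We then train with a shallow-to-deep curriculum ordered by this DoT and outline how to derive, validate, and schedule it at scale. Our position leads to three testable hypotheses: (i) DoT correlates with conventional difficulty on reasoning benchmarks; (ii) under matched budgets, DoT-ordered curricula outperform length-scored or judge-scored curricula; and (iii) the difficulty signal is robust across teacher models given light formatting controls. We propose an evaluation framework and discuss threats to validity (teacher style, length confounds) together with practical mitigations. Overall, we aim to move toward cognitively grounded, interpretable curricula for reasoning-centric training.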
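Hypothesis (i), that DoT correlates with conventional difficulty, could be checked with a rank correlation between per-item DoT and a benchmark's own difficulty labels. The sketch below computes Spearman's coefficient in plain Python. The choice of Spearman is an assumption made here, since the abstract does not name a statistic, and the function names are illustrative.

```python
from statistics import mean


def average_ranks(values: list[float]) -> list[float]:
    """Assign 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the full run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs: list[float], ys: list[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5
```

Here `xs` would hold the DoT counts and `ys` the benchmark's difficulty levels for the same items. A rank-based statistic fits because step counts and difficulty levels are ordinal and often tied.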

Related papers