Fugu-MT Paper Translation (Abstract): Distilling Reasoning into Student LLMs: Local Naturalness for Selecting Teacher Data

Paper overview: Distilling Reasoning into Student LLMs: Local Naturalness for Selecting Teacher Data

arxiv url: http://arxiv.org/abs/2510.03988v1
Date: Sun, 05 Oct 2025 01:15:32 GMT
Status: Translation complete
Last updated in system: 2025-10-07 16:52:59.374215
Title: Distilling Reasoning into Student LLMs: Local Naturalness for Selecting Teacher Data
Authors: Hoang Anh Just, Myeongseob Ko, Ruoxi Jia
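Abstract summary: This work introduces Local Naturalness. When mixing answers from many teachers, Local Naturalness improves a 32B student's accuracy on math benchmarks by 9.4pp over global selection. These results show that localized data quality evaluation and data mixing enable more effective distillation.
Reference score (independently computed attention metric): 18.97748910748554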
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Distilling long reasoning traces (10K+ tokens) from stronger teacher models into smaller student LLMs via SFT has emerged as a standard paradigm. This approach is practical and efficient: it leverages the ease of generating abundant reasoning data from stronger models and provides a direct, data-driven way to teach less capable models better reasoning. While previous work has largely focused on prompt selection with responses from a single teacher, the equally important problem of choosing the best response when multiple teacher outputs are available for a single prompt remains underexplored. This challenge becomes important in a multi-teacher setting, where different students may benefit from the outputs of different teachers. This paper fills that gap with a systematic study of response selection for reasoning distillation. We first show that the current method, which picks the responses to which the student assigns the highest global log-probability (global naturalness), fails when responses come from multiple teachers: global naturalness no longer correlates with downstream performance, especially as the reasoning traces from strong teachers become longer. To overcome this problem, we introduce Local Naturalness, which measures the student's log-probabilities over short, sequential reasoning steps conditioned only on a small local window. Local Naturalness enables two applications: 1) Teacher Selection: Aggregating local scores across prompts reliably identifies the most helpful teacher. 2) Response Selection from Multiple Teachers: When mixing answers from many teachers, Local Naturalness boosts a 32B student's accuracy on math benchmarks by 9.4pp over global selection, also surpassing the performance achieved by training on data from the single best teacher. These results highlight the power of localized data quality evaluation and data mixing for more effective reasoning distillation.
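Illustrative sketch (not from the paper): the abstract contrasts a global score, where each response token is conditioned on the whole preceding trace, with a local score, where each reasoning step is conditioned only on a small window of preceding steps. The Python below is one possible reading of that contrast. Everything in it is an assumption made for illustration: the `LogProbFn` interface, the `toy_logprob` stand-in for a student model, the per-token averaging, the default `window=1`, the omission of the prompt from the local context, and the function names. The paper's actual step segmentation, window size, and normalization are not given in the abstract.

```python
import math
from typing import Callable, Dict, List, Sequence

# Hypothetical interface: returns log p(token | context) under the student model.
LogProbFn = Callable[[Sequence[str], str], float]


def global_naturalness(prompt: List[str], steps: List[List[str]],
                       logprob: LogProbFn) -> float:
    """Sum of token log-probs, each conditioned on the prompt and ALL earlier tokens."""
    context = list(prompt)
    total = 0.0
    for step in steps:
        for tok in step:
            total += logprob(context, tok)
            context.append(tok)
    return total


def local_naturalness(steps: List[List[str]], logprob: LogProbFn,
                      window: int = 1) -> float:
    """Mean over steps of the per-token log-prob of each step, where a step is
    conditioned only on the previous `window` steps (assumed definition)."""
    step_scores = []
    for i, step in enumerate(steps):
        if not step:
            continue
        # Local context: tokens of the last `window` steps only.
        context = [tok for prev in steps[max(0, i - window):i] for tok in prev]
        total = 0.0
        for tok in step:
            total += logprob(context, tok)
            context.append(tok)
        step_scores.append(total / len(step))
    return sum(step_scores) / len(step_scores) if step_scores else float("-inf")


def select_response(candidates: Dict[str, List[List[str]]],
                    logprob: LogProbFn, window: int = 1) -> str:
    """Return the teacher whose response has the highest local score."""
    return max(candidates,
               key=lambda name: local_naturalness(candidates[name], logprob, window))


def toy_logprob(context: Sequence[str], token: str) -> float:
    """Toy stand-in for a student model: tokens already in the context are likelier."""
    return math.log(0.5) if token in context else math.log(0.1)


if __name__ == "__main__":
    candidates = {
        "teacher_a": [["x", "=", "1"], ["x", "+", "1", "=", "2"]],
        "teacher_b": [["a", "b"], ["c", "d"]],
    }
    for name, steps in candidates.items():
        print(name, local_naturalness(steps, toy_logprob))
    print("selected:", select_response(candidates, toy_logprob))
```

In practice `toy_logprob` would be replaced by token log-probabilities read from the student LLM. Averaging `local_naturalness` over all prompts for each teacher would give a per-teacher score in the spirit of the teacher-selection application, again as an assumed aggregation rather than the paper's stated procedure.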

Related papers