Fugu-MT Paper Translation (Summary): Student-in-the-Loop Chain-of-Thought Distillation via Generation-Time Selection

Paper summary: Student-in-the-Loop Chain-of-Thought Distillation via Generation-Time Selection

arxiv url: http://arxiv.org/abs/2604.02819v1
Date: Fri, 03 Apr 2026 07:35:06 GMT
Status: Translation complete
Last updated in system: 2026-04-06 17:20:24.375536
Title: Student-in-the-Loop Chain-of-Thought Distillation via Generation-Time Selection
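Authors: Chaoqun He, Yingfa Chen, Chaojun Xiao, Xu Han, Lijie Wen
Abstract summary: Gen-SSD (Generation-time Self-Selection Distillation) is a student-in-the-loop framework that performs generation-time selection. Experiments on mathematical reasoning benchmarks show that Gen-SSD consistently outperforms standard knowledge distillation.
Reference score (independently computed attention score): 15.121168355895444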
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large reasoning models achieve strong performance on complex tasks through long chain-of-thought (CoT) trajectories, but directly transferring such reasoning processes to smaller models remains challenging. A key difficulty is that not all teacher-generated reasoning trajectories are suitable for student learning. Existing approaches typically rely on post-hoc filtering, selecting trajectories after full generation based on heuristic criteria. However, such methods cannot control the generation process itself and may still produce reasoning paths that lie outside the student's learning capacity. To address this limitation, we propose Gen-SSD (Generation-time Self-Selection Distillation), a student-in-the-loop framework that performs generation-time selection. Instead of passively consuming complete trajectories, the student evaluates candidate continuations during the teacher's sampling process, guiding the expansion of only learnable reasoning paths and enabling early pruning of unhelpful branches. Experiments on mathematical reasoning benchmarks demonstrate that Gen-SSD consistently outperforms standard knowledge distillation and recent baselines, with improvements of around 5.9 points over Standard KD and up to 4.7 points over other baselines. Further analysis shows that Gen-SSD produces more stable and learnable reasoning trajectories, highlighting the importance of incorporating supervision during generation for effective distillation.
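The abstract does not specify how the student scores candidates or how the search is organized. The following is only a minimal Python sketch of the general idea of generation-time selection, not Gen-SSD's actual algorithm. It assumes a hypothetical teacher callable `propose` that returns candidate next reasoning steps for a partial trajectory, a hypothetical student callable `student_score` (for example, the student's log-probability of a step), and illustrative parameters `beam_width` and `min_score`. Candidates the student scores below the threshold are pruned early, and only the highest-scoring partial trajectories are expanded further.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Path:
    steps: List[str]  # reasoning steps accepted so far
    score: float      # cumulative student score over those steps


def select_during_generation(
    propose: Callable[[Sequence[str]], List[str]],       # hypothetical teacher sampler
    student_score: Callable[[Sequence[str], str], float],  # hypothetical student scorer
    max_steps: int,
    beam_width: int,
    min_score: float,
) -> List[Path]:
    """Expand teacher trajectories step by step, letting the student select."""
    beams = [Path(steps=[], score=0.0)]
    for _ in range(max_steps):
        candidates: List[Path] = []
        for path in beams:
            for cont in propose(path.steps):
                s = student_score(path.steps, cont)
                if s < min_score:
                    continue  # early pruning: step judged unlearnable for the student
                candidates.append(Path(path.steps + [cont], path.score + s))
        if not candidates:
            break  # every branch was pruned; keep the last surviving beams
        # Keep only the partial trajectories the student scores highest.
        candidates.sort(key=lambda p: p.score, reverse=True)
        beams = candidates[:beam_width]
    return beams


# Toy usage (assumption: shorter steps are "easier" for the student).
def toy_propose(prefix: Sequence[str]) -> List[str]:
    return ["a", "bb", "ccc"]


def toy_student_score(prefix: Sequence[str], cont: str) -> float:
    return -float(len(cont))


paths = select_during_generation(
    toy_propose, toy_student_score, max_steps=2, beam_width=2, min_score=-2.5
)
for p in paths:
    print(p.steps, p.score)
```

In the toy run, the step "ccc" is pruned before it is ever expanded, which is the contrast with post-hoc filtering drawn in the abstract: unsuitable branches are discarded during sampling instead of after full generation.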


Related papers