Fugu-MT paper translation (abstract): Self-Distillation Enables Continual Learning

Paper overview: Self-Distillation Enables Continual Learning

arxiv url: http://arxiv.org/abs/2601.19897v1
Date: Tue, 27 Jan 2026 18:59:08 GMT
Status: translation complete
Last updated in system: 2026-01-28 15:26:51.442687
Title: Self-Distillation Enables Continual Learning
Title (reference translation): 自己蒸留は継続的な学習を可能にする (Self-distillation enables continual learning)
Authors: Idan Shenfeld, Mehul Damani, Jonas Hübotter, Pulkit Agrawal
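Abstract summary: SDFT (Self-Distillation Fine-Tuning) is a method that enables on-policy learning directly from demonstrations. SDFT consistently outperforms supervised fine-tuning, achieving higher new-task accuracy. In sequential learning experiments, SDFT lets a single model accumulate multiple skills over time without performance regression.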
Reference score (independently computed attention measure): 12.996554934410412
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Continual learning, enabling models to acquire new skills and knowledge without degrading existing capabilities, remains a fundamental challenge for foundation models. While on-policy reinforcement learning can reduce forgetting, it requires explicit reward functions that are often unavailable. Learning from expert demonstrations, the primary alternative, is dominated by supervised fine-tuning (SFT), which is inherently off-policy. We introduce Self-Distillation Fine-Tuning (SDFT), a simple method that enables on-policy learning directly from demonstrations. SDFT leverages in-context learning by using a demonstration-conditioned model as its own teacher, generating on-policy training signals that preserve prior capabilities while acquiring new skills. Across skill learning and knowledge acquisition tasks, SDFT consistently outperforms SFT, achieving higher new-task accuracy while substantially reducing catastrophic forgetting. In sequential learning experiments, SDFT enables a single model to accumulate multiple skills over time without performance regression, establishing on-policy distillation as a practical path to continual learning from demonstrations.
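Abstract (reference translation): Continual learning, in which a model acquires new skills and knowledge without degrading its existing capabilities, remains a fundamental challenge for foundation models. On-policy reinforcement learning can reduce forgetting, but it requires explicit reward functions that are often unavailable. Learning from expert demonstrations is the primary alternative, and it is dominated by supervised fine-tuning (SFT), which is inherently off-policy. This paper introduces Self-Distillation Fine-Tuning (SDFT), a simple method that enables on-policy learning directly from demonstrations. SDFT leverages in-context learning by using a demonstration-conditioned model as its own teacher, generating on-policy training signals that preserve prior capabilities while new skills are acquired. Across skill learning and knowledge acquisition tasks, SDFT consistently outperforms SFT, achieving higher new-task accuracy while substantially reducing catastrophic forgetting. In sequential learning experiments, SDFT enables a single model to accumulate multiple skills over time without performance regression, establishing on-policy distillation as a practical path to continual learning from demonstrations.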
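
The mechanism described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation. The toy model `toy_logits`, the vocabulary size, the `demo_token` bias that stands in for in-context conditioning, and the use of reverse KL as the distillation loss are all assumptions made for illustration; the abstract does not specify the loss. The sketch shows only the structure: the same model acts as student (no demonstration in context) and teacher (demonstration in context), and the training signal is computed on tokens sampled from the student itself.

```python
import math
import random

VOCAB = 4  # toy vocabulary size (assumption for illustration)

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def toy_logits(prefix, demo_token=None):
    """Hypothetical stand-in for one model's next-token logits.

    The same function plays both roles: the student sees only the prefix,
    the teacher additionally has a demonstration in its context.
    """
    last = prefix[-1] if prefix else 0
    logits = [1.0 if tok == (last + 1) % VOCAB else 0.0 for tok in range(VOCAB)]
    if demo_token is not None:
        # Toy in-context effect: the demonstration pulls mass toward its token.
        logits[demo_token] += 2.0
    return logits

def reverse_kl(p, q):
    """KL(p || q) for two discrete distributions with full support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def self_distillation_signal(prompt, demo_token, steps, rng):
    """Sample a rollout from the student and score each step against the teacher."""
    prefix = list(prompt)
    rollout, losses = [], []
    for _ in range(steps):
        student = softmax(toy_logits(prefix))              # no demonstration
        teacher = softmax(toy_logits(prefix, demo_token))  # demonstration in context
        losses.append(reverse_kl(student, teacher))        # assumed loss choice
        token = rng.choices(range(VOCAB), weights=student)[0]  # on-policy sample
        rollout.append(token)
        prefix.append(token)
    return rollout, losses

rng = random.Random(0)
rollout, losses = self_distillation_signal(prompt=[0], demo_token=3, steps=5, rng=rng)
print(rollout, [round(x, 3) for x in losses])
```

In a real setup the per-step losses would be backpropagated through the student's parameters with the teacher's output treated as a fixed target; that training loop is omitted here.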

List of related papers