Fugu-MT 論文翻訳(概要): Grow, Don't Overwrite: Fine-tuning Without Forgetting

論文の概要: Grow, Don't Overwrite: Fine-tuning Without Forgetting

arxiv url: http://arxiv.org/abs/2603.08647v1
Date: Mon, 09 Mar 2026 17:26:03 GMT
ステータス: 翻訳完了
システム内更新日: 2026-03-10 15:13:16.601936
Title: Grow, Don't Overwrite: Fine-tuning Without Forgetting
Title（参考訳）: 成長する、上書きしない:忘れずに微調整する
Authors: Dyah Adila, Hanna Mazzawi, Benoit Dherin, Xavier Gonzalvo,
Abstract要約: 訓練済みのモデルを専門的なタスクに適合させると、大惨事に陥ることが多い。既存の方法は、新しいタスクのパフォーマンスを損なうか、トレーニングの安定性と事前訓練された知識の効率的な再利用のバランスをとるのに苦労する。本稿では,このジレンマを解消する関数保存拡張法を提案する。
参考スコア（独自算出の注目度）: 8.51968450061047
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Adapting pre-trained models to specialized tasks often leads to catastrophic forgetting, where new knowledge overwrites foundational capabilities. Existing methods either compromise performance on the new task or struggle to balance training stability with efficient reuse of pre-trained knowledge. We introduce a novel function-preserving expansion method that resolves this dilemma. Our technique expands model capacity by replicating pre-trained parameters within transformer submodules and applying a scaling correction that guarantees the expanded model is mathematically identical to the original at initialization, enabling stable training while exploiting existing knowledge. Empirically, our method eliminates the trade-off between plasticity and stability, matching the performance of full fine-tuning on downstream tasks without any degradation of the model's original capabilities. Furthermore, we demonstrate the modularity of our approach, showing that by selectively expanding a small subset of layers we can achieve the same performance as full fine-tuning at a fraction of the computational cost.
Abstract（参考訳）: 訓練済みのモデルを専門的なタスクに適用すると、しばしば破滅的な忘れがちになり、そこでは新たな知識が基礎的能力を上書きする。既存の方法は、新しいタスクのパフォーマンスを損なうか、トレーニングの安定性と事前訓練された知識の効率的な再利用のバランスをとるのに苦労する。本稿では,このジレンマを解消する関数保存拡張法を提案する。本手法は, 変圧器サブモジュール内の事前学習パラメータを複製し, 拡張モデルが初期化時と数学的に同一であることを保証したスケーリング補正を適用し, 既存の知識を活用しながら, 安定した訓練を可能にすることによって, モデル容量を拡大する。実験により,本手法は可塑性と安定性のトレードオフを解消し,モデル本来の性能を損なうことなく,下流タスクの完全微調整性能を満足する。さらに,本手法のモジュラリティを実証し,少数の層を選択的に拡張することにより,計算コストのごく一部で完全な微調整を実現できることを示した。

論文の概要: Grow, Don't Overwrite: Fine-tuning Without Forgetting

関連論文リスト