Fugu-MT Paper Translation (Abstract): Routing without Forgetting

Paper abstract: Routing without Forgetting

arxiv url: http://arxiv.org/abs/2603.09576v1
Date: Tue, 10 Mar 2026 12:23:46 GMT
Status: Translation complete
Last updated in system: 2026-03-11 15:25:24.299101
Title: Routing without Forgetting
Authors: Alessio Masano, Giovanni Bellitto, Dipam Goswani, Joost Van de Weijer, Concetto Spampinato
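Abstract summary: Continual learning in transformers is recast as a routing problem. Instead of storing or merging task-specific prompts, RwF generates dynamic prompts through single-step associative retrieval. On Split-ImageNet-R and Split-ImageNet-S, RwF outperforms prior prompt-based approaches by a large margin, even in few-shot learning regimes.
Reference score (attention score computed independently by this site): 20.60324059904291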
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Continual learning in transformers is commonly addressed through parameter-efficient adaptation: prompts, adapters, or LoRA modules are specialized per task while the backbone remains frozen. Although effective in controlled multi-epoch settings, these approaches rely on gradual gradient-based specialization and struggle in Online Continual Learning (OCL), where data arrive as a non-stationary stream and each sample may be observed only once. We recast continual learning in transformers as a routing problem: under strict online constraints, the model must dynamically select the appropriate representational subspace for each input without explicit task identifiers or repeated optimization. We thus introduce Routing without Forgetting (RwF), a transformer architecture augmented with energy-based associative retrieval layers inspired by Modern Hopfield Networks. Instead of storing or merging task-specific prompts, RwF generates dynamic prompts through single-step associative retrieval over the transformer token embeddings at each layer. Retrieval corresponds to the closed-form minimization of a strictly convex free-energy functional, enabling input-conditioned routing within each forward pass, independently of iterative gradient refinement. Across challenging class-incremental benchmarks, RwF improves over existing prompt-based methods. On Split-ImageNet-R and Split-ImageNet-S, RwF outperforms prior prompt-based approaches by a large margin, even in few-shot learning regimes. These results indicate that embedding energy-based associative routing directly within the transformer backbone provides a principled and effective foundation for OCL.
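The abstract describes retrieval as a single-step, closed-form minimization of a strictly convex free energy. The sketch below illustrates that idea with the standard Modern Hopfield update (softmax-weighted readout over stored patterns). It is not the RwF layer itself: the function names, the inverse temperature `beta`, the array shapes, and the use of raw token embeddings as the memory are all assumptions made for illustration.

```python
import numpy as np


def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - np.max(z, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)


def hopfield_retrieve(queries, memory, beta=1.0):
    """One-step Modern-Hopfield-style retrieval (illustrative, not RwF's layer).

    queries: (n, d) array, memory: (m, d) array of stored patterns.
    Returns the retrieved vectors (n, d) and the retrieval weights (n, m).
    """
    scores = queries @ memory.T          # similarity of each query to each pattern
    weights = softmax(beta * scores)     # closed-form minimizer of the free energy
    return weights @ memory, weights


def free_energy(p, scores, beta):
    # F(p) = -<p, s> + (1/beta) * sum_i p_i log p_i, strictly convex on the simplex.
    p = np.clip(p, 1e-12, 1.0)
    return -np.sum(p * scores, axis=-1) + np.sum(p * np.log(p), axis=-1) / beta


rng = np.random.default_rng(0)
tokens = rng.normal(size=(6, 8))                        # hypothetical token embeddings at one layer
query = tokens[2:3] + 0.05 * rng.normal(size=(1, 8))    # noisy copy of one token
prompt, w = hopfield_retrieve(query, tokens, beta=4.0)  # "dynamic prompt" and its weights
```

Because the weights are computed in closed form inside the forward pass, no gradient step is needed to route an input to its nearest stored patterns; a larger `beta` makes the readout closer to a hard nearest-pattern lookup.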

Related papers