Fugu-MT Paper Translation (Abstract): Learning, Fast and Slow: Towards LLMs That Adapt Continually

Paper summary: Learning, Fast and Slow: Towards LLMs That Adapt Continually

arxiv url: http://arxiv.org/abs/2605.12484v2
Date: Thu, 14 May 2026 17:49:32 GMT
Status: Translation complete
Updated in system: 2026-05-15 15:19:49.902471
Title: Learning, Fast and Slow: Towards LLMs That Adapt Continually
Title (reference translation): Learning, Fast and Slow: Toward LLMs That Adapt Continually
Authors: Rishabh Tiwari, Kusha Sareen, Lakshya A Agrawal, Joseph E. Gonzalez, Matei Zaharia, Kurt Keutzer, Inderjit S Dhillon, Rishabh Agarwal, Devvrit Khatri
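Abstract summary: Large language models (LLMs) are trained for downstream tasks by updating their parameters. Fast-Slow Training (FST) is up to 3x more sample-efficient than slow learning alone (RL) on reasoning tasks.
Reference score (independently computed attention score): 83.9214051113102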
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) are trained for downstream tasks by updating their parameters (e.g., via RL). However, updating parameters forces them to absorb task-specific information, which can result in catastrophic forgetting and loss of plasticity. In contrast, in-context learning with fixed LLM parameters can cheaply and rapidly adapt to task-specific requirements (e.g., prompt optimization), but cannot by itself typically match the performance gains available through updating LLM parameters. There is no good reason for restricting learning to being in-context or in-weights. Moreover, humans also likely learn at different time scales (e.g., System 1 vs 2). To this end, we introduce a fast-slow learning framework for LLMs, with model parameters as "slow" weights and optimized context as "fast" weights. These fast "weights" can learn from textual feedback to absorb the task-specific information, while allowing slow weights to stay closer to the base model and persist general reasoning behaviors. Fast-Slow Training (FST) is up to 3x more sample-efficient than only slow learning (RL) across reasoning tasks, while consistently reaching a higher performance asymptote. Moreover, FST-trained models remain closer to the base LLM (up to 70% less KL divergence), resulting in less catastrophic forgetting than RL-training. This reduced drift also preserves plasticity: after training on one task, FST trained models adapt more effectively to a subsequent task than parameter-only trained models. In continual learning scenarios, where task domains change on the fly, FST continues to acquire each new task while parameter-only RL stalls.
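Abstract (reference translation): Large language models (LLMs) are trained for downstream tasks by updating their parameters (e.g., via RL). However, parameter updates force the model to absorb task-specific information, which can cause catastrophic forgetting and loss of plasticity. In contrast, in-context learning with fixed LLM parameters can adapt to task-specific requirements cheaply and quickly (e.g., through prompt optimization), but on its own it typically cannot match the performance gains obtained by updating the parameters. There is no good reason to restrict learning to either the context or the weights. Moreover, humans likely learn at different time scales as well (e.g., System 1 vs. System 2). To this end, we introduce a fast-slow learning framework for LLMs in which the model parameters serve as "slow" weights and the optimized context serves as "fast" weights. The fast "weights" learn from textual feedback and absorb task-specific information, which lets the slow weights stay closer to the base model and retain general reasoning behaviors. Fast-Slow Training (FST) is up to 3x more sample-efficient than slow learning alone (RL) across reasoning tasks and consistently reaches a higher performance asymptote. FST-trained models also stay closer to the base LLM (up to 70% lower KL divergence), so they suffer less catastrophic forgetting than RL-trained models. The reduced drift preserves plasticity: after training on one task, FST-trained models adapt to a subsequent task more effectively than models trained by parameter updates alone. In continual learning scenarios, where the task domain changes on the fly, FST keeps acquiring each new task while parameter-only RL stalls.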
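
The abstract reports drift from the base model as a relative reduction in KL divergence ("up to 70% less"). As a rough illustration of what such a figure measures, here is a minimal sketch that computes KL divergence between discrete distributions and the relative reduction between two trained models. The three toy distributions, the direction KL(trained || base), and the helper names `kl_divergence` and `relative_reduction` are assumptions made for illustration; none of them comes from the paper, and the numbers are not the paper's results.

```python
import math


def kl_divergence(p, q):
    """Return KL(p || q) in nats for two discrete distributions on the same support."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # terms with p_i = 0 contribute nothing
        if qi == 0.0:
            return math.inf  # p puts mass where q has none
        total += pi * math.log(pi / qi)
    return total


def relative_reduction(kl_baseline, kl_method):
    """Fraction by which kl_method is smaller than kl_baseline (0.7 means 70% less)."""
    return 1.0 - kl_method / kl_baseline


# Hypothetical next-token distributions over a 3-token vocabulary (illustrative only).
base = [0.5, 0.3, 0.2]         # base model
rl_model = [0.8, 0.15, 0.05]   # stands in for a parameter-only (RL) trained model
fst_model = [0.6, 0.25, 0.15]  # stands in for a model that stayed closer to the base

# Assumed direction: KL(trained || base), a common convention in RL fine-tuning.
kl_rl = kl_divergence(rl_model, base)
kl_fst = kl_divergence(fst_model, base)

print(f"KL(RL || base)  = {kl_rl:.4f} nats")
print(f"KL(FST || base) = {kl_fst:.4f} nats")
print(f"relative reduction = {relative_reduction(kl_rl, kl_fst):.1%}")
```

In practice the divergence would be averaged over many contexts and the full vocabulary; the toy vectors above only show how the ratio behind a "percent less KL" statement is formed.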

Related papers