Fugu-MT Paper Translation (Abstract): Language Models Need Sleep: Learning to Self-Modify and Consolidate Memories

Paper summary: Language Models Need Sleep: Learning to Self-Modify and Consolidate Memories

arxiv url: http://arxiv.org/abs/2606.03979v1
Date: Tue, 02 Jun 2026 17:56:55 GMT
Status: Translation complete
Last updated in system: 2026-06-03 22:00:05.241761
Title: Language Models Need Sleep: Learning to Self-Modify and Consolidate Memories
Title (reference translation): Language Models That Need Sleep: Learning to Self-Modify and Consolidate Memories
Authors: Ali Behrouz, Farnoosh Hashemi, Vahab Mirrokni
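Abstract summary: We introduce a "Sleep" paradigm that allows models to continually learn and to distill their short-term, fragile memories into stable long-term knowledge with replay. Inspired by the human learning process, we present a new Generalized Distillation process for Knowledge Seeding (i.e., the combination of on-policy distillation with reinforcement learning-based imitation learning). Experiments on long-horizon, continual learning, knowledge incorporation, and few-shot generalization tasks support the importance of the sleep stage.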
Reference score (independently computed attention score): 43.8851217839697
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The past few decades have witnessed significant advances in the design of machine learning algorithms, from early studies on task-specific shallow models to more general deep Large Language Models (LLMs). Despite showing promising results in tasks that require instant prediction or in-context learning, existing models lack the ability to continually learn and effectively transfer their temporal in-context knowledge to their long-term parameters. Inspired by the human learning process, we introduce a "Sleep" paradigm that allows models to continually learn, distill their short-term fragile memories into stable long-term knowledge with replay, and recursively improve themselves through a "Dreaming" process. In more detail, sleep consists of two stages: (1) Memory Consolidation: an upward distillation process, called Knowledge Seeding, where the memories of a smaller self are distilled into a larger network to provide more capacity while preserving the knowledge. As a proof of concept, we present a new Generalized Distillation process for Knowledge Seeding (i.e., the combination of on-policy distillation with Reinforcement Learning (RL)-based imitation learning); (2) Dreaming: a self-improvement phase, where the model uses RL to generate a curriculum of synthetic data to rehearse new knowledge and refine existing capabilities without human supervision. Our experiments on long-horizon, continual learning, knowledge incorporation, and few-shot generalization tasks support the importance of the sleep stage.
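Abstract (reference translation): Over the past few decades, the design of machine learning algorithms has advanced significantly, from early studies on task-specific shallow models to more general Large Language Models (LLMs). While they show promising results on tasks that require instant prediction or in-context learning, existing models lack the ability to learn continually and to transfer their temporal in-context knowledge effectively into their long-term parameters. Inspired by the human learning process, the authors introduce a "Sleep" paradigm in which a model learns continually, distills its short-term, fragile memories into stable long-term knowledge with replay, and recursively improves itself through a "Dreaming" process. In more detail, sleep consists of two stages. (1) Memory Consolidation: an upward distillation process, called Knowledge Seeding, in which the memories of a smaller self are distilled into a larger network to provide more capacity while preserving the knowledge. As a proof of concept, the authors propose a new Generalized Distillation process for Knowledge Seeding (i.e., the combination of on-policy distillation with Reinforcement Learning (RL)-based imitation learning). (2) Dreaming: a self-improvement phase in which the model uses RL to generate a curriculum of synthetic data, rehearsing new knowledge and refining existing capabilities without human supervision. Experiments on long-horizon, continual learning, knowledge incorporation, and few-shot generalization tasks support the importance of the sleep stage.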
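
The abstract names on-policy distillation as one ingredient of Knowledge Seeding but gives no implementation details. As a rough illustration only, the sketch below shows a generic on-policy distillation step on a toy categorical "vocabulary": the student samples from its own distribution and is nudged toward the teacher by a score-function estimate of the reverse KL gradient. This is not the paper's Generalized Distillation process, and it does not cover the RL-based imitation learning or Dreaming stages. The function names (`softmax`, `reverse_kl`, `on_policy_distill_step`), the four-token setup, and the hyperparameters are all assumptions made for this example.

```python
import math
import random


def softmax(logits):
    """Convert logits to a categorical distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reverse_kl(student_probs, teacher_probs):
    """KL(student || teacher) for two categorical distributions."""
    return sum(
        s * math.log(s / t)
        for s, t in zip(student_probs, teacher_probs)
        if s > 0
    )


def on_policy_distill_step(student_logits, teacher_probs,
                           lr=0.5, n_samples=256, rng=random):
    """One gradient step on a sampled estimate of the reverse KL.

    Tokens are drawn from the student itself (on-policy), and the teacher
    only scores them. Hypothetical toy setup, not the paper's method.
    """
    probs = softmax(student_logits)
    grad = [0.0] * len(student_logits)
    for _ in range(n_samples):
        token = rng.choices(range(len(probs)), weights=probs)[0]
        # Per-sample cost: how much more likely the student finds this
        # token than the teacher does.
        cost = math.log(probs[token]) - math.log(teacher_probs[token])
        # d log p(token) / d logit_k = 1[k == token] - p_k
        for k in range(len(grad)):
            indicator = 1.0 if k == token else 0.0
            grad[k] += cost * (indicator - probs[k]) / n_samples
    return [z - lr * g for z, g in zip(student_logits, grad)]
```

Calling `on_policy_distill_step` repeatedly on a uniform student with a fixed teacher distribution should drive `reverse_kl` toward zero. In the upward-distillation setting the abstract describes, the teacher role would be played by the smaller self and the student by the larger network, but how the paper realizes that is not stated in the abstract.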

Related papers