Fugu-MT Paper Translation (Abstract): Memory-R1: Enhancing Large Language Model Agents to Manage and Utilize Memories via Reinforcement Learning

Paper summary: Memory-R1: Enhancing Large Language Model Agents to Manage and Utilize Memories via Reinforcement Learning

arxiv url: http://arxiv.org/abs/2508.19828v1
Date: Wed, 27 Aug 2025 12:26:55 GMT
Status: Translation complete
Last updated in system: 2025-08-28 19:07:41.623628
Title: Memory-R1: Enhancing Large Language Model Agents to Manage and Utilize Memories via Reinforcement Learning
Title (reference translation): Memory-R1: Memory Management and Utilization for Large Language Model Agents via Reinforcement Learning
Authors: Sikuan Yan, Xiufeng Yang, Zuchao Huang, Ercong Nie, Zifeng Ding, Zonggen Li, Xiaowen Ma, Hinrich Schütze, Volker Tresp, Yunpu Ma
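Abstract summary: Large language models (LLMs) have shown impressive capabilities across a wide range of NLP tasks, but they are fundamentally stateless. This paper proposes Memory-R1, a reinforcement learning framework that equips LLMs with the ability to actively manage and utilize external memory. Trained on as few as 152 question-answer pairs and a corresponding temporal memory bank, Memory-R1 outperforms the most competitive existing baseline.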
Reference score (independently computed attention score): 59.16831804985279
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large Language Models (LLMs) have demonstrated impressive capabilities across a wide range of NLP tasks, but they remain fundamentally stateless, constrained by limited context windows that hinder long-horizon reasoning. Recent efforts to address this limitation often augment LLMs with an external memory bank, yet most existing pipelines are static and heuristic-driven, lacking any learned mechanism for deciding what to store, update, or retrieve. We present Memory-R1, a reinforcement learning (RL) framework that equips LLMs with the ability to actively manage and utilize external memory through two specialized agents: a Memory Manager that learns to perform structured memory operations {ADD, UPDATE, DELETE, NOOP}, and an Answer Agent that selects the most relevant entries and reasons over them to produce an answer. Both agents are fine-tuned with outcome-driven RL (PPO and GRPO), enabling adaptive memory management and use with minimal supervision. With as few as 152 question-answer pairs and a corresponding temporal memory bank for training, Memory-R1 outperforms the most competitive existing baseline and demonstrates strong generalization across diverse question types and LLM backbones. Beyond presenting an effective approach, this work provides insights into how RL can unlock more agentic, memory-aware behaviors in LLMs, pointing toward richer, more persistent reasoning systems.
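The four structured memory operations named in the abstract can be pictured with a minimal sketch. This is not the authors' implementation: the `MemoryBank` class, its integer entry IDs, the `apply` signature, and the example memory text are all hypothetical, and in Memory-R1 the choice of operation is made by the learned Memory Manager rather than hard-coded by the caller.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class Op(Enum):
    """The operation set listed in the abstract."""
    ADD = "ADD"
    UPDATE = "UPDATE"
    DELETE = "DELETE"
    NOOP = "NOOP"


@dataclass
class MemoryBank:
    """Hypothetical external memory: integer IDs mapped to text entries."""
    entries: Dict[int, str] = field(default_factory=dict)
    next_id: int = 0

    def apply(self, op: Op, text: Optional[str] = None,
              entry_id: Optional[int] = None) -> None:
        if op is Op.ADD:
            if text is None:
                raise ValueError("ADD needs text")
            self.entries[self.next_id] = text
            self.next_id += 1
        elif op is Op.UPDATE:
            if text is None or entry_id not in self.entries:
                raise ValueError("UPDATE needs text and an existing entry_id")
            self.entries[entry_id] = text
        elif op is Op.DELETE:
            # Deleting a missing entry is treated as a no-op in this sketch.
            self.entries.pop(entry_id, None)
        # Op.NOOP: leave the bank unchanged.


bank = MemoryBank()
bank.apply(Op.ADD, text="Adopted a dog named Buddy")
bank.apply(Op.UPDATE, entry_id=0, text="Adopted two dogs, Buddy and Scout")
bank.apply(Op.NOOP)
```

In this sketch the caller chooses the operation; in the framework the abstract describes, that choice is what the Memory Manager learns through RL.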
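Abstract (reference translation): Large language models (LLMs) have shown impressive capabilities across a wide range of NLP tasks, but they are fundamentally stateless and constrained by limited context windows that hinder long-horizon reasoning. Existing pipelines that add an external memory bank are mostly static and heuristic-driven, and lack a learned mechanism for deciding what to store, update, or retrieve. Memory-R1 is a reinforcement learning (RL) framework built on two specialized agents: a Memory Manager that learns to perform the structured memory operations {ADD, UPDATE, DELETE, NOOP}, and an Answer Agent that selects the most relevant entries and reasons over them to produce an answer. Both agents are fine-tuned with outcome-driven RL (PPO and GRPO), which enables adaptive memory management and use with minimal supervision. Trained on as few as 152 question-answer pairs and a corresponding temporal memory bank, Memory-R1 outperforms the most competitive existing baseline and generalizes strongly across diverse question types and LLM backbones. Beyond presenting an effective approach, the work offers insight into how RL can unlock more agentic, memory-aware behavior in LLMs, pointing toward richer, more persistent reasoning systems.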
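For readers unfamiliar with GRPO, its core step is to score each sampled output relative to the other outputs sampled for the same prompt. The sketch below shows only that group-relative normalization. The abstract does not specify the reward, so the 0/1 outcome rewards here are a hypothetical stand-in (for example, whether the final answer was correct), and the function name, the `eps` term, and the use of the population standard deviation are assumptions of this sketch.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize each reward against the mean and std of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; eps guards against division by zero
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical outcome rewards for four sampled outputs on one prompt.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Outputs that score above the group mean receive positive advantages and those below receive negative ones, so no separate value network is needed to provide a baseline.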


Related papers