Fugu-MT Paper Translation (Abstract): Mem-T: Densifying Rewards for Long-Horizon Memory Agents

Paper overview: Mem-T: Densifying Rewards for Long-Horizon Memory Agents

arxiv url: http://arxiv.org/abs/2601.23014v1
Date: Fri, 30 Jan 2026 14:23:33 GMT
Status: Translation complete
Last updated in system: 2026-02-02 18:28:15.496451
Title: Mem-T: Densifying Rewards for Long-Horizon Memory Agents
Title (reference translation): Mem-T: Densifying Rewards for Long-Horizon Memory Agents
Authors: Yanwei Yue, Guibin Zhang, Boci Peng, Xuanbo Fan, Jiaxin Guo, Qiankun Li, Yan Zhang
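Abstract summary: We introduce Mem-T, an autonomous memory agent that interacts with a lightweight hierarchical memory database to perform dynamic updates and multi-turn retrieval over streaming inputs. We also propose MoT-GRPO, a tree-guided reinforcement learning framework.
Reference score (independently computed attention score): 23.19373149519922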
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Memory agents, which depart from predefined memory-processing pipelines by endogenously managing the processing, storage, and retrieval of memories, have garnered increasing attention for their autonomy and adaptability. However, existing training paradigms remain constrained: agents often traverse long-horizon sequences of memory operations before receiving sparse and delayed rewards, which hinders truly end-to-end optimization of memory management policies. To address this limitation, we introduce Mem-T, an autonomous memory agent that interfaces with a lightweight hierarchical memory database to perform dynamic updates and multi-turn retrieval over streaming inputs. To effectively train long-horizon memory management capabilities, we further propose MoT-GRPO, a tree-guided reinforcement learning framework that transforms sparse terminal feedback into dense, step-wise supervision via memory operation tree backpropagation and hindsight credit assignment, thereby enabling the joint optimization of memory construction and retrieval. Extensive experiments demonstrate that Mem-T is (1) high-performing, surpassing frameworks such as A-Mem and Mem0 by up to $14.92\%$, and (2) economical, operating on a favorable accuracy-efficiency Pareto frontier and reducing inference tokens per query by $\sim24.45\%$ relative to GAM without sacrificing performance.
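Abstract (reference translation): Memory agents depart from predefined memory-processing pipelines by endogenously managing the processing, storage, and retrieval of memories, and interest in their autonomy and adaptability is growing. However, existing training paradigms remain constrained: agents often traverse long-horizon sequences of memory operations before receiving sparse, delayed rewards, which hinders truly end-to-end optimization of memory management policies. To address this limitation, we introduce Mem-T, an autonomous memory agent that interfaces with a lightweight hierarchical memory database to perform dynamic updates and multi-turn retrieval over streaming inputs. To train long-horizon memory management effectively, we further propose MoT-GRPO, a tree-guided reinforcement learning framework that converts sparse terminal feedback into dense, step-wise supervision through memory operation tree backpropagation and hindsight credit assignment, enabling joint optimization of memory construction and retrieval. Extensive experiments show that Mem-T (1) outperforms frameworks such as A-Mem and Mem0 by up to $14.92\%$ and (2) is economical, operating on a favorable accuracy-efficiency Pareto frontier and reducing inference tokens per query by $\sim24.45\%$ relative to GAM without sacrificing performance.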
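The abstract does not spell out how MoT-GRPO turns a sparse terminal reward into step-wise supervision, so the following is only a toy illustration of the general idea, not the paper's algorithm. It assumes a hypothetical tree of memory operations (`OpNode`), backs leaf rewards up by simple averaging, and scores each operation against the mean of its siblings, in the spirit of a group-relative baseline. Hindsight credit assignment is not modeled, and all names and the averaging rule are assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import List, Optional


@dataclass
class OpNode:
    """One memory operation in a hypothetical operation tree (illustrative only)."""
    name: str
    terminal_reward: Optional[float] = None  # set only on leaves
    children: List["OpNode"] = field(default_factory=list)
    value: float = 0.0      # backed-up estimate of downstream reward
    advantage: float = 0.0  # value relative to sibling operations


def backup(node: OpNode) -> float:
    """Propagate leaf rewards upward; an internal node's value is the mean of its children's values."""
    if not node.children:
        node.value = node.terminal_reward if node.terminal_reward is not None else 0.0
    else:
        node.value = mean(backup(child) for child in node.children)
    return node.value


def assign_advantages(node: OpNode) -> None:
    """Give every non-root node a dense signal: its value minus its sibling group's mean."""
    if not node.children:
        return
    baseline = mean(child.value for child in node.children)
    for child in node.children:
        child.advantage = child.value - baseline
        assign_advantages(child)


# Assumed example: two alternative write operations, each followed by two retrievals.
# Only the leaves carry a reward (1.0 = the final answer was correct).
root = OpNode("root", children=[
    OpNode("write_a", children=[
        OpNode("retrieve_1", terminal_reward=1.0),
        OpNode("retrieve_2", terminal_reward=0.0),
    ]),
    OpNode("write_b", children=[
        OpNode("retrieve_3", terminal_reward=0.0),
        OpNode("retrieve_4", terminal_reward=0.0),
    ]),
])

backup(root)
assign_advantages(root)

for op in root.children:
    print(op.name, op.value, op.advantage)
```

In this toy tree, `write_a` receives a positive advantage even though no reward was attached to it directly, which is the sense in which sparse terminal feedback becomes a per-step signal.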

Related papers