Fugu-MT Paper Translation (Abstract): Look Back to Reason Forward: Revisitable Memory for Long-Context LLM Agents

Paper summary: Look Back to Reason Forward: Revisitable Memory for Long-Context LLM Agents

arxiv url: http://arxiv.org/abs/2509.23040v1
Date: Sat, 27 Sep 2025 01:36:46 GMT
Status: Translation complete
Last updated in system: 2025-09-30 22:32:18.999085
Title: Look Back to Reason Forward: Revisitable Memory for Long-Context LLM Agents
Title (reference translation): Look Back to Reason Forward: Revisitable Memory for Long-Context LLM Agents
Authors: Yaorui Shi, Yuxin Chen, Siyuan Wang, Sihang Li, Hengxing Cai, Qi Gu, Xiang Wang, An Zhang
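Abstract summary: This paper proposes ReMemR1, a memory-augmented agent with callback-enhanced memory that enables selective retrieval from the entire memory history. It also proposes Reinforcement Learning with Multi-Level Rewards (RLMLR).
Reference score (independently computed attention score): 33.617262543252494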
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models face challenges in long-context question answering, where the key evidence for a query may be dispersed across millions of tokens. Existing works equip large language models with a memory corpus that is dynamically updated during a single-pass document scan, an approach known as "memorize while reading". While this approach scales efficiently, it suffers from irreversible forward-only processing, information loss through overwriting, and sparse reinforcement learning signals. To tackle these challenges, we present ReMemR1, a memory-augmented agent with callback-enhanced memory that enables selective retrieval from the entire memory history, allowing non-linear reasoning and revisiting of early evidence. To further strengthen training, we propose Reinforcement Learning with Multi-Level Rewards (RLMLR), which combines final-answer rewards with dense, step-level signals that guide effective memory use. Together, these contributions mitigate information degradation, improve supervision, and support multi-hop memory utilization. Experiments on long-document QA show significant gains over existing memory-based approaches, validating ReMemR1 as an effective solution for long-context reasoning agents.
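Abstract (reference translation): Large language models face challenges in long-context question answering, where the key evidence for a query may be dispersed across millions of tokens. Existing work equips large language models with a memory corpus that is dynamically updated during a single-pass document scan. This approach scales efficiently but suffers from irreversible forward-only processing, information loss through overwriting, and sparse reinforcement learning signals. To address these challenges, we propose ReMemR1, a memory-augmented agent with callback-enhanced memory. To further strengthen training, we propose Reinforcement Learning with Multi-Level Rewards (RLMLR). These contributions mitigate information degradation, improve supervision, and support multi-hop memory utilization. Experiments on long-document QA show significant gains over existing memory-based approaches, validating ReMemR1 as an effective solution for long-context reasoning agents.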
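
The two ideas named in the abstract, a memory whose history stays retrievable and a reward that mixes final-answer and step-level signals, can be illustrated with a toy sketch. This is not the authors' implementation: the class `RevisitableMemory`, the word-overlap retrieval in `callback`, and the weighted-mean formula in `combined_reward` (including `step_weight`) are all hypothetical stand-ins chosen only to make the contrast with overwrite-only memory concrete.

```python
from dataclasses import dataclass, field


@dataclass
class RevisitableMemory:
    """Toy memory that keeps every past state instead of overwriting it."""

    history: list[str] = field(default_factory=list)

    def update(self, new_state: str) -> None:
        # Append rather than overwrite, so earlier evidence stays reachable.
        self.history.append(new_state)

    @property
    def current(self) -> str:
        # The most recent state, which is all an overwrite-only memory would keep.
        return self.history[-1] if self.history else ""

    def callback(self, query: str, top_k: int = 1) -> list[str]:
        # Hypothetical retrieval: rank past states by word overlap with the query.
        query_words = set(query.lower().split())
        scored = []
        for index, state in enumerate(self.history):
            overlap = len(query_words & set(state.lower().split()))
            if overlap > 0:
                scored.append((overlap, index, state))
        # Highest overlap first; ties go to the earlier state.
        scored.sort(key=lambda item: (-item[0], item[1]))
        return [state for _, _, state in scored[:top_k]]


def combined_reward(
    final_reward: float, step_rewards: list[float], step_weight: float = 0.5
) -> float:
    # Assumed combination: final-answer reward plus a weighted mean of step signals.
    dense = sum(step_rewards) / len(step_rewards) if step_rewards else 0.0
    return final_reward + step_weight * dense


if __name__ == "__main__":
    memory = RevisitableMemory()
    for summary in [
        "Alice was born in Paris",
        "The report covers 2021 revenue",
        "Bob manages the Berlin office",
    ]:
        memory.update(summary)
    print(memory.current)
    print(memory.callback("where was Alice born"))
    print(combined_reward(1.0, [0.0, 1.0]))
```

In this sketch, a forward-only reader would see only `memory.current` after the last chunk, while `callback` can still surface the first chunk when a later question needs it.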

Related papers