Fugu-MT Paper Translation (Abstract): SAM: State-Adaptive Memory for Long-Horizon Reasoning Agent

Paper abstract: SAM: State-Adaptive Memory for Long-Horizon Reasoning Agent

arxiv url: http://arxiv.org/abs/2605.24468v1
Date: Sat, 23 May 2026 08:37:16 GMT
Status: Translation complete
Last updated in system: 2026-05-26 19:50:18.073224
Title: SAM: State-Adaptive Memory for Long-Horizon Reasoning Agent
Title (reference translation): SAM: State-Adaptive Memory for Long-Horizon Reasoning Agent
Authors: Yuyang Hu, Hongjin Qian, Shuting Wang, Jiongnan Liu, Ziliang Zhao, Jiejun Tan, Zheng Liu, Zhicheng Dou
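Abstract summary: Long-horizon agentic reasoning requires large language models to act over long interaction histories containing thoughts, tool calls, observations, and partial conclusions. Existing approaches address this difficulty by truncating the interaction history, compressing it into shorter surrogates, or retrieving selected parts of it for reuse. The authors propose State-Adaptive Memory, a standalone framework that consolidates ongoing interaction into compact memory cues while preserving raw trajectory pages for intent-driven recall.
Reference score (attention score computed independently by this site): 51.274445160155864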
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Long-horizon agentic reasoning requires large language models to act over long interaction histories containing thoughts, tool calls, observations, and partial conclusions. The challenge is not merely that these histories grow long, but that information needed for the current decision may be scattered across distant steps and only become relevant later. Existing approaches address this difficulty by truncating the interaction history, compressing it into shorter surrogates, or retrieving selected parts of it for reuse, but they do not explicitly model how access to past interaction should adapt to the agent's evolving state. We instead cast long-horizon reasoning as a problem of state-adaptive memory. To this end, we propose State-Adaptive Memory (SAM), a standalone framework that consolidates ongoing interaction into compact memory cues while preserving raw trajectory pages for intent-driven recall. These cues are not treated as replacements for history; rather, they serve as lightweight handles that allow the agent to reconstruct temporally distant information according to its current needs, without retraining the underlying backbone. We further optimize the memory module through expert-guided supervision and reinforcement learning, aligning it with trajectory-level utility. Across BrowseComp, BrowseComp-ZH, WideSearch, and HLE, SAM consistently outperforms strong baselines over diverse agent backbones. Our results suggest that explicit memory modeling provides a simple and effective foundation for long-horizon agentic reasoning.
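Abstract (reference translation): Long-horizon agentic reasoning requires large language models to act over long interaction histories that contain thoughts, tool calls, observations, and partial conclusions. The difficulty is not only that these histories grow long, but also that the information needed for the current decision may be scattered across distant steps and become relevant only later. Existing approaches handle this by truncating the interaction history, compressing it into shorter surrogates, or retrieving selected parts for reuse, but they do not explicitly model how access to past interaction should adapt to the agent's evolving state. The authors instead frame long-horizon reasoning as a problem of state-adaptive memory. To that end, they propose State-Adaptive Memory (SAM), a standalone framework that consolidates ongoing interaction into compact memory cues while preserving raw trajectory pages for intent-driven recall. These cues are not treated as replacements for the history; they act as lightweight handles that let the agent reconstruct temporally distant information according to its current needs, without retraining the underlying backbone. The memory module is further optimized through expert-guided supervision and reinforcement learning, aligning it with trajectory-level utility. Across BrowseComp, BrowseComp-ZH, WideSearch, and HLE, SAM consistently outperforms strong baselines over diverse agent backbones. The results suggest that explicit memory modeling provides a simple and effective foundation for long-horizon agentic reasoning.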
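
The cue-and-page idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the class names (`Page`, `Cue`, `CueMemory`), the truncation-based cue, and the word-overlap matching are all assumptions made for illustration, standing in for the learned memory module the abstract mentions. It only shows the structural point that cues stay short while the raw pages remain available for recall.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Page:
    """One raw trajectory step, stored verbatim (hypothetical structure)."""
    page_id: int
    text: str


@dataclass
class Cue:
    """A compact handle that points back at a raw page (hypothetical structure)."""
    page_id: int
    summary: str


@dataclass
class CueMemory:
    pages: dict[int, Page] = field(default_factory=dict)
    cues: list[Cue] = field(default_factory=list)

    def consolidate(self, text: str, max_words: int = 8) -> Cue:
        """Store the raw page and keep a short cue for it.

        Assumption: keeping the first few words stands in for a learned summarizer.
        """
        page_id = len(self.pages)
        self.pages[page_id] = Page(page_id, text)
        cue = Cue(page_id, " ".join(text.split()[:max_words]))
        self.cues.append(cue)
        return cue

    def recall(self, intent: str, top_k: int = 1) -> list[Page]:
        """Return the raw pages whose cues share the most words with the intent.

        Assumption: word overlap is a placeholder for the real matching step.
        """
        intent_words = set(intent.lower().split())
        scored = []
        for cue in self.cues:
            overlap = len(intent_words & set(cue.summary.lower().split()))
            if overlap > 0:
                scored.append((overlap, cue.page_id))
        # Highest overlap first; earlier pages win ties.
        scored.sort(key=lambda item: (-item[0], item[1]))
        return [self.pages[page_id] for _, page_id in scored[:top_k]]


if __name__ == "__main__":
    memory = CueMemory()
    memory.consolidate(
        "Observation: the museum opened in 1923 according to the city archive page"
    )
    memory.consolidate(
        "Thought: next check the architect biography for the award year"
    )
    for page in memory.recall("when did the museum open"):
        print(page.page_id, page.text)
```

In this sketch the recalled page carries detail that its cue dropped (here, the source of the date), which is the sense in which a cue acts as a handle for the history and does not replace it.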
