Fugu-MT Paper Translation (Abstract): Hindsight is 20/20: Building Agent Memory that Retains, Recalls, and Reflects

Paper summary: Hindsight is 20/20: Building Agent Memory that Retains, Recalls, and Reflects

arxiv url: http://arxiv.org/abs/2512.12818v1
Date: Sun, 14 Dec 2025 19:47:23 GMT
Status: Translation complete
System update date: 2025-12-16 17:54:56.456858
Title: Hindsight is 20/20: Building Agent Memory that Retains, Recalls, and Reflects
Title (reference translation): Hindsight is 20/20: Building Agent Memory that Retains, Recalls, and Reflects
Authors: Chris Latimer, Nicoló Boschi, Andrew Neeser, Chris Bartholomew, Gaurav Srivastava, Xuan Wang, Naren Ramakrishnan
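Abstract summary: The authors present Hindsight, a memory architecture that treats agent memory as a structured, first-class substrate for reasoning. It supports three core operations (retain, recall, and reflect) that govern how information is added, accessed, and updated. With an open-source 20B model, Hindsight lifts overall accuracy from 39% to 83.6% over a full-context baseline.
Reference score (independently computed attention score): 11.300084544174894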
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Agent memory has been touted as a dimension of growth for LLM-based applications, enabling agents that can accumulate experience, adapt across sessions, and move beyond single-shot question answering. The current generation of agent memory systems treats memory as an external layer that extracts salient snippets from conversations, stores them in vector or graph-based stores, and retrieves top-k items into the prompt of an otherwise stateless model. While these systems improve personalization and context carry-over, they still blur the line between evidence and inference, struggle to organize information over long horizons, and offer limited support for agents that must explain their reasoning. We present Hindsight, a memory architecture that treats agent memory as a structured, first-class substrate for reasoning by organizing it into four logical networks that distinguish world facts, agent experiences, synthesized entity summaries, and evolving beliefs. This framework supports three core operations -- retain, recall, and reflect -- that govern how information is added, accessed, and updated. Under this abstraction, a temporal, entity aware memory layer incrementally turns conversational streams into a structured, queryable memory bank, while a reflection layer reasons over this bank to produce answers and to update information in a traceable way. On key long-horizon conversational memory benchmarks like LongMemEval and LoCoMo, Hindsight with an open-source 20B model lifts overall accuracy from 39% to 83.6% over a full-context baseline with the same backbone and outperforms full context GPT-4o. Scaling the backbone further pushes Hindsight to 91.4% on LongMemEval and up to 89.61% on LoCoMo (vs. 75.78% for the strongest prior open system), consistently outperforming existing memory architectures on multi-session and open-domain questions.
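Abstract (reference translation): Agent memory has been promoted as a dimension of growth for LLM-based applications, enabling agents that accumulate experience, adapt across sessions, and go beyond single-shot question answering. Current agent memory systems treat memory as an external layer: they extract salient snippets from conversations, store them in vector or graph-based stores, and retrieve the top-k items into the prompt of an otherwise stateless model. These systems improve personalization and context carry-over, but they blur the boundary between evidence and inference, struggle to organize information over long horizons, and offer only limited support for agents that must explain their reasoning. The authors present Hindsight, a memory architecture that treats agent memory as a structured, first-class substrate for reasoning by organizing it into four logical networks that distinguish world facts, agent experiences, synthesized entity summaries, and evolving beliefs. The framework supports three core operations (retain, recall, and reflect) that govern how information is added, accessed, and updated. Under this abstraction, a temporal, entity-aware memory layer incrementally turns conversational streams into a structured, queryable memory bank, while a reflection layer reasons over this bank to produce answers and to update information in a traceable way. On key long-horizon conversational memory benchmarks such as LongMemEval and LoCoMo, Hindsight with an open-source 20B model lifts overall accuracy from 39% to 83.6% over a full-context baseline with the same backbone. Scaling the backbone pushes Hindsight to 91.4% on LongMemEval and up to 89.61% on LoCoMo (vs. 75.78% for the strongest prior open system), consistently outperforming existing memory architectures on multi-session and open-domain questions.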
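
The abstract names four logical networks and three operations but gives no data structures, so the following is only a toy sketch of how such a memory bank could be laid out, not the paper's implementation. Every name here (`ToyMemoryBank`, `MemoryItem`, the network labels, the `sources` field) is a hypothetical chosen for illustration. Recall is reduced to an entity match ordered by a logical clock, and reflect is reduced to string concatenation; a real system would presumably use a language model and richer retrieval for both.

```python
from dataclasses import dataclass, field
from typing import List, Sequence

# Hypothetical labels for the four logical networks named in the abstract.
NETWORKS = ("world_fact", "experience", "entity_summary", "belief")


@dataclass
class MemoryItem:
    network: str                 # one of NETWORKS
    text: str                    # the stored statement
    entities: List[str]          # entities the statement mentions
    timestamp: int               # toy logical clock, not wall-clock time
    sources: List[int] = field(default_factory=list)  # ids of supporting items


class ToyMemoryBank:
    """Illustrative only: keeps evidence and inferred beliefs in separate networks."""

    def __init__(self) -> None:
        self.items: List[MemoryItem] = []
        self._clock = 0

    def retain(self, network: str, text: str, entities: Sequence[str],
               sources: Sequence[int] = ()) -> int:
        """Add one item to a network and return its id."""
        if network not in NETWORKS:
            raise ValueError(f"unknown network: {network}")
        self._clock += 1
        self.items.append(
            MemoryItem(network, text, list(entities), self._clock, list(sources))
        )
        return len(self.items) - 1

    def recall(self, entity: str, networks: Sequence[str] = NETWORKS) -> List[int]:
        """Return ids of items that mention `entity`, newest first."""
        hits = [
            i for i, item in enumerate(self.items)
            if entity in item.entities and item.network in networks
        ]
        return sorted(hits, key=lambda i: self.items[i].timestamp, reverse=True)

    def reflect(self, entity: str) -> int:
        """Store a belief derived from recalled evidence, keeping the evidence ids."""
        evidence = self.recall(entity, networks=("world_fact", "experience"))
        if not evidence:
            raise LookupError(f"no evidence about {entity}")
        # Oldest evidence first in the text; a placeholder for real reasoning.
        summary = "; ".join(self.items[i].text for i in reversed(evidence))
        return self.retain("belief", f"Belief about {entity}: {summary}",
                           [entity], sources=evidence)


bank = ToyMemoryBank()
bank.retain("world_fact", "Alice lives in Paris", ["Alice"])
bank.retain("experience", "Agent booked a train for Alice", ["Alice"])
belief_id = bank.reflect("Alice")
print(bank.items[belief_id].text)
print(bank.items[belief_id].sources)
```

Keeping the `sources` ids on each belief is one simple way to make an update traceable back to the evidence it rests on, which is the property the abstract emphasizes; the paper itself may achieve this differently.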


Related papers