Fugu-MT Paper Translation (Abstract): SubtleMemory: A Benchmark for Fine-Grained Relational Memory Discrimination in Long-Horizon AI Agents

Paper overview: SubtleMemory: A Benchmark for Fine-Grained Relational Memory Discrimination in Long-Horizon AI Agents

arxiv url: http://arxiv.org/abs/2606.05761v1
Date: Thu, 04 Jun 2026 06:43:11 GMT
Status: Translation complete
Last updated in system: 2026-06-05 22:39:44.602228
Title: SubtleMemory: A Benchmark for Fine-Grained Relational Memory Discrimination in Long-Horizon AI Agents
Title (reference translation): SubtleMemory: A benchmark for fine-grained relational memory discrimination in long-horizon AI agents
Authors: Wenxuan Wang, Haoyu Sun, Fukuan Hou, Mingyang Song, Weinan Zhang, Yu Cheng, Yang Yang
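Abstract summary: This paper introduces SubtleMemory, a benchmark for fine-grained relational memory discrimination in long-running AI agents. We evaluate six standalone memory systems, two Claw-style agents with native memory modules, and three Claw-style agents with plugin memory modules.
Reference score (independently computed attention score): 38.778004697710855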
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Persistent AI assistants, such as OpenClaw, accumulate large collections of related memories over long-term interactions. As these memories grow, they may reinforce one another, diverge across contexts, or directly conflict, making correct assistance depend on memory relations rather than isolated recall. Existing long-term memory benchmarks rarely probe how agents preserve and utilize such relations during downstream tasks. To address this gap, we introduce SubtleMemory, a benchmark for fine-grained relational memory discrimination in long-running AI agents. SubtleMemory constructs relation-controlled latent semantic artifacts whose variants instantiate complementary, nuanced, or contradictory relations, and embeds them into realistic user-agent histories, requiring agents to recover distributed relational structures during later queries and instructions. The benchmark contains 1,522 evaluation instances over 10 long histories, grounded in 1,090 relation-controlled memory-variant sets and spanning user-related and non-user-related queries. Evaluating six standalone memory systems, two Claw-style agents with native memory modules, and three Claw-style agents with plugin memory modules, we find that current systems remain weak on fine-grained relational memory discrimination. We further introduce diagnostic protocols that reveal distinct capability profiles across memory preservation, retrieval, and downstream reasoning stages.
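Abstract (reference translation): Persistent AI assistants such as OpenClaw accumulate large collections of related memories through long-term interactions. As these memories grow, they may reinforce one another, diverge across contexts, or directly conflict, so correct assistance depends on the relations between memories rather than on isolated recall. Existing long-term memory benchmarks rarely examine how agents preserve and use such relations in downstream tasks. To address this gap, we introduce SubtleMemory, a benchmark for fine-grained relational memory discrimination in long-running AI agents. SubtleMemory constructs relation-controlled latent semantic artifacts whose variants instantiate complementary, nuanced, or contradictory relations, and embeds them into realistic user-agent histories, so that agents must recover distributed relational structures when handling later queries and instructions. The benchmark contains 1,522 evaluation instances over 10 long histories, is grounded in 1,090 relation-controlled memory-variant sets, and spans user-related and non-user-related queries. Evaluating six standalone memory systems, two Claw-style agents with native memory modules, and three Claw-style agents with plugin memory modules shows that current systems remain weak at fine-grained relational memory discrimination. We further introduce diagnostic protocols that reveal distinct capability profiles across the memory preservation, retrieval, and downstream reasoning stages.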

