Fugu-MT Paper Translation (Summary): MemConflict: Evaluating Long-Term Memory Systems Under Memory Conflicts

Paper summary: MemConflict: Evaluating Long-Term Memory Systems Under Memory Conflicts

arxiv url: http://arxiv.org/abs/2605.20926v1
Date: Wed, 20 May 2026 09:11:04 GMT
Status: Translation complete
Last updated in system: 2026-05-21 19:19:56.591802
Title: MemConflict: Evaluating Long-Term Memory Systems Under Memory Conflicts
Title (reference translation): MemConflict: Evaluation of Long-Term Memory Systems
Authors: Zhen Tao, Jinxiang Zhao, Peng Liu, Dinghao Xi, Yanfang Chen, Wei Xu, Zhiyu Li,
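Abstract summary: We propose a diagnostic framework that treats memory validity as a query-conditioned fitness-for-use problem. MemConflict formalizes dynamic, static, and conditional conflicts over temporal validity, factual correctness, and contextual applicability. It simulates controlled long-horizon histories from structured user profiles, introduces cross-session conflicts, and injects semantically similar distractors to create competition among memory candidates.
Reference score (attention score, calculated independently): 19.9199366981741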
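The query-conditioned fitness-for-use idea can be illustrated with a minimal Python sketch. It assumes that each memory carries a validity interval in session indices, a factual-correctness flag, and a set of context conditions, and that a memory is fit for use only when all three checks pass for a given query. The names `Memory`, `Query`, `fit_for_use`, and `ConflictType`, as well as the one-to-one pairing of conflict types with validity dimensions, are hypothetical illustrations rather than the paper's definitions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Optional


class ConflictType(Enum):
    # Hypothetical pairing of each conflict type with one validity dimension.
    DYNAMIC = "temporal validity"
    STATIC = "factual correctness"
    CONDITIONAL = "contextual applicability"


@dataclass(frozen=True)
class Memory:
    text: str
    valid_from: int                 # first session in which the memory holds
    valid_until: Optional[int]      # last session it holds; None means still valid
    factually_correct: bool = True
    conditions: FrozenSet[str] = frozenset()  # contexts the memory applies to


@dataclass(frozen=True)
class Query:
    text: str
    session: int
    context: FrozenSet[str] = frozenset()


def fit_for_use(m: Memory, q: Query) -> bool:
    """Hypothetical check: valid at query time, correct, and applicable."""
    temporally_valid = m.valid_from <= q.session and (
        m.valid_until is None or q.session <= m.valid_until
    )
    # A memory with no conditions applies in every context.
    applicable = m.conditions <= q.context
    return temporally_valid and m.factually_correct and applicable


old = Memory("User lives in Berlin", valid_from=1, valid_until=4)
new = Memory("User lives in Munich", valid_from=5, valid_until=None)
q = Query("Where does the user live?", session=8)
print([m.text for m in (old, new) if fit_for_use(m, q)])  # ['User lives in Munich']
```

Modeling all three checks as a single conjunction keeps the sketch simple; a real benchmark may score the dimensions separately.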
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Long-term memory systems enable conversational agents based on large language models (LLMs) to retain, retrieve, and apply user-specific information across multi-session interactions. However, existing evaluations mainly assess outcome-level performance or temporal updating, providing limited insight into how systems retrieve and rank temporally valid, factually correct, and contextually applicable memory evidence under conflicting alternatives. To address this gap, we propose MemConflict, a diagnostic framework that treats memory validity as a query-conditioned fitness-for-use problem. MemConflict formalizes dynamic, static, and conditional conflicts over temporal validity, factual correctness, and contextual applicability. It simulates controlled long-horizon histories from structured user profiles, introduces cross-session conflicts, and injects semantically similar distractors to create competition among memory candidates. The resulting multi-session dialogue benchmark supports black-box evaluation of final answers and white-box analysis of supporting-memory retrieval and ranking. Experiments on six representative long-term memory systems show uneven strengths across conflict types, with answer correctness often diverging from memory retrieval and ranking. Sensitivity analyses reveal that longer histories, distractors, implicit queries, and larger conflict distances degrade performance. Diagnostics show failures from missing supporting memories and ineffective use of retrieved memories. Collectively, MemConflict advances principled long-term memory governance through retrieval-aware, conflict-aware reliability assessment.
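The split between black-box answer evaluation and white-box retrieval analysis can be sketched as follows. This is a hypothetical sketch: it scores a ranked list of retrieved memory IDs against the gold supporting memories with standard recall@k and reciprocal rank, then places each query in one of the failure modes the abstract describes. The metric choice, the `diagnose` labels, and all function names are assumptions, not MemConflict's actual protocol.

```python
from typing import Sequence, Set


def recall_at_k(ranked: Sequence[str], gold: Set[str], k: int) -> float:
    """Fraction of gold supporting memories found in the top-k retrieved."""
    if not gold:
        return 0.0
    return len(set(ranked[:k]) & gold) / len(gold)


def reciprocal_rank(ranked: Sequence[str], gold: Set[str]) -> float:
    """1 / rank of the first gold supporting memory, or 0.0 if none retrieved."""
    for i, mem_id in enumerate(ranked, start=1):
        if mem_id in gold:
            return 1.0 / i
    return 0.0


def diagnose(answer_correct: bool, ranked: Sequence[str], gold: Set[str], k: int) -> str:
    """Hypothetical labels combining black-box correctness with white-box retrieval."""
    retrieved = recall_at_k(ranked, gold, k) > 0
    if answer_correct:
        return "correct" if retrieved else "correct_without_support"
    return "ineffective_use" if retrieved else "missing_support"


ranked = ["m7", "m3", "m9", "m1"]
gold = {"m3", "m1"}
print(recall_at_k(ranked, gold, 3))           # 0.5
print(reciprocal_rank(ranked, gold))          # 0.5
print(diagnose(False, ranked, gold, 3))       # ineffective_use
```

Separating `missing_support` from `ineffective_use` mirrors the abstract's distinction between failing to retrieve supporting memories and failing to use retrieved ones, while `correct_without_support` captures answers that are right even though retrieval missed the evidence.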
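Abstract (reference translation): Long-term memory systems allow conversational agents based on large language models (LLMs) to retain, retrieve, and apply user-specific information across multi-session interactions. However, existing evaluations mainly assess outcome-level performance or temporal updating, and offer limited insight into how systems retrieve and rank memory evidence that is temporally valid, factually correct, and contextually applicable when conflicting alternatives exist. To address this gap, we propose MemConflict, a diagnostic framework that treats memory validity as a query-conditioned fitness-for-use problem. MemConflict formalizes dynamic, static, and conditional conflicts over temporal validity, factual correctness, and contextual applicability. It simulates controlled long-horizon histories from structured user profiles, introduces cross-session conflicts, and injects semantically similar distractors to create competition among memory candidates. The resulting multi-session dialogue benchmark supports black-box evaluation of final answers and white-box analysis of supporting-memory retrieval and ranking. Experiments on six representative long-term memory systems show uneven strengths across conflict types, and answer correctness often diverges from memory retrieval and ranking. Sensitivity analyses show that longer histories, distractors, implicit queries, and larger conflict distances degrade performance. Diagnostics reveal failures caused by missing supporting memories and by ineffective use of retrieved memories. Collectively, MemConflict advances principled long-term memory governance through retrieval-aware, conflict-aware reliability assessment.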
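Controlled history construction with a tunable conflict distance can be illustrated with a small generator. It is a sketch under stated assumptions: conflict distance is taken to mean the number of sessions between an original fact and the conflicting update, distractors are placed in uniformly random sessions, and `build_history` is a hypothetical helper, not the benchmark's generator.

```python
import random
from typing import List


def build_history(
    n_sessions: int,
    fact: str,
    update: str,
    distance: int,
    distractors: List[str],
    seed: int = 0,
) -> List[List[str]]:
    """Hypothetical generator: place a fact, a conflicting update `distance`
    sessions later, and semantically similar distractors in random sessions."""
    if distance < 1 or distance >= n_sessions:
        raise ValueError("distance must be in [1, n_sessions - 1]")
    rng = random.Random(seed)  # seeded so the history is reproducible
    sessions: List[List[str]] = [[] for _ in range(n_sessions)]
    # Choose a start so that start + distance stays within the history.
    start = rng.randrange(0, n_sessions - distance)
    sessions[start].append(fact)
    sessions[start + distance].append(update)
    for d in distractors:
        sessions[rng.randrange(n_sessions)].append(d)
    return sessions


history = build_history(
    10,
    "I'm vegetarian.",
    "I started eating fish again.",
    distance=6,
    distractors=["My sister is vegetarian."],
    seed=42,
)
for i, session in enumerate(history):
    if session:
        print(i, session)
```

Raising `distance` or adding distractors in this sketch is one way to probe the sensitivity effects the abstract reports, although the benchmark's own controls may differ.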


Related papers