Fugu-MT Paper Translation (Overview): Beyond Static Dialogues: Benchmarking Realistic, Heterogeneous, and Evolving Long-Term Memory

Paper Overview: Beyond Static Dialogues: Benchmarking Realistic, Heterogeneous, and Evolving Long-Term Memory

arxiv url: http://arxiv.org/abs/2605.31086v2
Date: Mon, 01 Jun 2026 09:00:11 GMT
Status: Translation complete
Last updated (in system): 2026-06-02 14:56:41.449501
Title: Beyond Static Dialogues: Benchmarking Realistic, Heterogeneous, and Evolving Long-Term Memory
Authors: Han Zhang, Zihao Tang, Xin Yu, Xiao Liu, Yeyun Gong, Haizhen Huang, Yan Lu, Weiwei Deng, Feng Sun, Qi Zhang, Hanfang Yang
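Abstract summary: The authors introduce realistic dialogues spanning diverse interaction scenarios that exhibit dynamic temporal evolution and long-term coherence. The resulting benchmark contains challenging question-answer pairs covering seven inquiry types. Each question maps to at least one of 27 critical memory characteristics that the authors identify as essential yet underexplored in current research.
Reference score (site-computed attention score): 54.947805187562274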
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In existing memory benchmarks for Large Language Models (LLMs), the evaluated dialogue sessions often lack long-term semantic consistency, and the underlying personas tend to be flat and static. Furthermore, in real-world scenarios, interactions between users and assistants involve more diverse, heterogeneous data streams, such as documents and emails. These shortcomings significantly limit the realism and effectiveness of current evaluations. To address these limitations, we introduce RHELM (Realistic, Heterogeneous, and Evolving Long-term Memory). Driven by meticulously crafted user profiles and a novel LOOP (pLan-rOllout-evOlve-Prune) module, we construct realistic dialogues across diverse interaction scenarios that exhibit dynamic temporal evolution and long-term coherence. Crucially, these dialogues are deeply integrated with heterogeneous external sources synchronized with the user's temporal event trajectory. The resulting benchmark encompasses challenging question-answer pairs spanning seven inquiry types, with each question mapping to at least one of 27 critical memory characteristics that we identify as essential yet underexplored in current research. Comprehensive experiments across full-context models, retrieval-augmented generation (RAG) methods, and representative memory frameworks reveal that contemporary approaches still exhibit critical weaknesses in complex, real-world settings, particularly in multi-source aggregation and real-world contextual reasoning.
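The abstract names the four stages of the LOOP (pLan-rOllout-evOlve-Prune) module but does not say how each one works. The following is a minimal sketch of how such a plan/rollout/evolve/prune cycle could drive an evolving persona, not the authors' implementation. It assumes a fixed event schedule in place of LLM-based planning, templated turns in place of LLM generation, and a pruning rule (drop a session that adds no new facts) chosen purely for illustration. All names (`PersonaState`, `Session`, `plan`, `rollout`, `evolve`, `prune`, `generate`) are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PersonaState:
    """Hypothetical evolving user state: attribute name -> current value."""
    facts: dict[str, str] = field(default_factory=dict)


@dataclass
class Session:
    day: int
    turns: list[str]
    facts_used: dict[str, str]  # facts this session asserts about the user


def plan(day: int, schedule: dict[int, dict[str, str]]) -> dict[str, str]:
    # Plan (assumed): look up the events scheduled for this day.
    # A real generator would presumably plan events with an LLM.
    return schedule.get(day, {})


def rollout(day: int, events: dict[str, str]) -> Session:
    # Rollout (assumed): turn planned events into dialogue turns.
    # Templates stand in for LLM-generated conversation.
    turns = [f"User: my {k} is now {v}." for k, v in events.items()]
    turns.append("Assistant: Noted.")
    return Session(day, turns, dict(events))


def evolve(state: PersonaState, events: dict[str, str]) -> None:
    # Evolve: update the persona so later sessions see the new values.
    state.facts.update(events)


def prune(session: Session, previous: dict[str, str]) -> Session | None:
    # Prune (illustrative rule): drop a session that only repeats facts
    # the persona already had before this step.
    new = {k: v for k, v in session.facts_used.items() if previous.get(k) != v}
    return session if new else None


def generate(schedule: dict[int, dict[str, str]], days: int) -> tuple[list[Session], PersonaState]:
    state = PersonaState()
    history: list[Session] = []
    for day in range(days):
        events = plan(day, schedule)
        if not events:
            continue
        session = rollout(day, events)
        previous = dict(state.facts)  # snapshot taken before evolving
        evolve(state, events)
        kept = prune(session, previous)
        if kept is not None:
            history.append(kept)
    return history, state
```

For example, with the schedule `{0: {"city": "Kyoto"}, 2: {"city": "Kyoto"}, 3: {"city": "Osaka"}}` the sketch keeps the sessions for days 0 and 3 and prunes day 2, because that session only repeats a fact the persona already had. The snapshot taken before `evolve` is what lets `prune` run after `evolve`, matching the stage order in the module's name.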
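The abstract states two structural constraints on the benchmark's question-answer pairs: each question belongs to one of seven inquiry types, and each question maps to at least one of 27 memory characteristics. The following is a minimal sketch of a QA record that enforces those two constraints and counts how often each characteristic is probed. The abstract does not list the type or characteristic names, so integer indices stand in for them. `QAItem`, `validate`, and `coverage` are hypothetical names, not the benchmark's actual schema.

```python
from __future__ import annotations

from dataclasses import dataclass

NUM_INQUIRY_TYPES = 7            # stated in the abstract
NUM_MEMORY_CHARACTERISTICS = 27  # stated in the abstract


@dataclass(frozen=True)
class QAItem:
    question: str
    answer: str
    inquiry_type: int               # hypothetical index in 0..6
    characteristics: frozenset[int]  # hypothetical indices in 0..26


def validate(item: QAItem) -> list[str]:
    """Return a list of constraint violations (empty if the item is valid)."""
    errors = []
    if not 0 <= item.inquiry_type < NUM_INQUIRY_TYPES:
        errors.append(f"inquiry_type {item.inquiry_type} out of range")
    if not item.characteristics:
        errors.append("question must map to at least one memory characteristic")
    bad = sorted(c for c in item.characteristics
                 if not 0 <= c < NUM_MEMORY_CHARACTERISTICS)
    if bad:
        errors.append(f"unknown characteristics {bad}")
    return errors


def coverage(items: list[QAItem]) -> dict[int, int]:
    # Count how many questions exercise each characteristic, which helps
    # spot characteristics the question set rarely probes.
    counts = {c: 0 for c in range(NUM_MEMORY_CHARACTERISTICS)}
    for item in items:
        for c in item.characteristics:
            if c in counts:  # ignore out-of-range indices
                counts[c] += 1
    return counts
```

A record such as `QAItem("Where does the user live now?", "Osaka", 2, frozenset({0, 5}))` passes `validate`, while an item with `inquiry_type=9` and no characteristics reports two violations. Using a `frozenset` keeps each question's characteristic mapping immutable and free of duplicates.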
