Fugu-MT Paper Translation (Summary): PERMA: Benchmarking Personalized Memory Agents via Event-Driven Preference and Realistic Task Environments

Paper summary: PERMA: Benchmarking Personalized Memory Agents via Event-Driven Preference and Realistic Task Environments

arxiv url: http://arxiv.org/abs/2603.23231v1
Date: Tue, 24 Mar 2026 14:04:11 GMT
Status: Translation complete
Last updated in system: 2026-03-25 19:53:37.520189
Title: PERMA: Benchmarking Personalized Memory Agents via Event-Driven Preference and Realistic Task Environments
Title (reference translation): PERMA: Benchmarking Personalized Memory Agents via Event-Driven Preferences and Realistic Task Environments
Authors: Shuochen Liu, Junyi Zhu, Long Shu, Junda Lin, Yuhao Chen, Haotian Zhang, Chao Zhang, Derong Xu, Jia Li, Bo Tang, Zhiyu Li, Feiyu Xiong, Enhong Chen, Tong Xu
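Abstract summary: We introduce PERMA, a benchmark for evaluating persona consistency over time, beyond static preference recall. PERMA consists of temporally ordered interaction events spanning multiple sessions and domains, with preference-related queries inserted over time. Experiments show that by linking related interactions, advanced memory systems can extract more precise preferences and reduce token consumption.
Reference score (independently computed attention score): 72.02445514666428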
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Empowering large language models with long-term memory is crucial for building agents that adapt to users' evolving needs. However, prior evaluations typically interleave preference-related dialogues with irrelevant conversations, reducing the task to needle-in-a-haystack retrieval while ignoring relationships between events that drive the evolution of user preferences. Such settings overlook a fundamental characteristic of real-world personalization: preferences emerge gradually and accumulate across interactions within noisy contexts. To bridge this gap, we introduce PERMA, a benchmark designed to evaluate persona consistency over time beyond static preference recall. Additionally, we incorporate (1) text variability and (2) linguistic alignment to simulate erratic user inputs and individual idiolects in real-world data. PERMA consists of temporally ordered interaction events spanning multiple sessions and domains, with preference-related queries inserted over time. We design both multiple-choice and interactive tasks to probe the model's understanding of persona along the interaction timeline. Experiments demonstrate that by linking related interactions, advanced memory systems can extract more precise preferences and reduce token consumption, outperforming traditional semantic retrieval of raw dialogues. Nevertheless, they still struggle to maintain a coherent persona across temporal depth and cross-domain interference, highlighting the need for more robust personalized memory management in agents. Our code and data are open-sourced at https://github.com/PolarisLiu1/PERMA.
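Abstract (reference translation): Equipping large language models with long-term memory is essential for building agents that adapt to users' evolving needs. However, prior evaluations typically interleave preference-related dialogues with irrelevant conversations, which reduces the task to needle-in-a-haystack retrieval and ignores the relationships between events that drive the evolution of user preferences. Such settings overlook a fundamental characteristic of real-world personalization: preferences emerge gradually and accumulate across interactions within noisy contexts. To bridge this gap, we introduce PERMA, a benchmark for evaluating persona consistency over time, beyond static preference recall. We further incorporate (1) text variability and (2) linguistic alignment to simulate the erratic user inputs and individual idiolects found in real-world data. PERMA consists of temporally ordered interaction events spanning multiple sessions and domains, with preference-related queries inserted over time. We design both multiple-choice and interactive tasks to probe the model's understanding of the persona along the interaction timeline. Experiments show that by linking related interactions, advanced memory systems can extract more precise preferences and reduce token consumption, outperforming traditional semantic retrieval over raw dialogues. Nevertheless, these systems still struggle to maintain a coherent persona across temporal depth and cross-domain interference, which highlights the need for more robust personalized memory management in agents. Our code and data are open-sourced at https://github.com/PolarisLiu1/PERMA.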

Related papers