Related papers: OP-Bench: Benchmarking Over-Personalization for Memory-Augmented Personalized Conversational Agents

URL: http://arxiv.org/abs/2601.13722v1
Date: Tue, 20 Jan 2026 08:27:13 GMT
Title: OP-Bench: Benchmarking Over-Personalization for Memory-Augmented Personalized Conversational Agents
Authors: Yulin Hu, Zimo Long, Jiahe Guo, Xingyu Sui, Xing Fu, Weixiang Zhao, Yanyan Zhao, Bing Qin
Abstract summary: We formalize over-personalization into three types: Irrelevance, Repetition, and Sycophancy. Agents tend to retrieve and over-attend to user memories even when unnecessary. Our work takes an initial step toward more controllable and appropriate personalization in memory-augmented dialogue systems.
Score: 55.27061195244624
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Memory-augmented conversational agents enable personalized interactions using long-term user memory and have gained substantial traction. However, existing benchmarks primarily focus on whether agents can recall and apply user information, while overlooking whether such personalization is used appropriately. In fact, agents may overuse personal information, producing responses that feel forced, intrusive, or socially inappropriate to users. We refer to this issue as over-personalization. In this work, we formalize over-personalization into three types: Irrelevance, Repetition, and Sycophancy, and introduce OP-Bench, a benchmark of 1,700 verified instances constructed from long-horizon dialogue histories. Using OP-Bench, we evaluate multiple large language models and memory-augmentation methods, and find that over-personalization is widespread when memory is introduced. Further analysis reveals that agents tend to retrieve and over-attend to user memories even when unnecessary. To address this issue, we propose Self-ReCheck, a lightweight, model-agnostic memory filtering mechanism that mitigates over-personalization while preserving personalization performance. Our work takes an initial step toward more controllable and appropriate personalization in memory-augmented dialogue systems.

Related papers

MemoryArena: Benchmarking Agent Memory in Interdependent Multi-Session Agentic Tasks [55.145729491377374]
Existing evaluations of agents with memory typically assess memorization and action in isolation. We introduce MemoryArena, a unified evaluation gym for benchmarking agent memory in multi-session Memory-Agent-Environment loops. MemoryArena supports evaluation across web navigation, preference-constrained planning, progressive information search, and sequential formal reasoning.
arXiv Detail & Related papers (2026-02-18T09:49:14Z)
The Algorithmic Self-Portrait: Deconstructing Memory in ChatGPT [17.579565226391146]
We analyze 2,050 memory entries from 80 real-world ChatGPT users. A striking 96% of memories in our dataset are created unilaterally by the conversational system. A significant majority of memories (84%) are directly grounded in user context.
arXiv Detail & Related papers (2026-02-01T21:39:36Z)
Controllable Memory Usage: Balancing Anchoring and Innovation in Long-Term Human-Agent Interaction [35.20324450282101]
We show that an agent's reliance on memory can be modeled as an explicit and user-controllable dimension. We propose Steerable Memory Agent, SteeM, a framework that allows users to dynamically regulate memory reliance.
arXiv Detail & Related papers (2026-01-08T16:54:30Z)
PersonaMem-v2: Towards Personalized Intelligence via Learning Implicit User Personas and Agentic Memory [56.81126490418336]
Personalization is one of the next milestones in advancing AI capability and alignment. PersonaMem-v2 simulates 1,000 realistic user-chatbot interactions on 300+ scenarios, 20,000+ user preferences, and 128k-token context windows. We train Qwen3-4B to outperform GPT-5, reaching 53% accuracy in implicit personalization.
arXiv Detail & Related papers (2025-12-07T06:48:23Z)
O-Mem: Omni Memory System for Personalized, Long Horizon, Self-Evolving Agents [60.1848551962911]
O-Mem is a novel memory framework based on active user profiling. O-Mem supports hierarchical retrieval of persona attributes and topic-related context.
arXiv Detail & Related papers (2025-11-17T16:55:19Z)
MemWeaver: A Hierarchical Memory from Textual Interactive Behaviors for Personalized Generation [12.075641773020152]
We propose a framework that weaves the user's entire textual history into a hierarchical memory to power deeply personalized generation. MemWeaver builds two complementary memory components that both integrate temporal and semantic information.
arXiv Detail & Related papers (2025-10-09T02:47:21Z)
Explicit v.s. Implicit Memory: Exploring Multi-hop Complex Reasoning Over Personalized Information [13.292751023556221]
In large language model-based agents, memory serves as a critical capability for achieving personalization by storing and utilizing users' information. We propose the multi-hop personalized reasoning task to explore how different memory mechanisms perform in multi-hop reasoning over personalized information.
arXiv Detail & Related papers (2025-08-18T13:34:37Z)
PersonaAgent: When Large Language Model Agents Meet Personalization at Test Time [87.99027488664282]
PersonaAgent is a framework designed to address versatile personalization tasks. It integrates a personalized memory module and a personalized action module. A test-time user-preference alignment strategy ensures real-time user preference alignment.
arXiv Detail & Related papers (2025-06-06T17:29:49Z)
MemoCRS: Memory-enhanced Sequential Conversational Recommender Systems with Large Language Models [51.65439315425421]
We propose a Memory-enhanced Conversational Recommender System Framework with Large Language Models (dubbed MemoCRS). User-specific memory is tailored to each user for their personalized interests. The general memory, encapsulating collaborative knowledge and reasoning guidelines, can provide shared knowledge for users.
arXiv Detail & Related papers (2024-07-06T04:57:25Z)
Personalized Large Language Model Assistant with Evolving Conditional Memory [15.780762727225122]
We present a plug-and-play framework that could facilitate personalized large language model assistants with evolving conditional memory. The personalized assistant focuses on intelligently preserving the knowledge and experience from the history dialogue with the user.
arXiv Detail & Related papers (2023-12-22T02:39:15Z)
LaMemo: Language Modeling with Look-Ahead Memory [50.6248714811912]
We propose Look-Ahead Memory (LaMemo) that enhances the recurrence memory by incrementally attending to the right-side tokens. LaMemo embraces bi-directional attention and segment recurrence with an additional overhead only linearly proportional to the memory length. Experiments on widely used language modeling benchmarks demonstrate its superiority over the baselines equipped with different types of memory.
arXiv Detail & Related papers (2022-04-15T06:11:25Z)

This list is automatically generated from the titles and abstracts of the papers on this site.