PersonaMem-v2: Towards Personalized Intelligence via Learning Implicit User Personas and Agentic Memory
- URL: http://arxiv.org/abs/2512.06688v1
- Date: Sun, 07 Dec 2025 06:48:23 GMT
- Title: PersonaMem-v2: Towards Personalized Intelligence via Learning Implicit User Personas and Agentic Memory
- Authors: Bowen Jiang, Yuan Yuan, Maohao Shen, Zhuoqun Hao, Zhangchen Xu, Zichen Chen, Ziyi Liu, Anvesh Rao Vijjini, Jiashu He, Hanchao Yu, Radha Poovendran, Gregory Wornell, Lyle Ungar, Dan Roth, Sihao Chen, Camillo Jose Taylor,
- Abstract summary: Personalization is one of the next milestones in advancing AI capability and alignment. PersonaMem-v2 simulates 1,000 realistic user-chatbot interactions on 300+ scenarios, 20,000+ user preferences, and 128k-token context windows. We train Qwen3-4B to outperform GPT-5, reaching 53% accuracy in implicit personalization.
- Score: 56.81126490418336
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Personalization is one of the next milestones in advancing AI capability and alignment. We introduce PersonaMem-v2, the state-of-the-art dataset for LLM personalization that simulates 1,000 realistic user-chatbot interactions on 300+ scenarios, 20,000+ user preferences, and 128k-token context windows, where most user preferences are implicitly revealed to reflect real-world interactions. Using this data, we investigate how reinforcement fine-tuning enables a model to improve its long-context reasoning capabilities for user understanding and personalization. We also develop a framework for training an agentic memory system, which maintains a single, human-readable memory that grows with each user over time. In our experiments, frontier LLMs still struggle with implicit personalization, achieving only 37-48% accuracy. While they support long context windows, reasoning remains the bottleneck for implicit personalization tasks. Using reinforcement fine-tuning, we successfully train Qwen3-4B to outperform GPT-5, reaching 53% accuracy in implicit personalization. Moreover, our agentic memory framework achieves state-of-the-art 55% accuracy while using 16x fewer input tokens, relying on a 2k-token memory instead of full 32k conversation histories. These results underscore the impact of our dataset and demonstrate agentic memory as a scalable path toward real-world personalized intelligence.
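The abstract's agentic memory idea can be sketched as a loop that maintains one human-readable memory per user, revising it after every interaction while keeping it under a fixed token budget. The sketch below is illustrative only: the function names and the trim-oldest heuristic are assumptions, and the paper's system uses an LLM to rewrite the memory rather than this crude compression.

```python
# Minimal sketch of a single, growing, human-readable user memory kept
# under a fixed token budget (the paper reports ~2k memory tokens vs.
# 32k-token full conversation histories). All names and the update
# heuristic here are assumptions, not the authors' implementation.

TOKEN_BUDGET = 2000


def count_tokens(text: str) -> int:
    """Crude whitespace token count standing in for a real tokenizer."""
    return len(text.split())


def update_memory(memory: str, user_turn: str) -> str:
    """Append a distilled note for the new turn, then drop the oldest
    notes until the memory fits the budget (a real agentic system would
    ask an LLM to merge and compress instead of dropping)."""
    notes = [n for n in memory.splitlines() if n]
    notes.append(f"- {user_turn.strip()}")
    while sum(count_tokens(n) for n in notes) > TOKEN_BUDGET and len(notes) > 1:
        notes.pop(0)  # oldest note goes first
    return "\n".join(notes)


memory = ""
for turn in ["I prefer concise answers.", "I'm vegetarian.", "I live in Lyon."]:
    memory = update_memory(memory, turn)

print(memory)
```

At inference time, only this compact memory (not the full history) would be placed in the model's context, which is where the 16x input-token saving comes from.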
Related papers
- Learning Personalized Agents from Human Feedback [36.47803872623135]
We introduce Personalized Agents from Human Feedback (PAHF), a framework for continual personalization. PAHF learns online from live interaction using explicit per-user memory. Benchmarks quantify an agent's ability to learn initial preferences from scratch and subsequently adapt to persona shifts.
arXiv Detail & Related papers (2026-02-18T04:18:47Z) - MAPLE: A Sub-Agent Architecture for Memory, Learning, and Personalization in Agentic AI Systems [0.0]
Large language model (LLM) agents have emerged as powerful tools for complex tasks, yet their ability to adapt to individual users remains fundamentally limited. We argue this limitation stems from a critical architectural conflation: current systems treat memory, learning, and personalization as a unified capability rather than three distinct mechanisms. We propose MAPLE, a principled decomposition where Memory handles storage and retrieval infrastructure; Learning extracts intelligence from accumulated interactions asynchronously; and Personalization applies learned knowledge in real-time within finite context budgets.
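The three-way decomposition this abstract describes can be sketched as three separate components with narrow responsibilities. The class and method names below are assumptions for illustration, not MAPLE's actual API, and the Learning step runs synchronously here purely for clarity.

```python
# Illustrative-only sketch of the decomposition in the MAPLE abstract:
# Memory (storage/retrieval), Learning (insight extraction, asynchronous
# in a real system), and Personalization (applying knowledge within a
# finite context budget). Names are assumptions, not MAPLE's API.

class Memory:
    """Raw interaction storage with simple keyword retrieval."""
    def __init__(self):
        self.log = []

    def store(self, interaction: str):
        self.log.append(interaction)

    def retrieve(self, query: str):
        return [i for i in self.log if query.lower() in i.lower()]


class Learning:
    """Distills accumulated interactions into durable user facts."""
    def extract(self, memory: Memory):
        # Toy heuristic: keep turns that state a preference.
        return [i for i in memory.log if "prefer" in i.lower()]


class Personalization:
    """Injects learned facts into a prompt under a finite budget."""
    def __init__(self, max_facts: int = 2):
        self.max_facts = max_facts

    def apply(self, prompt: str, facts: list) -> str:
        header = "\n".join(facts[: self.max_facts])
        return f"{header}\n---\n{prompt}" if header else prompt


mem = Memory()
mem.store("User: I prefer metric units.")
mem.store("User: What's the weather?")
facts = Learning().extract(mem)
print(Personalization().apply("Plan my run.", facts))
```

Keeping the three concerns separate means the slow distillation step never blocks a live response, and the prompt-time budget is enforced in exactly one place.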
arXiv Detail & Related papers (2026-02-03T03:46:39Z) - The Algorithmic Self-Portrait: Deconstructing Memory in ChatGPT [17.579565226391146]
We analyze 2,050 memory entries from 80 real-world ChatGPT users. A striking 96% of memories in our dataset are created unilaterally by the conversational system. A significant majority of memories (84%) are directly grounded in user context.
arXiv Detail & Related papers (2026-02-01T21:39:36Z) - HumanLLM: Towards Personalized Understanding and Simulation of Human Nature [72.55730315685837]
HumanLLM is a foundation model designed for personalized understanding and simulation of individuals. We first construct the Cognitive Genome, a large-scale corpus curated from real-world user data on platforms like Reddit, Twitter, Blogger, and Amazon. We then formulate diverse learning tasks and perform supervised fine-tuning to empower the model to predict a wide range of individualized human behaviors, thoughts, and experiences.
arXiv Detail & Related papers (2026-01-22T09:27:27Z) - OP-Bench: Benchmarking Over-Personalization for Memory-Augmented Personalized Conversational Agents [55.27061195244624]
We formalize over-personalization into three types: Irrelevance, Repetition, and Sycophancy. Agents tend to retrieve and over-attend to user memories even when unnecessary. Our work takes an initial step toward more controllable and appropriate personalization in memory-augmented dialogue systems.
arXiv Detail & Related papers (2026-01-20T08:27:13Z) - LikeBench: Evaluating Subjective Likability in LLMs for Personalization [11.75597537798083]
We argue that a third axis, likability, is both subjective and central to user experience, yet under-measured by current benchmarks. We introduce LikeBench, a multi-session, dynamic evaluation framework that measures likability across multiple dimensions. Our benchmark shows that strong memory performance does not guarantee high likability: DeepSeek R1, with lower memory accuracy (86%, 17 facts/profile), outperformed Qwen3 by 28% on likability score despite Qwen3's higher memory accuracy (93%, 43 facts/profile). Even SOTA models like GPT-5 adapt well in short
arXiv Detail & Related papers (2025-12-15T08:18:42Z) - CIMemories: A Compositional Benchmark for Contextual Integrity of Persistent Memory in LLMs [62.116710797795314]
Large Language Models (LLMs) increasingly use persistent memory from past interactions to enhance personalization and task performance. We present CIMemories, a benchmark for evaluating whether LLMs appropriately control information flow from memory based on task context.
arXiv Detail & Related papers (2025-11-18T21:51:23Z) - O-Mem: Omni Memory System for Personalized, Long Horizon, Self-Evolving Agents [60.1848551962911]
O-Mem is a novel memory framework based on active user profiling. O-Mem supports hierarchical retrieval of persona attributes and topic-related context.
arXiv Detail & Related papers (2025-11-17T16:55:19Z) - On the Way to LLM Personalization: Learning to Remember User Conversations [13.041775936106998]
Large Language Models (LLMs) have quickly become an invaluable assistant for a variety of tasks.
However, their effectiveness is constrained by their ability to tailor responses to human preferences and behaviors via personalization.
We propose injecting knowledge of prior conversations into LLMs to enable future work on less redundant, personalized conversations.
arXiv Detail & Related papers (2024-11-20T15:45:08Z) - Personalized Large Language Model Assistant with Evolving Conditional Memory [15.780762727225122]
We present a plug-and-play framework that could facilitate personalized large language model assistants with evolving conditional memory.
The personalized assistant focuses on intelligently preserving the knowledge and experience from the history dialogue with the user.
arXiv Detail & Related papers (2023-12-22T02:39:15Z) - A Cooperative Memory Network for Personalized Task-oriented Dialogue Systems with Incomplete User Profiles [55.951126447217526]
We study personalized Task-oriented Dialogue Systems without assuming that user profiles are complete.
We propose a Cooperative Memory Network (CoMemNN) that has a novel mechanism to gradually enrich user profiles.
CoMemNN is able to enrich user profiles effectively, which results in an improvement of 3.06% in terms of response selection accuracy.
arXiv Detail & Related papers (2021-02-16T18:05:54Z) - PeTra: A Sparsely Supervised Memory Model for People Tracking [50.98911178059019]
We propose PeTra, a memory-augmented neural network designed to track entities in its memory slots.
We empirically compare key modeling choices, finding that we can simplify several aspects of the design of the memory module while retaining strong performance.
PeTra is highly effective in both evaluations, demonstrating its ability to track people in its memory despite being trained with limited annotation.
arXiv Detail & Related papers (2020-05-06T17:45:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.