Learning Personalized Agents from Human Feedback
- URL: http://arxiv.org/abs/2602.16173v1
- Date: Wed, 18 Feb 2026 04:18:47 GMT
- Title: Learning Personalized Agents from Human Feedback
- Authors: Kaiqu Liang, Julia Kruk, Shengyi Qian, Xianjun Yang, Shengjie Bi, Yuanshun Yao, Shaoliang Nie, Mingyang Zhang, Lijuan Liu, Jaime Fernández Fisac, Shuyan Zhou, Saghar Hosseini
- Abstract summary: We introduce Personalized Agents from Human Feedback (PAHF), a framework for continual personalization. PAHF learns online from live interaction using explicit per-user memory. Two benchmarks quantify an agent's ability to learn initial preferences from scratch and subsequently adapt to persona shifts.
- Score: 36.47803872623135
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern AI agents are powerful but often fail to align with the idiosyncratic, evolving preferences of individual users. Prior approaches typically rely on static datasets, either training implicit preference models on interaction history or encoding user profiles in external memory. However, these approaches struggle with new users and with preferences that change over time. We introduce Personalized Agents from Human Feedback (PAHF), a framework for continual personalization in which agents learn online from live interaction using explicit per-user memory. PAHF operationalizes a three-step loop: (1) seeking pre-action clarification to resolve ambiguity, (2) grounding actions in preferences retrieved from memory, and (3) integrating post-action feedback to update memory when preferences drift. To evaluate this capability, we develop a four-phase protocol and two benchmarks in embodied manipulation and online shopping. These benchmarks quantify an agent's ability to learn initial preferences from scratch and subsequently adapt to persona shifts. Our theoretical analysis and empirical results show that integrating explicit memory with dual feedback channels is critical: PAHF learns substantially faster and consistently outperforms both no-memory and single-channel baselines, reducing initial personalization error and enabling rapid adaptation to preference shifts.
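The abstract's three-step loop is concrete enough to sketch as control flow. Below is a minimal Python sketch of one interaction round; the agent and user interfaces (is_ambiguous, clarifying_question, parse_preference, act, ask, react) are hypothetical names, illustrating the loop's structure rather than the paper's implementation.

```python
from dataclasses import dataclass, field

@dataclass
class UserMemory:
    """Explicit per-user preference store (hypothetical schema)."""
    preferences: dict[str, str] = field(default_factory=dict)

    def retrieve(self, task: str) -> dict[str, str]:
        # Return preferences relevant to the task; a naive keyword match here.
        return {k: v for k, v in self.preferences.items() if k in task}

    def update(self, new_prefs: dict[str, str]) -> None:
        # Overwrite stale entries so the memory tracks preference drift.
        self.preferences.update(new_prefs)

def pahf_round(task: str, memory: UserMemory, agent, user) -> None:
    """One round of the PAHF loop (sketch): clarify, act, integrate feedback."""
    # (1) Pre-action clarification: resolve ambiguity before committing to an action.
    if agent.is_ambiguous(task, memory.preferences):
        answer = user.ask(agent.clarifying_question(task))
        memory.update(agent.parse_preference(answer))

    # (2) Ground the action in preferences retrieved from explicit memory.
    action = agent.act(task, memory.retrieve(task))

    # (3) Post-action feedback: update memory when preferences have drifted.
    feedback = user.react(action)
    if feedback:
        memory.update(agent.parse_preference(feedback))
```

Under this framing, dropping step (1) or step (3) recovers the single-channel baselines the abstract compares against, and dropping UserMemory recovers the no-memory baseline.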
Related papers
- GenCI: Generative Modeling of User Interest Shift via Cohort-based Intent Learning for CTR Prediction [84.0125708499372]
We propose a generative user intent framework to model user preferences for click-through rate (CTR) prediction. The framework first employs a generative model, trained with a next-item prediction objective, to proactively produce candidate interest cohorts. A hierarchical candidate-aware network then injects this rich contextual signal into the ranking stage, refining the cohorts with cross-attention to align them with both the user history and the target item (a minimal sketch of this refinement step appears after the list).
arXiv Detail & Related papers (2026-01-26T08:15:04Z)
- How Does Personalized Memory Shape LLM Behavior? Benchmarking Rational Preference Utilization in Personalized Assistants [25.83552447206606]
Large language model (LLM)-powered assistants have recently integrated memory mechanisms that record user preferences, leading to more personalized and user-aligned responses. We develop RPEval, a benchmark comprising a personalized intent reasoning dataset and a multi-granularity evaluation protocol. RPEval reveals the widespread phenomenon of irrational personalization in existing LLMs and, through error pattern analysis, illustrates its negative impact on user experience. We introduce RP-Reasoner, which treats memory utilization as a pragmatic reasoning process, enabling the selective integration of personalized information.
arXiv Detail & Related papers (2026-01-23T10:19:48Z)
- OP-Bench: Benchmarking Over-Personalization for Memory-Augmented Personalized Conversational Agents [55.27061195244624]
We formalize over-personalization into three types: Irrelevance, Repetition, and Sycophancy. Agents tend to retrieve and over-attend to user memories even when unnecessary. Our work takes an initial step toward more controllable and appropriate personalization in memory-augmented dialogue systems.
arXiv Detail & Related papers (2026-01-20T08:27:13Z)
- Lightweight Inference-Time Personalization for Frozen Knowledge Graph Embeddings [0.0]
GatedBias is a lightweight inference-time personalization framework for knowledge graphs. Profile-specific features combine with graph-derived binary gates to produce interpretable, per-entity biases (a minimal sketch of this gating idea follows the list). We evaluate GatedBias on two benchmark datasets.
arXiv Detail & Related papers (2025-12-26T22:30:37Z)
- O-Mem: Omni Memory System for Personalized, Long Horizon, Self-Evolving Agents [60.1848551962911]
O-Mem is a novel memory framework based on active user profiling. It supports hierarchical retrieval of persona attributes and topic-related context.
arXiv Detail & Related papers (2025-11-17T16:55:19Z)
- Enabling Personalized Long-term Interactions in LLM-based Agents through Persistent Memory and User Profiles [0.4885400580268118]
Large language models (LLMs) increasingly serve as the central control unit of AI agents. We present a framework that integrates persistent memory, dynamic coordination, self-validation, and evolving user profiles to enable personalized long-term interactions.
arXiv Detail & Related papers (2025-10-09T08:22:16Z)
- Beyond Static Evaluation: Rethinking the Assessment of Personalized Agent Adaptability in Information Retrieval [12.058221341033835]
We propose a conceptual lens for rethinking evaluation in adaptive personalization. We organize this lens around three core components: (1) persona-based user simulation with temporally evolving preference models; (2) structured elicitation protocols inspired by reference interviews to extract preferences in context; and (3) adaptation-aware evaluation mechanisms that measure how agent behavior improves across sessions and tasks.
arXiv Detail & Related papers (2025-10-05T00:35:37Z)
- PersonaAgent: When Large Language Model Agents Meet Personalization at Test Time [87.99027488664282]
PersonaAgent is a framework designed to address versatile personalization tasks. It integrates a personalized memory module and a personalized action module, and a test-time alignment strategy keeps the agent aligned with user preferences in real time.
arXiv Detail & Related papers (2025-06-06T17:29:49Z)
- Reinforced Interactive Continual Learning via Real-time Noisy Human Feedback [59.768119380109084]
This paper introduces an interactive continual learning paradigm in which AI models dynamically learn new skills from real-time human feedback. We propose RiCL, a Reinforced interactive Continual Learning framework leveraging Large Language Models (LLMs). Our RiCL approach substantially outperforms existing combinations of state-of-the-art online continual learning and noisy-label learning methods.
arXiv Detail & Related papers (2025-05-15T03:22:03Z)
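The GenCI entry above names a concrete refinement step: generated interest cohorts are cross-attended against the user history and the target item before ranking. A minimal NumPy sketch of that step, with invented shapes, no learned projections, and names that are assumptions rather than the paper's code:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def refine_cohorts(cohorts, history, target, d=64):
    """Cross-attend cohort vectors (queries) over the user history plus the
    target item (keys/values); shapes and scaling are illustrative only."""
    context = np.vstack([history, target[None, :]])  # (T+1, d)
    scores = cohorts @ context.T / np.sqrt(d)        # (C, T+1) attention logits
    attn = softmax(scores, axis=-1)                  # attention weights
    return attn @ context                            # refined cohorts, (C, d)

# Toy usage: 4 generated cohorts, 10 history items, one target item.
rng = np.random.default_rng(0)
refined = refine_cohorts(rng.normal(size=(4, 64)),
                         rng.normal(size=(10, 64)),
                         rng.normal(size=(64,)))
```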
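The GatedBias entry likewise describes a compact mechanism: profile-specific features pass through graph-derived binary gates to form interpretable per-entity biases on top of frozen knowledge-graph embedding scores. A hedged sketch under assumed shapes and an assumed linear form (none of this is taken from the paper):

```python
import numpy as np

def personalized_scores(entity_scores, gates, profile_features, w):
    """Add an interpretable per-entity bias to frozen KG embedding scores.

    entity_scores:    (E,)   scores from a frozen embedding model
    gates:            (E,)   graph-derived binary gates in {0, 1}
    profile_features: (F,)   features of the current user's profile
    w:                (E, F) maps profile features to per-entity biases
    (all shapes and the linear form are assumptions for illustration)
    """
    bias = gates * (w @ profile_features)  # gate zeroes out inapplicable entities
    return entity_scores + bias

# Toy usage: 5 entities, 3 profile features.
rng = np.random.default_rng(1)
scores = personalized_scores(rng.normal(size=(5,)),
                             rng.integers(0, 2, size=(5,)).astype(float),
                             rng.normal(size=(3,)),
                             rng.normal(size=(5, 3)))
```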