Training Proactive and Personalized LLM Agents
- URL: http://arxiv.org/abs/2511.02208v1
- Date: Tue, 04 Nov 2025 02:59:36 GMT
- Title: Training Proactive and Personalized LLM Agents
- Authors: Weiwei Sun, Xuhui Zhou, Weihua Du, Xingyao Wang, Sean Welleck, Graham Neubig, Maarten Sap, Yiming Yang
- Abstract summary: We introduce PPP, a multi-objective reinforcement learning approach that jointly optimizes all three dimensions: Productivity, Proactivity, and Personalization. Experiments show that agents trained with PPP achieve substantial improvements over strong baselines such as GPT-5 (+21.6 on average). This work demonstrates that explicitly optimizing for user-centered interaction is critical for building practical and effective AI agents.
- Score: 107.57805582180315
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While existing work focuses primarily on task success, we argue that effective real-world agents require optimizing three dimensions: productivity (task completion), proactivity (asking essential questions), and personalization (adapting to diverse user preferences). We introduce UserVille, an interactive environment with LLM-based user simulators enabling diverse, configurable user preferences. Leveraging UserVille, we introduce PPP, a multi-objective reinforcement learning approach that jointly optimizes all three dimensions: Productivity, Proactivity, and Personalization. Experiments on software engineering and deep research tasks show that agents trained with PPP achieve substantial improvements over strong baselines such as GPT-5 (+21.6 on average), demonstrating the ability to ask strategic clarifying questions, adapt to unseen user preferences, and improve task success through better interaction. This work demonstrates that explicitly optimizing for user-centered interaction is critical for building practical and effective AI agents.
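The abstract describes a multi-objective RL setup that jointly optimizes productivity, proactivity, and personalization. One common way to realize this is to score each trajectory along every dimension and scalarize the scores into a single reward for policy updates. The sketch below is purely illustrative: the reward functions and weights are hypothetical, and the paper's actual PPP formulation may combine objectives differently.

```python
# Illustrative sketch: weighted scalarization of per-dimension rewards
# for a multi-objective agent-training loop. Weights are hypothetical.

def scalarize(rewards: dict[str, float], weights: dict[str, float]) -> float:
    """Combine per-dimension rewards into one scalar for the RL update."""
    return sum(weights[k] * rewards[k] for k in rewards)

# Example: one trajectory scored along the three dimensions named in the paper.
rewards = {"productivity": 1.0, "proactivity": 0.5, "personalization": 0.8}
weights = {"productivity": 0.5, "proactivity": 0.25, "personalization": 0.25}

print(scalarize(rewards, weights))
```

A fixed weighted sum is the simplest scalarization; multi-objective RL work also explores Pareto-based or adaptive weighting, which the paper may use instead.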
Related papers
- Pushing Forward Pareto Frontiers of Proactive Agents with Behavioral Agentic Optimization [61.641777037967366]
Proactive large language model (LLM) agents aim to actively plan, query, and interact over multiple turns. Agentic reinforcement learning (RL) has emerged as a promising solution for training such agents in multi-turn settings. We propose BAO, an agentic RL framework that combines behavior enhancement to enrich proactive reasoning and information-gathering capabilities.
arXiv Detail & Related papers (2026-02-11T20:40:43Z) - UserRL: Training Interactive User-Centric Agent via Reinforcement Learning [104.63494870852894]
Reinforcement learning (RL) has shown promise in training agentic models that engage in dynamic, multi-turn interactions. We propose UserRL, a unified framework for training and evaluating user-centric abilities through standardized gym environments.
arXiv Detail & Related papers (2025-09-24T03:33:20Z) - Evaluating the Effectiveness of Large Language Models in Solving Simple Programming Tasks: A User-Centered Study [1.0467092641687232]
This study investigates how different interaction styles with ChatGPT-4o affect user performance on simple programming tasks. I conducted a within-subjects experiment where fifteen high school students completed three problems under three distinct versions of the model.
arXiv Detail & Related papers (2025-07-05T13:52:31Z) - Thought-Augmented Planning for LLM-Powered Interactive Recommender Agent [56.61028117645315]
We propose a novel thought-augmented interactive recommender agent system (TAIRA) that addresses complex user intents through distilled thought patterns. Specifically, TAIRA is designed as an LLM-powered multi-agent system featuring a manager agent that orchestrates recommendation tasks by decomposing user needs and planning subtasks. Through comprehensive experiments conducted across multiple datasets, TAIRA exhibits significantly enhanced performance compared to existing methods.
arXiv Detail & Related papers (2025-06-30T03:15:50Z) - Simulating Before Planning: Constructing Intrinsic User World Model for User-Tailored Dialogue Policy Planning [31.785493263807684]
We present the User-Tailored Dialogue Policy Planning (UDP) framework, which incorporates an Intrinsic User World Model to model user traits and feedback. UDP operates in three stages: (1) User Persona Portraying, using a diffusion model to dynamically infer user profiles; (2) User Feedback Anticipating, leveraging a Brownian Bridge-inspired anticipator to predict user reactions; and (3) User-Tailored Policy Planning, integrating these insights to optimize response strategies.
arXiv Detail & Related papers (2025-04-18T11:48:55Z) - Aligning LLMs with Individual Preferences via Interaction [51.72200436159636]
We train large language models (LLMs) that can "interact to align". We develop a multi-turn preference dataset containing 3K+ multi-turn conversations in tree structures. For evaluation, we establish the ALOE benchmark, consisting of 100 carefully selected examples and well-designed metrics to measure the customized alignment performance during conversations.
arXiv Detail & Related papers (2024-10-04T17:48:29Z) - From User Surveys to Telemetry-Driven AI Agents: Exploring the Potential of Personalized Productivity Solutions [21.79433247723466]
Information workers increasingly struggle with productivity challenges in modern workplaces. Despite the availability of productivity metrics through enterprise tools, workers often fail to translate this data into actionable insights. We present a comprehensive, user-centric approach to address these challenges through AI-based productivity agents tailored to users' needs.
arXiv Detail & Related papers (2024-01-17T04:20:10Z) - Optimizing Interactive Systems via Data-Driven Objectives [70.3578528542663]
We propose an approach that infers the objective directly from observed user interactions.
These inferences can be made regardless of prior knowledge and across different types of user behavior.
We introduce the Interactive System Optimizer (ISO), a novel algorithm that uses these inferred objectives for optimization.
arXiv Detail & Related papers (2020-06-19T20:49:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences.