Enhancing Personalized Multi-Turn Dialogue with Curiosity Reward
- URL: http://arxiv.org/abs/2504.03206v2
- Date: Mon, 07 Jul 2025 17:32:51 GMT
- Title: Enhancing Personalized Multi-Turn Dialogue with Curiosity Reward
- Authors: Yanming Wan, Jiaxing Wu, Marwa Abdulhai, Lior Shani, Natasha Jaques
- Abstract summary: We propose leveraging a user model to incorporate a curiosity-based intrinsic reward into multi-turn RLHF. This novel reward mechanism encourages the LLM agent to actively infer user traits by optimizing conversations to improve its user model's accuracy. We demonstrate our method's effectiveness in two distinct domains: significantly improving personalization performance in a conversational recommendation task, and personalizing conversations for different learning styles in an educational setting.
- Score: 11.495697919066341
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Effective conversational agents like large language models (LLMs) must personalize their interactions to adapt to user preferences, personalities, and attributes across diverse domains like education and healthcare. Current methods like Reinforcement Learning from Human Feedback (RLHF) often prioritize helpfulness and safety but fall short in fostering truly empathetic, adaptive, and personalized dialogues. Existing personalization approaches typically rely on extensive user history, limiting their effectiveness for new or context-limited users. To address these limitations, we propose leveraging a user model to incorporate a curiosity-based intrinsic reward into multi-turn RLHF. This novel reward mechanism encourages the LLM agent to actively infer user traits by optimizing conversations to improve its user model's accuracy. Consequently, the agent delivers more personalized interactions by learning more about the user. We demonstrate our method's effectiveness in two distinct domains: significantly improving personalization performance in a conversational recommendation task, and personalizing conversations for different learning styles in an educational setting. We show improved generalization capabilities compared to traditional multi-turn RLHF, all while maintaining conversation quality. Our method offers a promising solution for creating more personalized, adaptive, and engaging conversational agents.
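As a rough illustration of the mechanism described in the abstract, the following is a minimal sketch (not the authors' implementation) of a curiosity-style intrinsic reward computed as the improvement in a user model's belief about the ground-truth user traits after each exchange. The function names, the toy user model, and the reward-shaping weight are illustrative assumptions.

```python
from typing import Callable, Dict, List

# A user model maps (dialogue so far, ground-truth traits) to the log-probability
# it assigns to those traits; the traits are available at training time, e.g.
# from a simulated user.
UserModelFn = Callable[[List[str], Dict[str, str]], float]


def curiosity_reward(user_model: UserModelFn,
                     dialogue_before: List[str],
                     dialogue_after: List[str],
                     true_traits: Dict[str, str]) -> float:
    """Intrinsic reward: how much the latest exchange improved the user model's
    belief in the correct user traits."""
    return user_model(dialogue_after, true_traits) - user_model(dialogue_before, true_traits)


def shaped_reward(extrinsic: float, intrinsic: float, beta: float = 0.1) -> float:
    """Combine the ordinary RLHF reward with the curiosity bonus; beta trades
    personalization pressure against conversation quality."""
    return extrinsic + beta * intrinsic


if __name__ == "__main__":
    import math

    # Toy stand-in user model: counts how many trait values are mentioned in the
    # dialogue (a real system would use an LLM or a trained classifier here).
    def toy_user_model(dialogue: List[str], traits: Dict[str, str]) -> float:
        text = " ".join(dialogue).lower()
        hits = sum(1 for v in traits.values() if v.lower() in text)
        return math.log((hits + 1) / (len(traits) + 1))

    traits = {"learning_style": "visual"}
    before = ["Agent: Hi! What would you like to study today?", "User: Geometry."]
    after = before + ["Agent: Do you prefer diagrams or written proofs?",
                      "User: Diagrams, I'm a visual learner."]

    r = curiosity_reward(toy_user_model, before, after, traits)
    print(shaped_reward(extrinsic=0.0, intrinsic=r))
```

In an actual training loop, this intrinsic term would be added to the turn-level RLHF reward, so that conversations which reveal more about the user are preferred over equally fluent but uninformative ones.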
Related papers
- Learning Pluralistic User Preferences through Reinforcement Learning Fine-tuned Summaries [13.187789731783095]
We present a novel framework that learns text-based summaries of each user's preferences, characteristics, and past conversations. These summaries condition the reward model, enabling it to make personalized predictions about the types of responses valued by each user (a minimal sketch of this summary-conditioned scoring appears after this list). We show that our method is robust to new users and diverse conversation topics.
arXiv Detail & Related papers (2025-07-17T23:48:51Z) - Aligning Spoken Dialogue Models from User Interactions [55.192134724622235]
We propose a novel preference alignment framework to improve spoken dialogue models on real-time conversations from user interactions. We create a dataset of more than 150,000 preference pairs from raw multi-turn speech conversations annotated with AI feedback. Our findings shed light on the importance of a well-calibrated balance among various dynamics, crucial for natural real-time speech dialogue systems.
arXiv Detail & Related papers (2025-06-26T16:45:20Z) - A Personalized Conversational Benchmark: Towards Simulating Personalized Conversations [112.81207927088117]
PersonaConvBench is a benchmark for evaluating personalized reasoning and generation in multi-turn conversations with large language models (LLMs). We benchmark several commercial and open-source LLMs under a unified prompting setup and observe that incorporating personalized history yields substantial performance improvements.
arXiv Detail & Related papers (2025-05-20T09:13:22Z) - Enhancing User-Oriented Proactivity in Open-Domain Dialogues with Critic Guidance [35.15965694815852]
Open-domain dialogue systems aim to generate natural and engaging conversations. Existing large language models (LLMs) fall short in proactively understanding the user's chatting preferences. We propose UPC, a user-oriented proactive approach, to enhance user-oriented proactivity.
arXiv Detail & Related papers (2025-05-18T09:59:22Z) - Search-Based Interaction For Conversation Recommendation via Generative Reward Model Based Simulated User [117.82681846559909]
Conversational recommendation systems (CRSs) use multi-turn interaction to capture user preferences and provide personalized recommendations.
We propose a generative reward model based simulated user, named GRSU, for automatic interaction with CRSs.
arXiv Detail & Related papers (2025-04-29T06:37:30Z) - Exploring Personality-Aware Interactions in Salesperson Dialogue Agents [21.282523537612477]
This study explores the influence of user personas, defined using the Myers-Briggs Type Indicator (MBTI), on the interaction quality and performance of sales-oriented dialogue agents. Our findings reveal significant patterns in interaction dynamics, task completion rates, and dialogue naturalness, underscoring the future potential for dialogue agents to refine their strategies.
arXiv Detail & Related papers (2025-04-25T04:10:25Z) - Know Me, Respond to Me: Benchmarking LLMs for Dynamic User Profiling and Personalized Responses at Scale [51.9706400130481]
Large Language Models (LLMs) have emerged as personalized assistants for users across a wide range of tasks.
PERSONAMEM features curated user profiles with over 180 simulated user-LLM interaction histories.
We evaluate LLM chatbots' ability to identify the most suitable response according to the current state of the user's profile.
arXiv Detail & Related papers (2025-04-19T08:16:10Z) - Exploring the Impact of Personality Traits on Conversational Recommender Systems: A Simulation with Large Language Models [70.180385882195]
This paper introduces a personality-aware user simulation for Conversational Recommender Systems (CRSs). The user agent induces customizable personality traits and preferences, while the system agent possesses the persuasion capability to simulate realistic interaction in CRSs. Experimental results demonstrate that state-of-the-art LLMs can effectively generate diverse user responses aligned with specified personality traits.
arXiv Detail & Related papers (2025-04-09T13:21:17Z) - Towards Personalized Conversational Sales Agents : Contextual User Profiling for Strategic Action [12.637812936971049]
We introduce Conversational Sales (CSales), a novel task that unifies preference elicitation, recommendation, and persuasion.
For a realistic evaluation of CSales, we present CSUser, an LLM-based user simulator constructed from real-world data.
We also propose CSI, a conversational sales agent that proactively infers contextual profiles through dialogue for personalized action planning.
arXiv Detail & Related papers (2025-03-28T15:49:52Z) - UQABench: Evaluating User Embedding for Prompting LLMs in Personalized Question Answering [39.79275025010785]
UQABench is a benchmark designed to evaluate the effectiveness of user embeddings in prompting large language models for personalization. We conduct extensive experiments on various state-of-the-art methods for modeling user embeddings.
arXiv Detail & Related papers (2025-02-26T14:34:00Z) - Combining LLM decision and RL action selection to improve RL policy for adaptive interventions [9.395236804312496]
Inspired by the success of Large Language Models (LLMs), we use them to update the RL policy in real time. We use text-based user preferences to influence action selection on the fly, so that the user's preferences are incorporated immediately. We show that our approach takes text-based user preferences into account while improving the RL policy, thereby improving personalization in adaptive interventions.
arXiv Detail & Related papers (2025-01-13T00:03:20Z) - On the Way to LLM Personalization: Learning to Remember User Conversations [13.041775936106998]
Large Language Models (LLMs) have quickly become an invaluable assistant for a variety of tasks.
However, their effectiveness is constrained by their limited ability to tailor responses to human preferences and behaviors via personalization.
We propose injecting knowledge of prior conversations into LLMs to enable future work on less redundant, personalized conversations.
arXiv Detail & Related papers (2024-11-20T15:45:08Z) - Interactive Dialogue Agents via Reinforcement Learning on Hindsight Regenerations [58.65755268815283]
Many real dialogues are interactive, meaning an agent's utterances will influence their conversational partner, elicit information, or change their opinion.
We use this fact to rewrite and augment existing suboptimal data, and train via offline reinforcement learning (RL) an agent that outperforms both prompting and learning from unaltered human demonstrations.
Our results in a user study with real humans show that our approach greatly outperforms existing state-of-the-art dialogue agents.
arXiv Detail & Related papers (2024-11-07T21:37:51Z) - Aligning LLMs with Individual Preferences via Interaction [51.72200436159636]
We train large language models (LLMs) that can "interact to align". We develop a multi-turn preference dataset containing 3K+ multi-turn conversations in tree structures. For evaluation, we establish the ALOE benchmark, consisting of 100 carefully selected examples and well-designed metrics to measure the customized alignment performance during conversations.
arXiv Detail & Related papers (2024-10-04T17:48:29Z) - CloChat: Understanding How People Customize, Interact, and Experience Personas in Large Language Models [15.915071948354466]
CloChat is an interface supporting easy and accurate customization of agent personas in large language models.
Results indicate that participants formed emotional bonds with the customized agents, engaged in more dynamic dialogues, and showed interest in sustaining interactions.
arXiv Detail & Related papers (2024-02-23T11:25:17Z) - Interactive Garment Recommendation with User in the Loop [77.35411131350833]
We propose to build a user profile on the fly by integrating user reactions as we recommend complementary items to compose an outfit.
We present a reinforcement learning agent capable of suggesting appropriate garments and ingesting user feedback to improve its recommendations.
arXiv Detail & Related papers (2024-02-18T16:01:28Z) - Personalized Language Modeling from Personalized Human Feedback [45.16986573937782]
Personalized large language models (LLMs) are designed to tailor responses to individual user preferences. We propose Personalized-RLHF (P-RLHF), an efficient framework that utilizes a lightweight user model to capture individual user preferences. We show that personalized LLMs trained using P-RLHF generate responses that are more closely aligned with individual user preferences.
arXiv Detail & Related papers (2024-02-06T04:18:58Z) - Zero-Shot Goal-Directed Dialogue via RL on Imagined Conversations [70.7884839812069]
Large language models (LLMs) have emerged as powerful and general solutions to many natural language tasks.
However, many of the most important applications of language generation are interactive, where an agent has to talk to a person to reach a desired outcome.
In this work, we explore a new method for adapting LLMs with RL for such goal-directed dialogue.
arXiv Detail & Related papers (2023-11-09T18:45:16Z) - COLA: Improving Conversational Recommender Systems by Collaborative Augmentation [9.99763097964222]
We propose a collaborative augmentation (COLA) method to improve both item representation learning and user preference modeling.
We construct an interactive user-item graph from all conversations, which augments item representations with user-aware information.
To improve user preference modeling, we retrieve similar conversations from the training corpus, where the involved items and attributes that reflect the user's potential interests are used to augment the user representation.
arXiv Detail & Related papers (2022-12-15T12:37:28Z) - Optimizing Interactive Systems via Data-Driven Objectives [70.3578528542663]
We propose an approach that infers the objective directly from observed user interactions.
These inferences can be made regardless of prior knowledge and across different types of user behavior.
We introduce the Interactive System Optimizer (ISO), a novel algorithm that uses these inferred objectives for optimization.
arXiv Detail & Related papers (2020-06-19T20:49:14Z) - Empowering Active Learning to Jointly Optimize System and User Demands [70.66168547821019]
We propose a new active learning approach that jointly optimizes the active learning system (training efficiently) and the user (receiving useful instances).
We study our approach in an educational application, which particularly benefits from this technique as the system needs to rapidly learn to predict the appropriateness of an exercise to a particular user.
We evaluate multiple learning strategies and user types with data from real users and find that our joint approach better satisfies both objectives when alternative methods lead to many unsuitable exercises for end users.
arXiv Detail & Related papers (2020-05-09T16:02:52Z)
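To make the first related entry above ("Learning Pluralistic User Preferences through Reinforcement Learning Fine-tuned Summaries") more concrete, here is a minimal, purely illustrative sketch of a reward model conditioned on a text-based user summary. The prompt layout, the pluggable scorer, and the toy example are assumptions for illustration, not the authors' actual design.

```python
from typing import Callable, List

# A scorer maps a fully formatted prompt to a scalar preference score.
# In practice this would be a fine-tuned reward model (e.g. an LLM with a
# scalar head); here it is left as a pluggable callable.
Scorer = Callable[[str], float]


def format_reward_input(user_summary: str, conversation: List[str], candidate: str) -> str:
    """Concatenate a text-based user summary with the dialogue and a candidate
    response so the reward model can make user-specific predictions."""
    return (
        "User summary:\n" + user_summary + "\n\n"
        "Conversation:\n" + "\n".join(conversation) + "\n\n"
        "Candidate response:\n" + candidate
    )


def personalized_reward(scorer: Scorer, user_summary: str,
                        conversation: List[str], candidate: str) -> float:
    """Score a candidate response for this particular user."""
    return scorer(format_reward_input(user_summary, conversation, candidate))


if __name__ == "__main__":
    # Toy scorer: prefers responses that echo terms from the user summary.
    def toy_scorer(prompt: str) -> float:
        summary = prompt.split("Conversation:")[0].lower()
        candidate = prompt.split("Candidate response:")[-1].lower()
        return sum(1.0 for word in candidate.split() if word in summary)

    summary = "Prefers concise answers and visual explanations; dislikes jargon."
    convo = ["User: Can you explain eigenvalues?"]
    print(personalized_reward(toy_scorer, summary, convo,
                              "Here is a short, visual explanation with a diagram."))
```

The design choice this illustrates is that the same reward model can serve many users: personalization comes from the summary prepended to its input rather than from per-user fine-tuning.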