Enhancing Personalized Multi-Turn Dialogue with Curiosity Reward
- URL: http://arxiv.org/abs/2504.03206v1
- Date: Fri, 04 Apr 2025 06:35:02 GMT
- Title: Enhancing Personalized Multi-Turn Dialogue with Curiosity Reward
- Authors: Yanming Wan, Jiaxing Wu, Marwa Abdulhai, Lior Shani, Natasha Jaques
- Abstract summary: Effective conversational agents must be able to personalize their behavior to suit a user's preferences, personality, and attributes. Current training methods like Reinforcement Learning from Human Feedback (RLHF) prioritize helpfulness and safety but fall short in fostering truly empathetic, adaptive, and personalized interactions. We propose to incorporate an intrinsic motivation to improve the conversational agent's model of the user as an additional reward alongside multi-turn RLHF.
- Score: 11.495697919066341
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Effective conversational agents must be able to personalize their behavior to suit a user's preferences, personality, and attributes, whether they are assisting with writing tasks or operating in domains like education or healthcare. Current training methods like Reinforcement Learning from Human Feedback (RLHF) prioritize helpfulness and safety but fall short in fostering truly empathetic, adaptive, and personalized interactions. Traditional approaches to personalization often rely on extensive user history, limiting their effectiveness for new or context-limited users. To overcome these limitations, we propose to incorporate an intrinsic motivation to improve the conversational agent's model of the user as an additional reward alongside multi-turn RLHF. This reward mechanism encourages the agent to actively elicit user traits by optimizing conversations to increase the accuracy of its user model. Consequently, the policy agent can deliver more personalized interactions by obtaining more information about the user. We applied our method in both education and fitness settings, where LLMs teach concepts or recommend personalized strategies based on users' hidden learning style or lifestyle attributes. Using LLM-simulated users, our approach outperformed a multi-turn RLHF baseline in revealing information about users' preferences and adapting to them.
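The reward mechanism described above, giving the agent an intrinsic bonus when its model of the user becomes more accurate after a turn, can be illustrated with a small self-contained Python sketch. The trait list, the keyword-based user model, and the beta weighting below are illustrative placeholders rather than the paper's components (the paper uses an LLM-based user model and LLM-simulated users), so read this as a sketch of the reward shaping under those assumptions, not the authors' implementation.

```python
import math
from typing import Dict, List

# Minimal sketch of the curiosity-reward idea: the agent earns an intrinsic
# bonus when its model of the user becomes more accurate after a turn.
# TRAITS, KEYWORDS, and beta are illustrative assumptions, not from the paper.

TRAITS = ["visual", "auditory", "kinesthetic"]  # hypothetical hidden learning styles

KEYWORDS = {
    "visual": ["diagram", "picture", "chart", "see it"],
    "auditory": ["listen", "hear", "podcast", "talk through"],
    "kinesthetic": ["hands-on", "practice", "try it", "build"],
}

def user_model_belief(dialogue: List[str]) -> Dict[str, float]:
    """Toy stand-in for the agent's user model: a softmax over keyword counts.
    A real implementation would query an LLM to infer the trait distribution."""
    text = " ".join(dialogue).lower()
    scores = {t: sum(text.count(k) for k in KEYWORDS[t]) for t in TRAITS}
    z = sum(math.exp(s) for s in scores.values())
    return {t: math.exp(s) / z for t, s in scores.items()}

def curiosity_reward(dialogue_before: List[str], dialogue_after: List[str],
                     true_trait: str) -> float:
    """Intrinsic reward for one agent turn: the increase in the user model's
    probability of the (simulated) user's true hidden trait."""
    return (user_model_belief(dialogue_after)[true_trait]
            - user_model_belief(dialogue_before)[true_trait])

def total_reward(extrinsic: float, intrinsic: float, beta: float = 0.5) -> float:
    """Per-turn training signal: the multi-turn RLHF reward plus a weighted
    curiosity bonus; beta is an assumed trade-off coefficient."""
    return extrinsic + beta * intrinsic

if __name__ == "__main__":
    before = ["Agent: How do you prefer to study new concepts?"]
    after = before + ["User: I like diagrams and pictures more than lectures."]
    r_int = curiosity_reward(before, after, true_trait="visual")
    print(f"curiosity reward: {r_int:.3f}, total: {total_reward(1.0, r_int):.3f}")
```

In the full method, the extrinsic term would presumably come from a multi-turn RLHF reward model and the belief update from prompting an LLM with the conversation so far; the sketch only shows how improving user-model accuracy can be turned into a per-turn bonus.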
Related papers
- Search-Based Interaction For Conversation Recommendation via Generative Reward Model Based Simulated User [117.82681846559909]
Conversational recommendation systems (CRSs) use multi-turn interaction to capture user preferences and provide personalized recommendations.
We propose a generative reward model based simulated user, named GRSU, for automatic interaction with CRSs.
arXiv Detail & Related papers (2025-04-29T06:37:30Z)
- Know Me, Respond to Me: Benchmarking LLMs for Dynamic User Profiling and Personalized Responses at Scale [51.9706400130481]
Large Language Models (LLMs) have emerged as personalized assistants for users across a wide range of tasks.
PERSONAMEM features curated user profiles with over 180 simulated user-LLM interaction histories.
We evaluate LLM chatbots' ability to identify the most suitable response according to the current state of the user's profile.
arXiv Detail & Related papers (2025-04-19T08:16:10Z)
- Towards Personalized Conversational Sales Agents: Contextual User Profiling for Strategic Action [12.637812936971049]
We introduce Conversational Sales (CSales), a novel task that unifies preference elicitation, recommendation, and persuasion.
For a realistic evaluation of CSales, we present CSUser, an LLM-based user simulator constructed from real-world data.
We also propose CSI, a conversational sales agent that proactively infers contextual profiles through dialogue for personalized action planning.
arXiv Detail & Related papers (2025-03-28T15:49:52Z)
- UQABench: Evaluating User Embedding for Prompting LLMs in Personalized Question Answering [39.79275025010785]
UQABench is a benchmark designed to evaluate the effectiveness of user embeddings in prompting large language models for personalization. We conduct extensive experiments on various state-of-the-art methods for modeling user embeddings.
arXiv Detail & Related papers (2025-02-26T14:34:00Z)
- Combining LLM decision and RL action selection to improve RL policy for adaptive interventions [9.395236804312496]
We are inspired by the success of Large Language Models (LLMs) to update the RL policy in real time. We use text-based user preferences to influence action selection on the fly, in order to immediately incorporate the user's preference. We show that our approach is able to take text-based user preferences into account while improving the RL policy, thus improving personalization in adaptive interventions.
arXiv Detail & Related papers (2025-01-13T00:03:20Z)
- On the Way to LLM Personalization: Learning to Remember User Conversations [13.041775936106998]
Large Language Models (LLMs) have quickly become an invaluable assistant for a variety of tasks.
However, their effectiveness is constrained by their limited ability to tailor responses to human preferences and behaviors via personalization.
We propose injecting knowledge of prior conversations into LLMs to enable future work on less redundant, personalized conversations.
arXiv Detail & Related papers (2024-11-20T15:45:08Z)
- Interactive Garment Recommendation with User in the Loop [77.35411131350833]
We propose to build a user profile on the fly by integrating user reactions as we recommend complementary items to compose an outfit.
We present a reinforcement learning agent capable of suggesting appropriate garments and ingesting user feedback to improve its recommendations.
arXiv Detail & Related papers (2024-02-18T16:01:28Z)
- Personalized Language Modeling from Personalized Human Feedback [45.16986573937782]
Personalized large language models (LLMs) are designed to tailor responses to individual user preferences. We propose Personalized-RLHF (P-RLHF), an efficient framework that utilizes a lightweight user model to capture individual user preferences. We show that personalized LLMs trained using P-RLHF generate responses that are more closely aligned with individual user preferences.
arXiv Detail & Related papers (2024-02-06T04:18:58Z)
- Zero-Shot Goal-Directed Dialogue via RL on Imagined Conversations [70.7884839812069]
Large language models (LLMs) have emerged as powerful and general solutions to many natural language tasks.
However, many of the most important applications of language generation are interactive, where an agent has to talk to a person to reach a desired outcome.
In this work, we explore a new method for adapting LLMs with RL for such goal-directed dialogue.
arXiv Detail & Related papers (2023-11-09T18:45:16Z)
- COLA: Improving Conversational Recommender Systems by Collaborative Augmentation [9.99763097964222]
We propose a collaborative augmentation (COLA) method to improve both item representation learning and user preference modeling.
We construct an interactive user-item graph from all conversations, which augments item representations with user-aware information.
To improve user preference modeling, we retrieve similar conversations from the training corpus, where the involved items and attributes that reflect the user's potential interests are used to augment the user representation.
arXiv Detail & Related papers (2022-12-15T12:37:28Z)
- Optimizing Interactive Systems via Data-Driven Objectives [70.3578528542663]
We propose an approach that infers the objective directly from observed user interactions.
These inferences can be made regardless of prior knowledge and across different types of user behavior.
We introduce the Interactive System Optimizer (ISO), a novel algorithm that uses these inferred objectives for optimization.
arXiv Detail & Related papers (2020-06-19T20:49:14Z)
- Empowering Active Learning to Jointly Optimize System and User Demands [70.66168547821019]
We propose a new active learning approach that jointly optimizes the active learning system (training efficiently) and the user (receiving useful instances).
We study our approach in an educational application, which particularly benefits from this technique as the system needs to rapidly learn to predict the appropriateness of an exercise to a particular user.
We evaluate multiple learning strategies and user types with data from real users and find that our joint approach better satisfies both objectives when alternative methods lead to many unsuitable exercises for end users.
arXiv Detail & Related papers (2020-05-09T16:02:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.