Simulating Before Planning: Constructing Intrinsic User World Model for User-Tailored Dialogue Policy Planning
- URL: http://arxiv.org/abs/2504.13643v1
- Date: Fri, 18 Apr 2025 11:48:55 GMT
- Title: Simulating Before Planning: Constructing Intrinsic User World Model for User-Tailored Dialogue Policy Planning
- Authors: Tao He, Lizi Liao, Ming Liu, Bing Qin
- Abstract summary: We present the User-Tailored Dialogue Policy Planning (UDP) framework, which incorporates an Intrinsic User World Model to model user traits and feedback. UDP operates in three stages: (1) User Persona Portraying, using a diffusion model to dynamically infer user profiles; (2) User Feedback Anticipating, leveraging a Brownian Bridge-inspired anticipator to predict user reactions; and (3) User-Tailored Policy Planning, integrating these insights to optimize response strategies.
- Score: 31.785493263807684
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advancements in dialogue policy planning have emphasized optimizing system agent policies to achieve predefined goals, focusing on strategy design, trajectory acquisition, and efficient training paradigms. However, these approaches often overlook the critical role of user characteristics, which are essential in real-world scenarios like conversational search and recommendation, where interactions must adapt to individual user traits such as personality, preferences, and goals. To address this gap, we first conduct a comprehensive study utilizing task-specific user personas to systematically assess dialogue policy planning under diverse user behaviors. By leveraging realistic user profiles for different tasks, our study reveals significant limitations in existing approaches, highlighting the need for user-tailored dialogue policy planning. Building on this foundation, we present the User-Tailored Dialogue Policy Planning (UDP) framework, which incorporates an Intrinsic User World Model to model user traits and feedback. UDP operates in three stages: (1) User Persona Portraying, using a diffusion model to dynamically infer user profiles; (2) User Feedback Anticipating, leveraging a Brownian Bridge-inspired anticipator to predict user reactions; and (3) User-Tailored Policy Planning, integrating these insights to optimize response strategies. To ensure robust performance, we further propose an active learning approach that prioritizes challenging user personas during training. Comprehensive experiments on benchmarks, including collaborative and non-collaborative settings, demonstrate the effectiveness of UDP in learning user-specific dialogue strategies. Results validate the protocol's utility and highlight UDP's robustness, adaptability, and potential to advance user-centric dialogue systems.
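The three-stage pipeline described in the abstract can be sketched in outline. Everything below is a hypothetical illustration: the class, function names, and the scalar "engagement" state are placeholders, not the authors' implementation, and the Brownian-bridge term only conveys the pinned-endpoint idea behind the feedback anticipator.

```python
import random

class IntrinsicUserWorldModel:
    """Illustrative stand-in for UDP's user model components.

    `infer_persona` stands in for the diffusion-based persona portrayer and
    `anticipate_feedback` for the Brownian-Bridge-inspired anticipator; the
    names and signatures are assumptions, not the paper's API.
    """

    def infer_persona(self, history):
        # Stage 1: User Persona Portraying - refine a persona estimate from
        # the dialogue so far (here: a trivial counting heuristic).
        user_turns = [turn for turn in history if turn.startswith("user:")]
        return {"observed_user_turns": len(user_turns)}

    def anticipate_feedback(self, start_state, goal_state, t, horizon, noise=0.1):
        # Stage 2: User Feedback Anticipating - a Brownian bridge pins the
        # trajectory at both endpoints, so uncertainty peaks mid-dialogue.
        alpha = t / horizon
        mean = (1 - alpha) * start_state + alpha * goal_state
        return mean + random.gauss(0.0, noise) * (alpha * (1 - alpha)) ** 0.5

def plan_policy(world_model, history, strategies, t, horizon):
    # Stage 3: User-Tailored Policy Planning - score candidate strategies
    # against the anticipated user state and pick the closest match.
    persona = world_model.infer_persona(history)
    predicted = world_model.anticipate_feedback(0.0, 1.0, t, horizon)
    scored = [(abs(predicted - s["expected_engagement"]), s["name"])
              for s in strategies]
    return min(scored)[1], persona
```

The sketch deliberately reduces the user state to one scalar; the point is only the control flow: portray the persona, anticipate feedback, then condition strategy selection on both.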
Related papers
- UserLM-R1: Modeling Human Reasoning in User Language Models with Multi-Reward Reinforcement Learning [32.51053667574764]
We propose UserLM-R1, a novel user language model with reasoning capability. We first construct comprehensive user profiles with both static roles and dynamic scenario-specific goals for adaptation to diverse scenarios. Then, we propose a goal-driven decision-making policy to generate high-quality rationales before producing responses.
arXiv Detail & Related papers (2026-01-14T06:42:01Z) - A General Highly Accurate Online Planning Method Integrating Large Language Models into Nested Rollout Policy Adaptation for Dialogue Tasks [16.400192943577743]
In goal-oriented dialogue tasks, the main challenge is to steer the interaction towards a given goal within a limited number of turns. Existing approaches either rely on elaborate prompt engineering, or integrate policy networks and pre-trained policy models. We present Nested Rollout Policy Adaptation for Goal-oriented Dialogue (NRPA-GD), a novel dialogue policy planning method.
arXiv Detail & Related papers (2025-11-17T02:48:37Z) - Training Proactive and Personalized LLM Agents [107.57805582180315]
We introduce PPP, a multi-objective reinforcement learning approach that jointly optimizes all three dimensions: Productivity, Proactivity, and Personalization. Experiments show that agents trained with PPP achieve substantial improvements over strong baselines such as GPT-5 (+21.6 on average). This work demonstrates that explicitly optimizing for user-centered interaction is critical for building practical and effective AI agents.
arXiv Detail & Related papers (2025-11-04T02:59:36Z) - PRINCIPLES: Synthetic Strategy Memory for Proactive Dialogue Agents [16.819463022406627]
We propose PRINCIPLES: a synthetic strategy memory for proactive dialogue agents. PRINCIPLES is derived through offline self-play simulations and serves as reusable knowledge that guides strategy planning. We evaluate PRINCIPLES in both emotional support and persuasion domains, demonstrating consistent improvements over strong baselines.
arXiv Detail & Related papers (2025-09-22T07:53:59Z) - Thought-Augmented Planning for LLM-Powered Interactive Recommender Agent [56.61028117645315]
We propose a novel thought-augmented interactive recommender agent system (TAIRA) that addresses complex user intents through distilled thought patterns. Specifically, TAIRA is designed as an LLM-powered multi-agent system featuring a manager agent that orchestrates recommendation tasks by decomposing user needs and planning subtasks. Through comprehensive experiments conducted across multiple datasets, TAIRA exhibits significantly enhanced performance compared to existing methods.
arXiv Detail & Related papers (2025-06-30T03:15:50Z) - Enhancing Personalized Multi-Turn Dialogue with Curiosity Reward [11.495697919066341]
Policy agents must be able to personalize their behavior to suit a user's preferences, personality, and attributes.
Current training methods like Reinforcement Learning from Human Feedback (RLHF) prioritize helpfulness and safety but fall short in fostering truly empathetic, adaptive, and personalized interactions.
We propose to incorporate an intrinsic motivation to improve the conversational agent's model of the user as an additional reward alongside multi-turn RLHF.
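The snippet's idea of an intrinsic motivation alongside multi-turn RLHF can be illustrated as simple reward shaping: the extrinsic reward is augmented by a bonus for reducing the agent's error in predicting the user. The shaping form and coefficient below are assumptions for illustration, not the paper's formulation.

```python
def curiosity_augmented_reward(task_reward, user_pred_error_before,
                               user_pred_error_after, beta=0.5):
    """Hypothetical shaping: extrinsic (RLHF-style) reward plus an intrinsic
    bonus proportional to how much this turn improved the agent's model of
    the user (i.e., reduced its user-prediction error)."""
    intrinsic = max(0.0, user_pred_error_before - user_pred_error_after)
    return task_reward + beta * intrinsic
```

The `max(0.0, ...)` clamp means the agent is rewarded only for learning about the user, never penalized for a temporarily worse prediction; that design choice is also an assumption.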
arXiv Detail & Related papers (2025-04-04T06:35:02Z) - Towards Personalized Conversational Sales Agents : Contextual User Profiling for Strategic Action [12.637812936971049]
We introduce Conversational Sales (CSales), a novel task that unifies preference elicitation, recommendation, and persuasion. For a realistic evaluation of CSales, we present CSUser, an LLM-based user simulator constructed from real-world data. We also propose CSI, a conversational sales agent that proactively infers contextual profiles through dialogue for personalized action planning.
arXiv Detail & Related papers (2025-03-28T15:49:52Z) - Strength Lies in Differences! Improving Strategy Planning for Non-collaborative Dialogues via Diversified User Simulation [69.5677514160986]
We investigate non-collaborative dialogue agents, which are expected to engage in strategic conversations with diverse users.
This poses two main challenges for existing dialogue agents.
We propose Trip to enhance the capability for tailored strategic planning, incorporating a user-aware strategic planning module and a population-based training paradigm.
arXiv Detail & Related papers (2024-03-11T14:38:16Z) - Plug-and-Play Policy Planner for Large Language Model Powered Dialogue Agents [121.46051697742608]
We introduce a new dialogue policy planning paradigm to strategize dialogue problems with a tunable language model plug-in named PPDPP.
Specifically, we develop a novel training framework to facilitate supervised fine-tuning over available human-annotated data.
PPDPP consistently and substantially outperforms existing approaches on three different proactive dialogue applications.
arXiv Detail & Related papers (2023-11-01T03:20:16Z) - "Think Before You Speak": Improving Multi-Action Dialog Policy by Planning Single-Action Dialogs [33.78889030078026]
Multi-action dialog policy (MADP) generates multiple atomic dialog actions per turn.
We propose Planning Enhanced Dialog Policy (PEDP), a novel multi-task learning framework that learns single-action dialog dynamics.
Our fully supervised learning-based method achieves a solid task success rate of 90.6%, a 3% improvement over state-of-the-art methods.
arXiv Detail & Related papers (2022-04-25T07:55:53Z) - Interacting with Non-Cooperative User: A New Paradigm for Proactive Dialogue Policy [83.61404191470126]
We propose a new solution named I-Pro that can learn a Proactive policy in the Interactive setting.
Specifically, we learn the trade-off via a learned goal weight, which consists of four factors.
The experimental results demonstrate I-Pro significantly outperforms baselines in terms of effectiveness and interpretability.
arXiv Detail & Related papers (2022-04-07T14:11:31Z) - User Satisfaction Estimation with Sequential Dialogue Act Modeling in Goal-oriented Conversational Systems [65.88679683468143]
We propose a novel framework, namely USDA, to incorporate the sequential dynamics of dialogue acts for predicting user satisfaction.
USDA incorporates the sequential transitions of both content and act features in the dialogue to predict user satisfaction.
Experimental results on four benchmark goal-oriented dialogue datasets show that the proposed method substantially and consistently outperforms existing methods on USE.
arXiv Detail & Related papers (2022-02-07T02:50:07Z) - What Does The User Want? Information Gain for Hierarchical Dialogue Policy Optimisation [3.1433893853959605]
Dialogue policy optimisation via reinforcement learning (RL) is susceptible to sample inefficiency and instability.
We propose the usage of an intrinsic reward based on information gain to address this issue.
Our algorithm, which we call FeudalGain, achieves state-of-the-art results in most environments of the PyDial framework.
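An information-gain intrinsic reward of the kind this snippet describes can be illustrated as entropy reduction over a discrete belief about what the user wants. This toy formulation is an assumption for illustration, not FeudalGain's actual reward.

```python
import math

def entropy(belief):
    # Shannon entropy (in nats) of a discrete belief over user goals.
    return -sum(p * math.log(p) for p in belief if p > 0)

def information_gain_reward(belief_before, belief_after):
    """Hypothetical intrinsic reward: how much a system turn reduced the
    agent's uncertainty about the user's goal. Positive when the posterior
    belief is sharper than the prior."""
    return entropy(belief_before) - entropy(belief_after)
```

A turn that asks a discriminating question collapses a uniform belief toward one goal and earns a positive reward; a turn that leaves the belief unchanged earns zero, which is exactly the behavior an information-gain signal is meant to encourage.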
arXiv Detail & Related papers (2021-09-15T07:21:26Z) - Optimizing Interactive Systems via Data-Driven Objectives [70.3578528542663]
We propose an approach that infers the objective directly from observed user interactions.
These inferences can be made regardless of prior knowledge and across different types of user behavior.
We introduce the Interactive System Optimizer (ISO), a novel algorithm that uses these inferred objectives for optimization.
arXiv Detail & Related papers (2020-06-19T20:49:14Z) - Learning Goal-oriented Dialogue Policy with Opposite Agent Awareness [116.804536884437]
We propose an opposite behavior aware framework for policy learning in goal-oriented dialogues.
We estimate the opposite agent's policy from its behavior and use this estimation to improve the target agent by regarding it as part of the target policy.
arXiv Detail & Related papers (2020-04-21T03:13:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.