Simulation-Free Hierarchical Latent Policy Planning for Proactive Dialogues
- URL: http://arxiv.org/abs/2412.14584v1
- Date: Thu, 19 Dec 2024 07:06:01 GMT
- Title: Simulation-Free Hierarchical Latent Policy Planning for Proactive Dialogues
- Authors: Tao He, Lizi Liao, Yixin Cao, Yuanxing Liu, Yiheng Sun, Zerui Chen, Ming Liu, Bing Qin
- Abstract summary: We introduce a novel dialogue policy planning framework, LDPP.
It fully automates the process from mining policies in dialogue records to learning policy planning.
Our experiments demonstrate that LDPP outperforms existing methods on two proactive scenarios.
- Score: 31.92843134331582
- Abstract: Recent advancements in proactive dialogues have garnered significant attention, particularly for more complex objectives (e.g., emotional support and persuasion). Unlike traditional task-oriented dialogues, proactive dialogues demand advanced policy planning and adaptability, requiring rich scenarios and comprehensive policy repositories to develop such systems. However, existing approaches tend to rely on Large Language Models (LLMs) for user simulation and online learning, leading to biases that diverge from realistic scenarios and result in suboptimal efficiency. Moreover, these methods depend on manually defined, context-independent, coarse-grained policies, which not only incur high expert costs but also raise concerns regarding their completeness. In our work, we highlight the potential for automatically discovering policies directly from raw, real-world dialogue records. To this end, we introduce a novel dialogue policy planning framework, LDPP. It fully automates the process from mining policies in dialogue records to learning policy planning. Specifically, we employ a variant of the Variational Autoencoder to discover fine-grained policies represented as latent vectors. After automatically annotating the data with these latent policy labels, we propose an Offline Hierarchical Reinforcement Learning (RL) algorithm in the latent space to develop effective policy planning capabilities. Our experiments demonstrate that LDPP outperforms existing methods on two proactive scenarios, even surpassing ChatGPT with only a 1.8-billion-parameter LLM.
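As a rough illustration of the two stages described in the abstract, the sketch below mines latent policy vectors with a small VAE over utterance embeddings and lets a separate planner network map a dialogue state to a latent policy. All module names, dimensions, and the PyTorch formulation are assumptions for exposition, not the authors' implementation.

```python
# Illustrative sketch only: (1) a VAE-style model discovers latent policy
# vectors from utterance embeddings, (2) a planner maps dialogue states to
# latent policies. Dimensions, modules, and training details are assumed.
import torch
import torch.nn as nn

EMB_DIM, LATENT_DIM = 256, 16  # assumed sizes

class PolicyVAE(nn.Module):
    """Encodes an utterance embedding into a latent policy vector."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Linear(EMB_DIM, 2 * LATENT_DIM)  # mean and log-variance
        self.dec = nn.Linear(LATENT_DIM, EMB_DIM)      # reconstructs the embedding

    def forward(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()
        return self.dec(z), z, kl

class LatentPlanner(nn.Module):
    """High-level policy: maps a dialogue-state embedding to a latent policy."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(EMB_DIM, 128), nn.ReLU(),
                                 nn.Linear(128, LATENT_DIM))

    def forward(self, state):
        return self.net(state)

# Stage 1: fit PolicyVAE on utterance embeddings and label each turn with z.
# Stage 2: train LatentPlanner offline (e.g. with a Q-style critic) to choose
# the latent policy that a low-level response generator conditions on.
vae, planner = PolicyVAE(), LatentPlanner()
_, z, _ = vae(torch.randn(1, EMB_DIM))          # mined latent policy label
state = torch.randn(1, EMB_DIM)                 # stand-in dialogue state
print(planner(state).shape, z.shape)            # both are 16-d latent vectors
```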
Related papers
- Controllable Conversations: Planning-Based Dialogue Agent with Large Language Models [52.7201882529976]
Planning-based Conversational Agents (PCA) is a dialogue framework aimed at enhancing the controllability of LLM-driven agents.
We propose a dataset comprising SOP-annotated multi-scenario dialogues, generated using a semi-automated role-playing system with GPT-4o.
We also propose a novel method that integrates Chain of Thought reasoning with supervised fine-tuning for SOP prediction and utilizes Monte Carlo Tree Search for optimal action planning during dialogues.
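The summary mentions Monte Carlo Tree Search for action planning; the snippet below only illustrates the UCT selection rule such a search typically uses to balance exploiting promising dialogue actions against exploring untried ones. The candidate actions and statistics are invented, and the SOP predictor and full tree search are omitted.

```python
# Only the UCT selection rule, with made-up statistics; tree expansion,
# rollouts, and the paper's SOP-guided search are not shown.
import math

def uct_score(value_sum, visits, parent_visits, c=1.4):
    """Exploit the mean value of an action, but boost rarely tried ones."""
    if visits == 0:
        return float("inf")  # unvisited actions are tried first
    return value_sum / visits + c * math.sqrt(math.log(parent_visits) / visits)

# Hypothetical statistics (total value, visit count) for candidate actions.
stats = {"ask_clarifying_question": (3.2, 5),
         "provide_solution": (2.1, 3),
         "express_empathy": (0.0, 0)}
parent_visits = sum(v for _, v in stats.values())
best = max(stats, key=lambda a: uct_score(*stats[a], parent_visits))
print(best)  # the unvisited action wins until every option has been explored
```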
arXiv Detail & Related papers (2024-07-04T12:23:02Z)
- Unsupervised Extraction of Dialogue Policies from Conversations [3.102576158218633]
We show how Large Language Models can be instrumental in extracting dialogue policies from datasets.
We then propose a novel method for generating dialogue policies utilizing a controllable and interpretable graph-based methodology.
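As a generic illustration of what a graph-based dialogue policy can look like, the toy snippet below counts transitions between intent-labelled turns to form a dialogue-flow graph; the paper's LLM-driven extraction and controllable graph construction are far richer than this.

```python
# Toy dialogue-flow graph: label each turn with an intent, then count
# transitions between consecutive labels. Labels and dialogues are invented.
from collections import Counter

labelled_dialogues = [
    ["greet", "ask_issue", "propose_fix", "confirm"],
    ["greet", "ask_issue", "escalate"],
]
edges = Counter()
for turns in labelled_dialogues:
    for src, dst in zip(turns, turns[1:]):
        edges[(src, dst)] += 1
print(edges.most_common(3))  # e.g. [(('greet', 'ask_issue'), 2), ...]
```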
arXiv Detail & Related papers (2024-06-21T14:57:25Z)
- Planning Like Human: A Dual-process Framework for Dialogue Planning [31.995557540062553]
We propose the Dual-Process Dialogue Planning (DPDP) framework to enhance dialogue planning in Large Language Models (LLMs).
Inspired by dual-process theory in psychology, the framework embodies two modes of thinking: intuitive (fast) and analytical (slow).
Our empirical evaluations affirm DPDP's superiority in achieving both high-quality dialogues and operational efficiency, outpacing existing methods.
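A minimal way to picture the fast/slow split, assuming a confidence-gated router (an assumption for illustration, not necessarily how DPDP switches modes): a cheap intuitive policy answers by default, and a deliberative planner is invoked only when the fast policy is unsure.

```python
# Confidence-gated routing between a fast and a slow policy (an assumed
# combination scheme, not DPDP's actual mechanism).
import random

def fast_policy(state):
    """System-1 stand-in: an instant guess plus a confidence score."""
    return "acknowledge_feelings", random.uniform(0.0, 1.0)

def slow_planner(state):
    """System-2 stand-in: deliberate (e.g. search or simulate) before acting."""
    return "probe_root_cause"

def plan_turn(state, threshold=0.7):
    action, confidence = fast_policy(state)
    return action if confidence >= threshold else slow_planner(state)

print(plan_turn({"last_user_utterance": "I feel stuck at work."}))
```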
arXiv Detail & Related papers (2024-06-08T06:52:47Z) - Plug-and-Play Policy Planner for Large Language Model Powered Dialogue
Agents [121.46051697742608]
We introduce a new dialogue policy planning paradigm to strategize dialogue problems with a tunable language model plug-in named PPDPP.
Specifically, we develop a novel training framework to facilitate supervised fine-tuning over available human-annotated data.
PPDPP consistently and substantially outperforms existing approaches on three different proactive dialogue applications.
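For the supervised fine-tuning stage mentioned above, a plug-in policy can be as simple as a small classifier over dialogue strategies trained on annotated turns; the sketch below shows one such SFT step with invented sizes and labels, not PPDPP's actual plug-in.

```python
# One supervised fine-tuning step for a tiny plug-in policy head that predicts
# a dialogue strategy from a state embedding; sizes and labels are invented.
import torch
import torch.nn as nn

NUM_STRATEGIES, STATE_DIM = 8, 256
plugin = nn.Linear(STATE_DIM, NUM_STRATEGIES)        # the tunable plug-in head
optimizer = torch.optim.AdamW(plugin.parameters(), lr=1e-4)

states = torch.randn(32, STATE_DIM)                  # stand-in annotated turns
labels = torch.randint(0, NUM_STRATEGIES, (32,))     # human strategy labels
loss = nn.CrossEntropyLoss()(plugin(states), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(float(loss))
```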
arXiv Detail & Related papers (2023-11-01T03:20:16Z) - JoTR: A Joint Transformer and Reinforcement Learning Framework for
Dialog Policy Learning [53.83063435640911]
Dialogue policy learning (DPL) is a crucial component of dialogue modelling.
We introduce a novel framework, JoTR, to generate flexible dialogue actions.
Unlike traditional methods, JoTR formulates a word-level policy that allows for a more dynamic and adaptable dialogue action generation.
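To make the word-level idea concrete, the toy decoder loop below emits a dialogue action token by token instead of picking one act from a fixed inventory; the vocabulary and logits are placeholders rather than JoTR's model.

```python
# A fixed-inventory policy picks one of N acts; a word-level policy emits the
# act token by token. The vocabulary and "decoder" outputs below are toys.
import torch

vocab = ["<eos>", "request", "inform", "price", "area", "confirm"]
logits = torch.randn(4, len(vocab))       # pretend per-step decoder outputs

tokens = []
for step_logits in logits:
    token = vocab[int(step_logits.argmax())]
    if token == "<eos>":
        break
    tokens.append(token)
print(" ".join(tokens) or "<empty action>")   # e.g. "inform price area"
```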
arXiv Detail & Related papers (2023-09-01T03:19:53Z) - Reparameterized Policy Learning for Multimodal Trajectory Optimization [61.13228961771765]
We investigate the challenge of parametrizing policies for reinforcement learning in high-dimensional continuous action spaces.
We propose a principled framework that models the continuous RL policy as a generative model of optimal trajectories.
We present a practical model-based RL method, which leverages the multimodal policy parameterization and learned world model.
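A latent-variable policy of the kind gestured at above can be sketched as sampling a latent mode and decoding it together with the state, so that one network covers several behaviour modes; the dimensions and decoder below are assumptions, not the paper's parameterization.

```python
# A latent-variable policy: sample a latent mode z, decode (state, z) into an
# action. Networks and sizes are assumed for illustration.
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, Z_DIM = 17, 6, 4
decoder = nn.Sequential(nn.Linear(STATE_DIM + Z_DIM, 64), nn.Tanh(),
                        nn.Linear(64, ACTION_DIM))

def sample_action(state):
    z = torch.randn(Z_DIM)                    # latent mode ~ p(z)
    return decoder(torch.cat([state, z]))     # action ~ p(a | s, z)

state = torch.randn(STATE_DIM)
print(sample_action(state))                   # different z => different modes
```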
arXiv Detail & Related papers (2023-07-20T09:05:46Z) - Why Guided Dialog Policy Learning performs well? Understanding the role
of adversarial learning and its alternative [0.44267358790081573]
In recent years, reinforcement learning has emerged as a promising option for dialog policy learning (DPL).
One way to estimate rewards from collected data is to train the reward estimator and dialog policy simultaneously using adversarial learning (AL).
This paper identifies the role of AL in DPL through detailed analyses of the objective functions of dialog policy and reward estimator.
We propose a method that eliminates AL from reward estimation and DPL while retaining its advantages.
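For background on the adversarial setup being analysed, the snippet below shows the usual discriminator-as-reward read-out (an AIRL-style log-ratio); the paper's AL-free alternative is not reproduced here, and the discriminator is an untrained toy stand-in.

```python
# Discriminator-as-reward read-out used in adversarially learned rewards
# (an AIRL-style log-ratio). The discriminator here is an untrained stand-in.
import torch
import torch.nn as nn

STATE_ACTION_DIM = 32
discriminator = nn.Sequential(nn.Linear(STATE_ACTION_DIM, 64), nn.ReLU(),
                              nn.Linear(64, 1))

def reward(state_action):
    """Higher when the discriminator believes the turn looks expert-like."""
    d = torch.sigmoid(discriminator(state_action))
    return torch.log(d + 1e-8) - torch.log(1 - d + 1e-8)

print(float(reward(torch.randn(STATE_ACTION_DIM))))
```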
arXiv Detail & Related papers (2022-05-17T06:58:17Z)
- Planning to Practice: Efficient Online Fine-Tuning by Composing Goals in Latent Space [76.46113138484947]
General-purpose robots require diverse repertoires of behaviors to complete challenging tasks in real-world unstructured environments.
To address this issue, goal-conditioned reinforcement learning aims to acquire policies that can reach goals for a wide range of tasks on command.
We propose Planning to Practice, a method that makes it practical to train goal-conditioned policies for long-horizon tasks.
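One way to picture subgoal composition, under the simplifying assumption of linear interpolation in a latent space (the paper's learned subgoal planner works differently), is to hand a goal-conditioned policy a chain of intermediate latent targets.

```python
# Feed a goal-conditioned policy a chain of intermediate latent targets.
# Linear interpolation stands in for the learned subgoal planner.
import torch
import torch.nn as nn

LATENT_DIM, ACTION_DIM = 8, 4
policy = nn.Linear(2 * LATENT_DIM, ACTION_DIM)   # goal-conditioned policy head

def subgoals(z_now, z_goal, k=3):
    """k intermediate latent targets between the current state and the goal."""
    return [z_now + (z_goal - z_now) * i / (k + 1) for i in range(1, k + 1)]

z_now, z_goal = torch.randn(LATENT_DIM), torch.randn(LATENT_DIM)
for z_sub in subgoals(z_now, z_goal):
    action = policy(torch.cat([z_now, z_sub]))   # act toward the nearer target
print(action.shape)                              # torch.Size([4])
```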
arXiv Detail & Related papers (2022-05-17T06:58:17Z)
- Policy-Driven Neural Response Generation for Knowledge-Grounded Dialogue Systems [18.375851346138155]
Seq2seq neural response generation approaches do not have explicit mechanisms to control the content or style of the generated response.
We propose using a dialogue policy to plan the content and style of target responses in the form of an action plan.
We demonstrate that a basic dialogue policy that operates at the sentence level generates better responses in comparison to turn level generation.
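A sentence-level action plan can be made concrete by serializing it into control tokens that are prepended to the dialogue context before decoding, roughly as below; the tag names and format are invented for illustration.

```python
# Serialize a sentence-level action plan into control tokens prepended to the
# dialogue context before decoding. Tag names and format are invented.
action_plan = [
    {"act": "acknowledge", "knowledge": None},
    {"act": "inform", "knowledge": "opening_hours"},
]

def serialize(plan):
    tags = []
    for step in plan:
        tag = step["act"] + (f":{step['knowledge']}" if step["knowledge"] else "")
        tags.append(f"<{tag}>")
    return " ".join(tags)

context = "User: When does the museum open?"
model_input = serialize(action_plan) + " | " + context
print(model_input)  # "<acknowledge> <inform:opening_hours> | User: When does ..."
```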
arXiv Detail & Related papers (2020-05-26T06:09:57Z)
- Guided Dialog Policy Learning without Adversarial Learning in the Loop [103.20723982440788]
A number of adversarial learning methods have been proposed to learn the reward function together with the dialogue policy.
We propose to decompose the adversarial training into two steps.
First, we train the discriminator with an auxiliary dialogue generator and then incorporate a derived reward model into a common RL method to guide the dialogue policy learning.
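The two-step recipe above can be caricatured as follows: fit the discriminator offline against an auxiliary generator, then freeze it and reuse it as a reward model inside an ordinary RL update; every model here is a toy stand-in rather than the paper's implementation.

```python
# Step 1: fit a discriminator on expert turns vs. auxiliary-generator turns.
# Step 2: freeze it and reuse it as a reward model in ordinary RL updates,
# so policy learning no longer alternates with adversarial updates.
import torch
import torch.nn as nn

DIM = 16
disc = nn.Sequential(nn.Linear(DIM, 32), nn.ReLU(), nn.Linear(32, 1))

expert = torch.randn(64, DIM)            # stand-in expert dialogue features
generated = torch.randn(64, DIM) + 1.0   # stand-in auxiliary-generator features
x = torch.cat([expert, generated])
y = torch.cat([torch.ones(64, 1), torch.zeros(64, 1)])
nn.BCEWithLogitsLoss()(disc(x), y).backward()   # step 1 (one toy update)

with torch.no_grad():                           # step 2: frozen reward model
    reward = torch.sigmoid(disc(torch.randn(1, DIM)))
print(float(reward))
```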
arXiv Detail & Related papers (2020-04-07T11:03:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.