DORA: Toward Policy Optimization for Task-oriented Dialogue System with
Efficient Context
- URL: http://arxiv.org/abs/2107.03286v1
- Date: Wed, 7 Jul 2021 15:24:27 GMT
- Title: DORA: Toward Policy Optimization for Task-oriented Dialogue System with
Efficient Context
- Authors: Hyunmin Jeon, Gary Geunbae Lee
- Abstract summary: We propose a multi-domain task-oriented dialogue system, called Dialogue System with Optimizing a Recurrent Action Policy using Efficient Context (DORA).
DORA is clearly optimized during both SL and RL steps by using an explicit system action policy that considers an efficient context instead of the entire dialogue history.
DORA improved the success rate by 6.6 points on MultiWOZ 2.0 and by 10.9 points on MultiWOZ 2.1.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, reinforcement learning (RL) has been applied to task-oriented
dialogue systems by using latent actions to solve shortcomings of supervised
learning (SL). In this paper, we propose a multi-domain task-oriented dialogue
system, called Dialogue System with Optimizing a Recurrent Action Policy using
Efficient Context (DORA), that uses SL, with subsequently applied RL to
optimize dialogue systems using a recurrent dialogue policy. This dialogue
policy recurrently generates explicit system actions as both a word-level and a
high-level policy. As a result, DORA is clearly optimized during both SL and RL
steps by using an explicit system action policy that considers an efficient
context instead of the entire dialogue history. The system actions are both
interpretable and controllable, whereas the latent actions are not. DORA
improved the success rate by 6.6 points on MultiWOZ 2.0 and by 10.9 points on
MultiWOZ 2.1.
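As a rough illustration of the SL-then-RL recipe the abstract describes (not DORA's actual architecture), the sketch below pretrains a tabular softmax action policy by supervised imitation, then fine-tunes it with REINFORCE on a task-success reward. The compact state ids stand in for the "efficient context" (e.g. belief state plus previous action) that replaces the full dialogue history; all states, actions, and data here are hypothetical.

```python
import math
import random

random.seed(0)

# The "efficient context" is a compact state id, not the full history.
STATES = ["ask_area", "ask_price", "inform_ready"]
ACTIONS = ["request_area", "request_price", "inform_hotel"]
# Annotated (state, correct action) pairs standing in for SL supervision.
SL_DATA = [("ask_area", "request_area"),
           ("ask_price", "request_price"),
           ("inform_ready", "inform_hotel")]

# Tabular softmax policy: logits[state][action]
logits = {s: {a: 0.0 for a in ACTIONS} for s in STATES}

def probs(state):
    z = {a: math.exp(v) for a, v in logits[state].items()}
    total = sum(z.values())
    return {a: v / total for a, v in z.items()}

def sl_step(lr=0.5):
    # Supervised step: gradient of the log-likelihood of the annotated action.
    for state, gold in SL_DATA:
        p = probs(state)
        for a in ACTIONS:
            grad = (1.0 if a == gold else 0.0) - p[a]
            logits[state][a] += lr * grad

def rl_step(lr=0.5):
    # REINFORCE step: sample an action per state; reward 1 if it achieves
    # the task goal (here identical to the SL labels for simplicity).
    for state, gold in SL_DATA:
        p = probs(state)
        action = random.choices(ACTIONS, weights=[p[a] for a in ACTIONS])[0]
        reward = 1.0 if action == gold else 0.0
        baseline = 1.0 / len(ACTIONS)  # constant baseline reduces variance
        for a in ACTIONS:
            grad = (1.0 if a == action else 0.0) - p[a]
            logits[state][a] += lr * (reward - baseline) * grad

for _ in range(50):
    sl_step()   # SL phase: imitate annotated system actions
for _ in range(200):
    rl_step()   # RL phase: optimize task success directly

greedy = {s: max(probs(s), key=probs(s).get) for s in STATES}
print(greedy)
```

Because the policy outputs explicit, named actions, the same objective applies in both phases; the RL phase simply swaps the supervised label signal for a reward signal.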
Related papers
- Planning Like Human: A Dual-process Framework for Dialogue Planning [31.995557540062553]
We propose the Dual-Process Dialogue Planning (DPDP) framework to enhance dialogue planning in Large Language Models (LLMs).
Inspired by the dual-process theory in psychology, the framework embodies two modes of thinking: intuitive (fast) and analytical (slow).
Our empirical evaluations affirm DPDP's superiority in achieving both high-quality dialogues and operational efficiency, outpacing existing methods.
arXiv Detail & Related papers (2024-06-08T06:52:47Z)
- Plug-and-Play Policy Planner for Large Language Model Powered Dialogue Agents [121.46051697742608]
We introduce a new dialogue policy planning paradigm that strategizes dialogue problems with a tunable language model plug-in named PPDPP.
Specifically, we develop a novel training framework to facilitate supervised fine-tuning over available human-annotated data.
PPDPP consistently and substantially outperforms existing approaches on three different proactive dialogue applications.
arXiv Detail & Related papers (2023-11-01T03:20:16Z)
- Self-Explanation Prompting Improves Dialogue Understanding in Large Language Models [52.24756457516834]
We propose a novel "Self-Explanation" prompting strategy to enhance the comprehension abilities of Large Language Models (LLMs).
This task-agnostic approach requires the model to analyze each dialogue utterance before task execution, thereby improving performance across various dialogue-centric tasks.
Experimental results from six benchmark datasets confirm that our method consistently outperforms other zero-shot prompts and matches or exceeds the efficacy of few-shot prompts.
arXiv Detail & Related papers (2023-09-22T15:41:34Z)
- Enhancing Large Language Model Induced Task-Oriented Dialogue Systems Through Look-Forward Motivated Goals [76.69419538047813]
The ProToD approach anticipates future dialogue actions and incorporates a goal-oriented reward signal to enhance ToD systems.
We present a novel evaluation method that assesses ToD systems based on goal-driven dialogue simulations.
Empirical experiments conducted on the MultiWOZ 2.1 dataset demonstrate that our model can achieve superior performance using only 10% of the data.
arXiv Detail & Related papers (2023-09-16T10:56:00Z)
- JoTR: A Joint Transformer and Reinforcement Learning Framework for Dialog Policy Learning [53.83063435640911]
Dialogue policy learning (DPL) is a crucial component of dialogue modelling.
We introduce a novel framework, JoTR, to generate flexible dialogue actions.
Unlike traditional methods, JoTR formulates a word-level policy that allows for a more dynamic and adaptable dialogue action generation.
arXiv Detail & Related papers (2023-09-01T03:19:53Z)
- Why Guided Dialog Policy Learning performs well? Understanding the role of adversarial learning and its alternative [0.44267358790081573]
In recent years, reinforcement learning has emerged as a promising option for dialog policy learning (DPL).
One way to estimate rewards from collected data is to train the reward estimator and the dialog policy simultaneously using adversarial learning (AL).
This paper identifies the role of AL in DPL through detailed analyses of the objective functions of dialog policy and reward estimator.
We propose a method that eliminates AL from reward estimation and DPL while retaining its advantages.
arXiv Detail & Related papers (2023-07-13T12:29:29Z)
- Taming Continuous Posteriors for Latent Variational Dialogue Policies [1.0312968200748118]
We revisit Gaussian variational posteriors for latent-action RL and show that they can yield even better performance than categoricals.
We achieve this by simplifying the training procedure and by proposing ways to regularize the latent dialogue policy.
arXiv Detail & Related papers (2022-05-16T12:50:32Z)
- "Think Before You Speak": Improving Multi-Action Dialog Policy by Planning Single-Action Dialogs [33.78889030078026]
Multi-action dialog policy (MADP) generates multiple atomic dialog actions per turn.
We propose Planning Enhanced Dialog Policy (PEDP), a novel multi-task learning framework that learns single-action dialog dynamics.
Our fully supervised learning-based method achieves a solid task success rate of 90.6%, a 3% improvement over state-of-the-art methods.
arXiv Detail & Related papers (2022-04-25T07:55:53Z)
- Distributed Structured Actor-Critic Reinforcement Learning for Universal Dialogue Management [29.57382819573169]
We focus on devising a policy that chooses which dialogue action to take in response to the user.
The sequential system decision-making process can be abstracted into a partially observable Markov decision process (POMDP).
In the past few years, many deep reinforcement learning (DRL) algorithms have been proposed that use neural networks (NNs) as function approximators.
arXiv Detail & Related papers (2020-09-22T05:39:31Z)
- Modelling Hierarchical Structure between Dialogue Policy and Natural Language Generator with Option Framework for Task-oriented Dialogue System [49.39150449455407]
HDNO is an option framework that designs latent dialogue acts to avoid hand-crafting specific dialogue act representations.
We test HDNO on MultiWOZ 2.0 and MultiWOZ 2.1, datasets of multi-domain dialogues, in comparison with a word-level E2E model trained with RL, LaRL, and HDSA.
arXiv Detail & Related papers (2020-06-11T20:55:28Z)
- Guided Dialog Policy Learning without Adversarial Learning in the Loop [103.20723982440788]
A number of adversarial learning methods have been proposed to learn the reward function together with the dialogue policy.
We propose to decompose the adversarial training into two steps.
First, we train the discriminator with an auxiliary dialogue generator and then incorporate a derived reward model into a common RL method to guide the dialogue policy learning.
arXiv Detail & Related papers (2020-04-07T11:03:17Z)
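The last entry's two-step decomposition (train a discriminator offline against an auxiliary generator, then reuse it as a frozen reward model inside ordinary RL) can be sketched on a toy problem. Everything here is hypothetical: states, actions, and features are illustrative stand-ins, the "auxiliary generator" samples actions uniformly, and the discriminator is a two-weight logistic model rather than a neural network.

```python
import math
import random

random.seed(1)

# Toy (state, action) space; "expert" pairs have matching indices.
STATES, ACTIONS = [0, 1, 2], [0, 1, 2]

def feat(s, a):
    # Two features: does the action match the state, plus a bias term.
    return [1.0 if a == s else 0.0, 1.0]

# Step 1: train a logistic discriminator D(s, a) offline, with a uniform
# auxiliary generator supplying the "fake" pairs.
w = [0.0, 0.0]

def D(s, a):
    x = feat(s, a)
    return 1.0 / (1.0 + math.exp(-(w[0] * x[0] + w[1] * x[1])))

for _ in range(300):
    s = random.choice(STATES)
    # one expert pair (label 1) and one generator pair (label 0) per step
    for a, label in [(s, 1.0), (random.choice(ACTIONS), 0.0)]:
        x, p = feat(s, a), D(s, a)
        for i in range(2):
            w[i] += 0.1 * (label - p) * x[i]   # logistic-regression update

# Step 2: freeze D and use it as the derived reward inside plain REINFORCE,
# so no adversarial loop runs during policy learning.
logits = {s: [0.0] * len(ACTIONS) for s in STATES}

def pi(s):
    z = [math.exp(v) for v in logits[s]]
    t = sum(z)
    return [v / t for v in z]

for _ in range(600):
    s = random.choice(STATES)
    p = pi(s)
    a = random.choices(ACTIONS, weights=p)[0]
    r = D(s, a)                        # frozen reward model, no discriminator update
    for i in ACTIONS:
        g = (1.0 if i == a else 0.0) - p[i]
        logits[s][i] += 0.2 * (r - 0.5) * g

greedy = {s: max(ACTIONS, key=lambda a: pi(s)[a]) for s in STATES}
print(greedy)
```

The design point is that once the discriminator is trained, policy learning becomes a standard RL problem with a fixed reward function, avoiding the instability of updating the reward estimator and policy simultaneously.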
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.