Integrating Pretrained Language Model for Dialogue Policy Learning
- URL: http://arxiv.org/abs/2111.01398v1
- Date: Tue, 2 Nov 2021 07:16:03 GMT
- Title: Integrating Pretrained Language Model for Dialogue Policy Learning
- Authors: Hongru Wang, Huimin Wang, Zezhong Wang, Kam-Fai Wong
- Abstract summary: Reinforcement Learning (RL) has shown its potential for training a dialogue policy agent towards maximizing the accumulated rewards given by users.
We decompose the adversarial training into two steps: 1) we integrate a pre-trained language model as a discriminator to judge whether the current system action is good enough for the last user action (i.e., next action prediction); 2) the discriminator gives an extra local dense reward to guide the agent's exploration.
The experimental results demonstrate that our method significantly improves the complete rate (~4.4%) and success rate (~8.0%) of the dialogue system.
- Score: 23.453017883791237
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Reinforcement Learning (RL) has shown its potential for training a
dialogue policy agent towards maximizing the accumulated rewards given by
users. However, the reward can be very sparse, as it is usually only provided
at the end of a dialog session, which imposes prohibitive interaction
requirements before an acceptable dialog agent can be obtained. Unlike many
efforts dedicated to alternately optimizing the policy and recovering the
reward, which easily get stuck in local optima and suffer from model collapse,
we decompose the adversarial training into two steps: 1) we integrate a
pre-trained language model as a discriminator to judge whether the current
system action is good enough for the last user action (i.e., next action
prediction); 2) the discriminator gives an extra local dense reward to guide
the agent's exploration. The experimental results demonstrate that our method
significantly improves the complete rate (~4.4%) and success rate (~8.0%) of
the dialogue system.
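To make step 2 concrete, here is a minimal sketch (not the authors' released code) of how a pre-trained LM discriminator's next-action-prediction score could be folded into the RL reward. It assumes a HuggingFace-style sequence classifier already fine-tuned to judge (user action, system action) pairs; the checkpoint name, the dense_reward/shaped_reward helpers, and the alpha weight are all illustrative.

    # Sketch: a frozen LM discriminator supplies a local dense reward on top
    # of the sparse session-level reward. All names here are illustrative.
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    # Assumed: a classifier fine-tuned for next action prediction
    # (label 1 = "good system action for the last user action").
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    discriminator = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=2)
    discriminator.eval()

    def dense_reward(user_action: str, system_action: str) -> float:
        """Discriminator's probability that system_action fits user_action."""
        inputs = tokenizer(user_action, system_action,
                           return_tensors="pt", truncation=True)
        with torch.no_grad():
            logits = discriminator(**inputs).logits
        return torch.softmax(logits, dim=-1)[0, 1].item()

    def shaped_reward(env_reward: float, user_action: str,
                      system_action: str, alpha: float = 0.5) -> float:
        # Sparse task reward (often zero until the session ends) plus the
        # local dense signal; alpha is an assumed weighting coefficient.
        return env_reward + alpha * dense_reward(user_action, system_action)

Because the discriminator is trained first and then frozen, policy optimization reduces to ordinary RL on a shaped reward rather than alternating adversarial updates, which is what the two-step decomposition buys.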
Related papers
- Zero-Shot Goal-Directed Dialogue via RL on Imagined Conversations [70.7884839812069]
Large language models (LLMs) have emerged as powerful and general solutions to many natural language tasks.
However, many of the most important applications of language generation are interactive, where an agent has to talk to a person to reach a desired outcome.
In this work, we explore a new method for adapting LLMs with RL for such goal-directed dialogue.
arXiv Detail & Related papers (2023-11-09T18:45:16Z)
- Enhancing Large Language Model Induced Task-Oriented Dialogue Systems Through Look-Forward Motivated Goals [76.69419538047813]
The ProToD approach anticipates future dialogue actions and incorporates a goal-oriented reward signal to enhance ToD systems.
We present a novel evaluation method that assesses ToD systems based on goal-driven dialogue simulations.
Empirical experiments conducted on the MultiWoZ 2.1 dataset demonstrate that our model can achieve superior performance using only 10% of the data.
arXiv Detail & Related papers (2023-09-16T10:56:00Z)
- JoTR: A Joint Transformer and Reinforcement Learning Framework for Dialog Policy Learning [53.83063435640911]
Dialogue policy learning (DPL) is a crucial component of dialogue modelling.
We introduce a novel framework, JoTR, to generate flexible dialogue actions.
Unlike traditional methods, JoTR formulates a word-level policy that allows for more dynamic and adaptable dialogue action generation.
arXiv Detail & Related papers (2023-09-01T03:19:53Z)
- What Does The User Want? Information Gain for Hierarchical Dialogue Policy Optimisation [3.1433893853959605]
Dialogue policy optimisation via reinforcement learning (RL) is susceptible to sample inefficiency and instability.
We propose using an intrinsic reward based on information gain to address this issue.
Our algorithm, which we call FeudalGain, achieves state-of-the-art results in most environments of the PyDial framework. (A minimal sketch of an information-gain reward follows this entry.)
arXiv Detail & Related papers (2021-09-15T07:21:26Z)
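As a rough illustration of the entry above (an assumption-laden sketch, not FeudalGain's actual formulation), an information-gain reward can be read as the reduction in entropy of the agent's belief over the user goal caused by a turn:

    import math

    def entropy(belief: dict) -> float:
        """Shannon entropy of a belief distribution over user-goal values."""
        return -sum(p * math.log(p) for p in belief.values() if p > 0)

    def information_gain_reward(belief_before: dict, belief_after: dict) -> float:
        # Intrinsic reward: how much the turn reduced uncertainty.
        return entropy(belief_before) - entropy(belief_after)

    # A clarifying question that sharpens the belief earns a positive reward.
    before = {"cheap": 0.4, "moderate": 0.3, "expensive": 0.3}
    after = {"cheap": 0.9, "moderate": 0.05, "expensive": 0.05}
    print(information_gain_reward(before, after))  # > 0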
- WeaSuL: Weakly Supervised Dialogue Policy Learning: Reward Estimation for Multi-turn Dialogue [17.663449579168297]
We simulate a dialogue between an agent and a user (modelled similarly to the agent, with a supervised learning objective) so that the two interact with each other.
The agent uses dynamic blocking to generate ranked diverse responses and exploration-exploitation to select among the Top-K responses (sketched after this entry).
Empirical studies on two benchmarks indicate that our model significantly improves response quality and leads to successful conversations.
arXiv Detail & Related papers (2021-08-01T08:00:45Z)
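One plausible reading of the Top-K selection step in the entry above (a sketch; the paper's exact rule may differ) is an epsilon-greedy choice over the ranked, diversity-filtered candidates:

    import random

    def select_response(ranked_responses: list, k: int = 5,
                        epsilon: float = 0.1) -> str:
        """Exploration-exploitation over the Top-K ranked candidates."""
        top_k = ranked_responses[:k]
        if len(top_k) > 1 and random.random() < epsilon:
            return random.choice(top_k[1:])  # explore a lower-ranked response
        return top_k[0]                      # exploit the best-ranked response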
- Imperfect also Deserves Reward: Multi-Level and Sequential Reward Modeling for Better Dialog Management [17.168214640974337]
For task-oriented dialog systems, training a Reinforcement Learning based Dialog Management module suffers from low sample efficiency and slow convergence speed due to the sparse rewards in RL.
We propose a multi-level reward modeling approach that factorizes a reward into a three-level hierarchy: domain, act, and slot. (A minimal illustration of such a factorized reward follows this entry.)
arXiv Detail & Related papers (2021-04-10T12:20:23Z)
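To illustrate the domain/act/slot factorization in the entry above (the paper learns its reward model; the hand-set weights below are purely illustrative), an imperfect action can still collect partial credit at each level:

    def multilevel_reward(domain_correct: bool, act_correct: bool,
                          slots_filled: int, slots_total: int,
                          w_domain: float = 1.0, w_act: float = 1.0,
                          w_slot: float = 1.0) -> float:
        """Factorized turn reward: partial credit at the domain, act,
        and slot levels, so an imperfect action is not scored as zero."""
        r_domain = w_domain if domain_correct else 0.0
        r_act = w_act if act_correct else 0.0
        slot_frac = slots_filled / slots_total if slots_total else 0.0
        return r_domain + r_act + w_slot * slot_frac

    # Right domain and act, but only 2 of 3 requested slots filled.
    print(multilevel_reward(True, True, 2, 3))  # ~2.67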
- Rethinking Supervised Learning and Reinforcement Learning in Task-Oriented Dialogue Systems [58.724629408229205]
We demonstrate how traditional supervised learning and a simulator-free adversarial learning method can be used to achieve performance comparable to state-of-the-art RL-based methods.
Our main goal is not to beat reinforcement learning with supervised learning, but to demonstrate the value of rethinking the role of reinforcement learning and supervised learning in optimizing task-oriented dialogue systems.
arXiv Detail & Related papers (2020-09-21T12:04:18Z)
- Modelling Hierarchical Structure between Dialogue Policy and Natural Language Generator with Option Framework for Task-oriented Dialogue System [49.39150449455407]
HDNO is an option framework for designing latent dialogue acts to avoid designing specific dialogue act representations.
We test HDNO on MultiWoz 2.0 and MultiWoz 2.1, two multi-domain dialogue datasets, comparing against a word-level E2E model trained with RL, LaRL, and HDSA.
arXiv Detail & Related papers (2020-06-11T20:55:28Z)
- Semi-Supervised Dialogue Policy Learning via Stochastic Reward Estimation [33.688270031454095]
Reward learning can provide turn-by-turn rewards by learning from the state-action pairs of an optimal policy, but this requires complete state-action annotations of human-to-human dialogues.
We propose a novel reward learning approach for semi-supervised policy learning.
arXiv Detail & Related papers (2020-05-09T06:28:44Z)
- Guided Dialog Policy Learning without Adversarial Learning in the Loop [103.20723982440788]
A number of adversarial learning methods have been proposed to learn the reward function together with the dialogue policy.
We propose to decompose the adversarial training into two steps.
First, we train the discriminator with an auxiliary dialogue generator and then incorporate a derived reward model into a common RL method to guide the dialogue policy learning.
arXiv Detail & Related papers (2020-04-07T11:03:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.