What Does The User Want? Information Gain for Hierarchical Dialogue Policy Optimisation
- URL: http://arxiv.org/abs/2109.07129v1
- Date: Wed, 15 Sep 2021 07:21:26 GMT
- Title: What Does The User Want? Information Gain for Hierarchical Dialogue Policy Optimisation
- Authors: Christian Geishauser, Songbo Hu, Hsien-chin Lin, Nurul Lubis, Michael Heck, Shutong Feng, Carel van Niekerk, Milica Gašić
- Abstract summary: Dialogue policy optimisation via reinforcement learning (RL) is susceptible to sample inefficiency and instability.
We propose an intrinsic reward based on information gain to address this issue.
Our algorithm, which we call FeudalGain, achieves state-of-the-art results in most environments of the PyDial framework.
- Score: 3.1433893853959605
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The dialogue management component of a task-oriented dialogue system is
typically optimised via reinforcement learning (RL). Optimisation via RL is
highly susceptible to sample inefficiency and instability. The hierarchical
approach called Feudal Dialogue Management takes a step towards more efficient
learning by decomposing the action space. However, it still suffers from
instability due to the reward only being provided at the end of the dialogue.
We propose the use of an intrinsic reward based on information gain to
address this issue. Our proposed reward favours actions that resolve
uncertainty or query the user whenever necessary. It enables the policy to
learn how to retrieve the users' needs efficiently, which is an integral aspect
of every task-oriented conversation. Our algorithm, which we call FeudalGain,
achieves state-of-the-art results in most environments of the PyDial framework,
outperforming much more complex approaches. We confirm the sample efficiency
and stability of our algorithm through experiments in simulation and a human
trial.
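The abstract does not give the exact form of this intrinsic reward, but the underlying idea, rewarding system actions that reduce uncertainty about what the user wants, can be illustrated as an entropy difference over the belief tracker's distribution for a slot. The Python sketch below is a minimal illustration under that assumption, not the paper's implementation; the function names, the use of Shannon entropy, and the weighting factor beta are illustrative choices.

```python
import numpy as np

def entropy(p, eps=1e-12):
    """Shannon entropy (in nats) of a discrete belief distribution."""
    p = np.asarray(p, dtype=float)
    p = p / p.sum()
    return float(-np.sum(p * np.log(p + eps)))

def information_gain_reward(belief_before, belief_after):
    """Intrinsic reward: reduction in entropy of the belief over a slot's value
    between consecutive turns. Positive when the system action (e.g. a request
    or confirmation) helped resolve uncertainty about the user's goal."""
    return entropy(belief_before) - entropy(belief_after)

# Hypothetical example: asking about an unresolved slot sharpens the belief.
belief_before = [0.25, 0.25, 0.25, 0.25]   # tracker is uncertain over 4 values
belief_after = [0.85, 0.05, 0.05, 0.05]    # the user's reply resolves most of it
r_intrinsic = information_gain_reward(belief_before, belief_after)
print(f"intrinsic reward: {r_intrinsic:.3f}")
# During training this could be combined with the extrinsic task reward,
# e.g. r_total = r_task + beta * r_intrinsic, for some weighting beta.
```

Combining such a term with the task reward gives the agent turn-level feedback instead of a signal only at the end of the dialogue, which is the source of instability the abstract highlights.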
Related papers
- Zero-Shot Goal-Directed Dialogue via RL on Imagined Conversations [70.7884839812069]
Large language models (LLMs) have emerged as powerful and general solutions to many natural language tasks.
However, many of the most important applications of language generation are interactive, where an agent has to talk to a person to reach a desired outcome.
In this work, we explore a new method for adapting LLMs with RL for such goal-directed dialogue.
arXiv Detail & Related papers (2023-11-09T18:45:16Z)
- Interacting with Non-Cooperative User: A New Paradigm for Proactive Dialogue Policy [83.61404191470126]
We propose a new solution named I-Pro that can learn a Proactive policy in the Interactive setting.
Specifically, we learn the trade-off via a learned goal weight, which consists of four factors.
The experimental results demonstrate I-Pro significantly outperforms baselines in terms of effectiveness and interpretability.
arXiv Detail & Related papers (2022-04-07T14:11:31Z)
- Conversational Recommendation: Theoretical Model and Complexity Analysis [6.084774669743511]
We present a theoretical, domain-independent model of conversational recommendation.
We show that finding an efficient conversational strategy is NP-hard.
We also show that catalog characteristics can strongly influence the efficiency of individual conversational strategies.
arXiv Detail & Related papers (2021-11-10T09:05:52Z)
- Causal-aware Safe Policy Improvement for Task-oriented dialogue [45.88777832381149]
We propose a batch RL framework for task-oriented dialogue policy learning: causal-aware safe policy improvement (CASPI).
We demonstrate the effectiveness of this framework on dialogue-context-to-text generation and end-to-end dialogue tasks of the MultiWOZ 2.0 dataset.
arXiv Detail & Related papers (2021-03-10T22:34:28Z)
- Rethinking Supervised Learning and Reinforcement Learning in Task-Oriented Dialogue Systems [58.724629408229205]
We demonstrate how traditional supervised learning and a simulator-free adversarial learning method can be used to achieve performance comparable to state-of-the-art RL-based methods.
Our main goal is not to beat reinforcement learning with supervised learning, but to demonstrate the value of rethinking the role of reinforcement learning and supervised learning in optimizing task-oriented dialogue systems.
arXiv Detail & Related papers (2020-09-21T12:04:18Z)
- Optimizing Interactive Systems via Data-Driven Objectives [70.3578528542663]
We propose an approach that infers the objective directly from observed user interactions.
These inferences can be made regardless of prior knowledge and across different types of user behavior.
We introduce the Interactive System Optimizer (ISO), a novel algorithm that uses these inferred objectives for optimization.
arXiv Detail & Related papers (2020-06-19T20:49:14Z)
- Adaptive Dialog Policy Learning with Hindsight and User Modeling [10.088347529930129]
We develop the LHUA algorithm, which, for the first time, enables dialog agents to adaptively learn with hindsight from both simulated and real users.
Experimental results suggest that, in success rate and policy quality, LHUA outperforms competitive baselines from the literature.
arXiv Detail & Related papers (2020-05-07T07:43:43Z)
- Multi-Agent Task-Oriented Dialog Policy Learning with Role-Aware Reward Decomposition [64.06167416127386]
We propose Multi-Agent Dialog Policy Learning, which regards both the system and the user as the dialog agents.
The two agents interact with each other and are learned jointly.
Results show that our method can successfully build a system policy and a user policy simultaneously.
arXiv Detail & Related papers (2020-04-08T04:51:40Z)
- Guided Dialog Policy Learning without Adversarial Learning in the Loop [103.20723982440788]
A number of adversarial learning methods have been proposed to learn the reward function together with the dialogue policy.
We propose to decompose the adversarial training into two steps.
First, we train the discriminator with an auxiliary dialogue generator and then incorporate a derived reward model into a common RL method to guide the dialogue policy learning.
arXiv Detail & Related papers (2020-04-07T11:03:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.