Multi-Agent Task-Oriented Dialog Policy Learning with Role-Aware Reward Decomposition
- URL: http://arxiv.org/abs/2004.03809v2
- Date: Thu, 23 Apr 2020 02:34:16 GMT
- Title: Multi-Agent Task-Oriented Dialog Policy Learning with Role-Aware Reward Decomposition
- Authors: Ryuichi Takanobu, Runze Liang, Minlie Huang
- Abstract summary: We propose Multi-Agent Dialog Policy Learning, which regards both the system and the user as dialog agents.
The two agents interact with each other and are learned jointly.
Results show that our method can successfully build a system policy and a user policy simultaneously.
- Score: 64.06167416127386
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many studies have applied reinforcement learning to train a dialog policy and
have shown great promise in recent years. One common approach is to employ a user
simulator to obtain a large number of simulated user experiences for
reinforcement learning algorithms. However, modeling a realistic user simulator
is challenging. A rule-based simulator requires heavy domain expertise for
complex tasks, while a data-driven simulator requires considerable data, and it is
even unclear how a simulator should be evaluated. To avoid explicitly building a user
simulator beforehand, we propose Multi-Agent Dialog Policy Learning, which
regards both the system and the user as dialog agents. The two agents interact
with each other and are learned jointly. The method uses the
actor-critic framework to facilitate pretraining and improve scalability. We
also propose Hybrid Value Network for the role-aware reward decomposition to
integrate role-specific domain knowledge of each agent in the task-oriented
dialog. Results show that our method can successfully build a system policy and
a user policy simultaneously, and the two agents achieve a high task success
rate through conversational interaction.
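The abstract describes an actor-critic setup in which two policies share one role-aware critic. Below is a minimal PyTorch sketch of that Hybrid Value Network idea: a shared global value plus role-specific heads for the system and user agents. The state split, layer sizes, and the exact way the values are combined are illustrative assumptions, not the authors' reported architecture.

```python
import torch
import torch.nn as nn

class HybridValueNetwork(nn.Module):
    """One critic with role-aware heads: each agent's value mixes a
    shared global estimate with its own role-specific estimate."""
    def __init__(self, sys_dim: int, usr_dim: int, hidden: int = 64):
        super().__init__()
        self.enc_sys = nn.Sequential(nn.Linear(sys_dim, hidden), nn.ReLU())
        self.enc_usr = nn.Sequential(nn.Linear(usr_dim, hidden), nn.ReLU())
        self.v_global = nn.Linear(2 * hidden, 1)  # shared, task-level value
        self.v_system = nn.Linear(hidden, 1)      # system-role value
        self.v_user = nn.Linear(hidden, 1)        # user-role value

    def forward(self, s_sys, s_usr):
        h_sys, h_usr = self.enc_sys(s_sys), self.enc_usr(s_usr)
        v_g = self.v_global(torch.cat([h_sys, h_usr], dim=-1))
        # role-aware decomposition: global value plus role-specific value
        return v_g + self.v_system(h_sys), v_g + self.v_user(h_usr)

hvn = HybridValueNetwork(sys_dim=10, usr_dim=8)
v_sys, v_usr = hvn(torch.randn(1, 10), torch.randn(1, 8))
```

Each agent's advantage would then be computed from its own combined value, so role-specific rewards shape each policy without splitting the shared task signal.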
Related papers
- Sparse Diffusion Policy: A Sparse, Reusable, and Flexible Policy for Robot Learning [61.294110816231886]
We introduce Sparse Diffusion Policy (SDP), a sparse, reusable, and flexible policy.
SDP selectively activates experts and skills, enabling efficient and task-specific learning without retraining the entire model.
Demos and code can be found at https://forrest-110.io/sparse_diffusion_policy/.
arXiv Detail & Related papers (2024-07-01T17:59:56Z)
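To make the "selectively activates experts" idea in the SDP entry concrete, here is a toy top-k routing sketch in PyTorch. The diffusion backbone is omitted, and the names, sizes, and routing rule are illustrative assumptions rather than SDP's actual design.

```python
import torch
import torch.nn as nn

class SparseExperts(nn.Module):
    def __init__(self, dim: int = 32, n_experts: int = 4, k: int = 1):
        super().__init__()
        self.router = nn.Linear(dim, n_experts)  # scores experts per input
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_experts))
        self.k = k

    def forward(self, x):
        weights, idx = self.router(x).softmax(dim=-1).topk(self.k, dim=-1)
        out = torch.zeros_like(x)
        for b in range(x.size(0)):            # run only the selected experts
            for j in range(self.k):
                out[b] += weights[b, j] * self.experts[int(idx[b, j])](x[b])
        return out

y = SparseExperts()(torch.randn(2, 32))
```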
- Reliable LLM-based User Simulator for Task-Oriented Dialogue Systems [2.788542465279969]
This paper introduces DAUS, a Domain-Aware User Simulator.
We fine-tune DAUS on real examples of task-oriented dialogues.
Results on two relevant benchmarks showcase significant improvements in terms of user goal fulfillment.
arXiv Detail & Related papers (2024-02-20T20:57:47Z)
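A hedged sketch of the fine-tuning recipe the DAUS entry summarizes: condition a language model on the user goal and dialog history, and train it to produce the user turn. The base model, data format, and field names are assumptions for illustration, not the paper's exact setup.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")         # base model is an assumption
model = AutoModelForCausalLM.from_pretrained("gpt2")

# one example: condition on the user goal and history, learn the user turn
text = ("goal: book a cheap italian restaurant for 4 people\n"
        "system: which area do you prefer?\n"
        "user: somewhere in the centre, please")
batch = tok(text, return_tensors="pt")
loss = model(**batch, labels=batch["input_ids"]).loss  # standard causal-LM loss
loss.backward()
```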
- Adversarial learning of neural user simulators for dialogue policy optimisation [14.257597015289512]
Reinforcement learning based dialogue policies are typically trained in interaction with a user simulator.
Current data-driven simulators are trained to accurately model the user behaviour in a dialogue corpus.
We propose an alternative method using adversarial learning, with the aim of simulating realistic user behaviour with greater variation.
arXiv Detail & Related papers (2023-06-01T16:17:16Z)
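A minimal sketch of that adversarial setup: a discriminator learns to tell corpus user turns from simulated ones, and the simulator is updated to fool it. The toy networks, encodings, and losses below are illustrative assumptions, not the paper's models.

```python
import torch
import torch.nn as nn

dim = 16
G = nn.Sequential(nn.Linear(dim, dim), nn.Tanh())   # toy user simulator
D = nn.Linear(dim, 1)                                # real-vs-simulated scorer
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

real = torch.randn(8, dim)   # stand-in: encoded user turns from the corpus
ctx = torch.randn(8, dim)    # stand-in: encoded dialogue contexts

# discriminator step: label corpus turns 1, simulated turns 0
fake = G(ctx)
d_loss = bce(D(real), torch.ones(8, 1)) + bce(D(fake.detach()), torch.zeros(8, 1))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# simulator step: generate turns the discriminator judges as real
g_loss = bce(D(G(ctx)), torch.ones(8, 1))
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```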
- In-Context Learning User Simulators for Task-Oriented Dialog Systems [1.7086737326992172]
This paper presents a novel application of large language models to user simulation for task-oriented dialog systems.
The approach prompts these models with user goals and a small number of dialog examples to generate diverse user utterances.
arXiv Detail & Related papers (2023-06-01T15:06:11Z)
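A small sketch of what such prompting can look like: assemble the goal, a few demonstration dialogs, and the running history into one prompt for an LLM. The wording and field layout are assumptions, not the paper's prompt.

```python
def build_user_prompt(goal: str, examples: list, history: list) -> str:
    demos = "\n\n".join(examples)            # few-shot dialog examples
    turns = "\n".join(history)
    return ("You are the user of a booking service. Pursue this goal:\n"
            f"{goal}\n\nExample dialogs:\n{demos}\n\n"
            f"Current dialog:\n{turns}\nuser:")

prompt = build_user_prompt(
    goal="find a cheap hotel in the north with free wifi",
    examples=["system: how can I help?\nuser: I need a train to Cambridge."],
    history=["system: hello, what can I do for you?"],
)
# 'prompt' would be sent to any instruction-following LLM to obtain the
# next user utterance; no fine-tuning is involved.
```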
- "Think Before You Speak": Improving Multi-Action Dialog Policy by Planning Single-Action Dialogs [33.78889030078026]
Multi-action dialog policy (MADP) generates multiple atomic dialog actions per turn.
We propose Planning Enhanced Dialog Policy (PEDP), a novel multi-task learning framework that learns single-action dialog dynamics.
Our fully supervised learning-based method achieves a task success rate of 90.6%, a 3% improvement over state-of-the-art methods.
arXiv Detail & Related papers (2022-04-25T07:55:53Z)
- Metaphorical User Simulators for Evaluating Task-oriented Dialogue Systems [80.77917437785773]
Task-oriented dialogue systems (TDSs) are assessed mainly in an offline setting or through human evaluation.
We propose a metaphorical user simulator for end-to-end TDS evaluation, where a simulator is metaphorical if it simulates a user's analogical thinking in interactions with systems.
We also propose a tester-based evaluation framework to generate variants, i.e., dialogue systems with different capabilities.
arXiv Detail & Related papers (2022-04-02T05:11:03Z)
- Simulated Chats for Building Dialog Systems: Learning to Generate Conversations from Instructions [14.47025580681492]
We present a data creation strategy that uses the pre-trained language model GPT-2 to simulate the interaction between crowd workers by creating a user bot and an agent bot.
We demonstrate that by using the simulated data, we achieve significant improvements in low-resource settings on two publicly available datasets.
arXiv Detail & Related papers (2020-10-20T12:04:19Z)
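A toy sketch of that self-chat strategy using GPT-2, the model the summary names: two bots alternately extend a shared transcript to synthesize training dialogs. The prompt, speaker tags, and decoding settings are illustrative assumptions.

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
user_bot = GPT2LMHeadModel.from_pretrained("gpt2")
agent_bot = GPT2LMHeadModel.from_pretrained("gpt2")

transcript = "user: I want to book a table for two tonight.\nagent:"
for _ in range(2):  # alternate agent and user turns on a shared transcript
    for bot, next_tag in ((agent_bot, "user:"), (user_bot, "agent:")):
        ids = tok(transcript, return_tensors="pt").input_ids
        out = bot.generate(ids, max_new_tokens=20, do_sample=True,
                           pad_token_id=tok.eos_token_id)
        transcript = tok.decode(out[0], skip_special_tokens=True) + "\n" + next_tag
print(transcript)  # one synthesized dialog; repeat to build a corpus
```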
- Rethinking Supervised Learning and Reinforcement Learning in Task-Oriented Dialogue Systems [58.724629408229205]
We demonstrate how traditional supervised learning and a simulator-free adversarial learning method can be used to achieve performance comparable to state-of-the-art RL-based methods.
Our main goal is not to beat reinforcement learning with supervised learning, but to demonstrate the value of rethinking the role of reinforcement learning and supervised learning in optimizing task-oriented dialogue systems.
arXiv Detail & Related papers (2020-09-21T12:04:18Z)
- SOLOIST: Building Task Bots at Scale with Transfer Learning and Machine Teaching [81.45928589522032]
We parameterize modular task-oriented dialog systems using a Transformer-based auto-regressive language model.
We pre-train, on heterogeneous dialog corpora, a task-grounded response generation model.
Experiments show that SOLOIST sets a new state of the art on well-studied task-oriented dialog benchmarks.
arXiv Detail & Related papers (2020-05-11T17:58:34Z)
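A sketch of how a single auto-regressive LM can subsume the modular pipeline the SOLOIST entry describes: each training item flattens dialog history, belief state, database result, and response into one token sequence. The delimiters and slot syntax here are assumptions, not SOLOIST's exact format.

```python
# A causal LM fine-tuned on such strings decodes the belief state, grounds
# on the DB result, and produces the response, left to right.
training_item = (
    "user: find me a cheap hotel in the east "
    "=> belief: hotel { price = cheap ; area = east } "
    "=> db: 3 matches "
    "=> response: I found 3 cheap hotels in the east. Any preference?"
)
```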
- Guided Dialog Policy Learning without Adversarial Learning in the Loop [103.20723982440788]
A number of adversarial learning methods have been proposed to learn the reward function together with the dialogue policy.
We propose to decompose the adversarial training into two steps.
First, we train the discriminator with an auxiliary dialogue generator and then incorporate a derived reward model into a common RL method to guide the dialogue policy learning.
arXiv Detail & Related papers (2020-04-07T11:03:17Z)
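A minimal sketch of that two-step decomposition: the discriminator is trained offline against an auxiliary dialogue generator (step one, omitted here), then reused as a frozen reward model inside an ordinary policy-gradient update, so no adversarial learning runs in the RL loop. The toy networks, encodings, and the REINFORCE update are illustrative assumptions.

```python
import torch
import torch.nn as nn

dim, n_actions = 16, 4
reward_model = nn.Sequential(nn.Linear(dim, 1), nn.Sigmoid())  # step 1 output
policy = nn.Linear(dim, n_actions)                             # dialog policy

# step 2: a plain policy-gradient step using the derived reward,
# with no discriminator update inside the RL loop
state = torch.randn(1, dim)                    # stand-in: encoded dialog state
probs = policy(state).softmax(dim=-1)
action = int(torch.multinomial(probs, 1))
with torch.no_grad():
    reward = reward_model(state).item()        # frozen discriminator as reward
loss = -torch.log(probs[0, action]) * reward   # REINFORCE objective
loss.backward()
```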