Learning Dialog Policies from Weak Demonstrations
- URL: http://arxiv.org/abs/2004.11054v2
- Date: Thu, 13 Aug 2020 16:02:03 GMT
- Title: Learning Dialog Policies from Weak Demonstrations
- Authors: Gabriel Gordon-Hall, Philip John Gorinski, Shay B. Cohen
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep reinforcement learning is a promising approach to training a dialog
manager, but current methods struggle with the large state and action spaces of
multi-domain dialog systems. Building upon Deep Q-learning from Demonstrations
(DQfD), an algorithm that scores highly in difficult Atari games, we leverage
dialog data to guide the agent to successfully respond to a user's requests. We
make progressively fewer assumptions about the data needed, using labeled,
reduced-labeled, and even unlabeled data to train expert demonstrators. We
introduce Reinforced Fine-tune Learning, an extension to DQfD, enabling us to
overcome the domain gap between the datasets and the environment. Experiments
in a challenging multi-domain dialog system framework validate our approaches,
which achieve high success rates even when trained on out-of-domain data.
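The core mechanism the paper builds on can be illustrated concisely. DQfD augments the standard Q-learning objective with a large-margin supervised loss that pushes the demonstrated (expert) action's Q-value above all other actions by at least a fixed margin. The following is a minimal NumPy sketch under simplifying assumptions: a single one-step transition, with the paper's n-step return and L2 regularization terms omitted, and illustrative values for `margin` and `lambda_e`.

```python
import numpy as np

def dqfd_margin_loss(q_values, expert_action, margin=0.8):
    """Large-margin supervised loss from DQfD:
    J_E(Q) = max_a [Q(s, a) + l(a_E, a)] - Q(s, a_E),
    where l(a_E, a) = margin for a != a_E and 0 otherwise."""
    penalties = np.full_like(q_values, margin)
    penalties[expert_action] = 0.0
    return np.max(q_values + penalties) - q_values[expert_action]

def td_loss(q_sa, reward, q_next_max, gamma=0.99):
    """Squared one-step temporal-difference error."""
    target = reward + gamma * q_next_max
    return (q_sa - target) ** 2

def dqfd_loss(q_values, action, expert_action, reward, q_next_max,
              lambda_e=1.0, gamma=0.99):
    """Combined objective on a single demonstration transition
    (n-step and L2 terms from the DQfD paper omitted for brevity)."""
    return (td_loss(q_values[action], reward, q_next_max, gamma)
            + lambda_e * dqfd_margin_loss(q_values, expert_action))
```

Note that the margin loss is zero whenever the expert action already dominates every other action by the margin, so it only exerts pressure where the policy disagrees with the demonstrations.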
Related papers
- Improving Conversational Recommendation Systems via Counterfactual Data Simulation
Conversational recommender systems (CRSs) aim to provide recommendation services via natural language conversations.
Existing CRS approaches often suffer from the issue of insufficient training due to the scarcity of training data.
We propose a CounterFactual data simulation approach for CRS, named CFCRS, to alleviate the issue of data scarcity in CRSs.
arXiv Detail & Related papers (2023-06-05T12:48:56Z)
- Pre-training Multi-party Dialogue Models with Latent Discourse Inference
We pre-train a model that understands the discourse structure of multi-party dialogues, namely, to whom each utterance is replying.
To fully utilize the unlabeled data, we propose to treat the discourse structures as latent variables, then jointly infer them and pre-train the discourse-aware model.
arXiv Detail & Related papers (2023-05-24T14:06:27Z)
- Discovering Customer-Service Dialog System with Semi-Supervised Learning and Coarse-to-Fine Intent Detection
Task-oriented dialog aims to assist users in achieving specific goals through multi-turn conversation.
We constructed a weakly supervised dataset based on a teacher/student paradigm.
We also built a modular dialogue system and integrated coarse-to-fine-grained classification for user intent detection.
arXiv Detail & Related papers (2022-12-23T14:36:43Z)
- Reinforcement Learning of Multi-Domain Dialog Policies Via Action Embeddings
Learning task-oriented dialog policies via reinforcement learning typically requires large amounts of interaction with users.
We propose to leverage data from across different dialog domains, thereby reducing the amount of data required from each given domain.
On a set of simulated domains, this approach learns with significantly less user interaction, reducing the number of dialogs required by 35%, and reaches a higher level of proficiency than training a separate policy for each domain.
arXiv Detail & Related papers (2022-07-01T14:49:05Z)
- Structure Extraction in Task-Oriented Dialogues with Slot Clustering
In task-oriented dialogues, dialogue structure has often been represented as a transition graph among dialogue states.
We propose a simple yet effective approach for structure extraction in task-oriented dialogues.
arXiv Detail & Related papers (2022-02-28T20:18:12Z)
- Self-training Improves Pre-training for Few-shot Learning in Task-oriented Dialog Systems
Large-scale pre-trained language models have shown promising results for few-shot learning in task-oriented dialog (ToD).
We propose a self-training approach that iteratively labels the most confident unlabeled data to train a stronger Student model.
We conduct experiments and present analyses on four downstream tasks in ToD, including intent classification, dialog state tracking, dialog act prediction, and response selection.
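The confidence-based self-training loop this entry describes can be sketched generically. The following is a hypothetical illustration, not the paper's implementation: `train_fn` and `predict_proba` are placeholder callables supplied by the caller, and selecting a fixed `top_k` most-confident examples per round is one common variant of the confidence threshold.

```python
import numpy as np

def self_train(train_fn, predict_proba, labeled_x, labeled_y,
               unlabeled_x, rounds=3, top_k=10):
    """Generic self-training loop: each round trains a student on the
    current labeled pool, pseudo-labels the unlabeled examples the model
    is most confident about, and moves them into the pool.
    `train_fn(X, y)` returns a model; `predict_proba(model, X)` returns
    an (n_examples, n_classes) array whose column c is P(class c)."""
    pool_x, pool_y = list(labeled_x), list(labeled_y)
    remaining = list(unlabeled_x)
    model = train_fn(np.array(pool_x), np.array(pool_y))
    for _ in range(rounds):
        if not remaining:
            break
        probs = predict_proba(model, np.array(remaining))
        confidence = probs.max(axis=1)
        # take the top_k most confident unlabeled examples
        chosen = np.argsort(confidence)[::-1][:top_k]
        for i in chosen:
            pool_x.append(remaining[i])
            pool_y.append(int(probs[i].argmax()))
        remaining = [x for i, x in enumerate(remaining)
                     if i not in set(chosen)]
        # retrain the student on the enlarged pool
        model = train_fn(np.array(pool_x), np.array(pool_y))
    return model, pool_x, pool_y
```

Keeping `top_k` small relative to the unlabeled pool limits the propagation of early pseudo-labeling mistakes across rounds.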
arXiv Detail & Related papers (2021-08-28T07:22:06Z)
- Transferable Dialogue Systems and User Simulators
One of the difficulties in training dialogue systems is the lack of training data.
We explore the possibility of creating dialogue data through the interaction between a dialogue system and a user simulator.
We develop a modelling framework that can incorporate new dialogue scenarios through self-play between the two agents.
arXiv Detail & Related papers (2021-07-25T22:59:09Z)
- Data-Efficient Methods for Dialogue Systems
Conversational User Interfaces (CUIs) have become ubiquitous in everyday life through consumer-focused products like Siri and Alexa.
Deep learning underlies many recent breakthroughs in dialogue systems but requires very large amounts of training data, often annotated by experts.
In this thesis, we introduce a series of methods for training robust dialogue systems from minimal data.
arXiv Detail & Related papers (2020-12-05T02:51:09Z)
- Meta Dialogue Policy Learning
We propose Deep Transferable Q-Network (DTQN) to utilize shareable low-level signals between domains.
We decompose the state and action representation space into feature subspaces corresponding to these low-level components.
In experiments, our model outperforms baseline models in terms of both success rate and dialogue efficiency.
arXiv Detail & Related papers (2020-06-03T23:53:06Z)
- Multi-Agent Task-Oriented Dialog Policy Learning with Role-Aware Reward Decomposition
We propose Multi-Agent Dialog Policy Learning, which regards both the system and the user as the dialog agents.
Two agents interact with each other and are jointly learned simultaneously.
Results show that our method can successfully build a system policy and a user policy simultaneously.
arXiv Detail & Related papers (2020-04-08T04:51:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.