Reinforcement Learning of Multi-Domain Dialog Policies Via Action
Embeddings
- URL: http://arxiv.org/abs/2207.00468v1
- Date: Fri, 1 Jul 2022 14:49:05 GMT
- Authors: Jorge A. Mendez, Alborz Geramifard, Mohammad Ghavamzadeh, and Bing Liu
- Abstract summary: Learning task-oriented dialog policies via reinforcement learning typically requires large amounts of interaction with users.
We propose to leverage data from across different dialog domains, thereby reducing the amount of data required from each given domain.
We show, on a set of simulated domains, that this approach learns with significantly less user interaction, reducing the number of dialogs required to learn by 35%, and reaches a higher level of proficiency than training separate policies for each domain.
- Score: 38.51601073819774
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Learning task-oriented dialog policies via reinforcement learning typically
requires large amounts of interaction with users, which in practice renders
such methods unusable for real-world applications. In order to reduce the data
requirements, we propose to leverage data from across different dialog domains,
thereby reducing the amount of data required from each given domain. In
particular, we propose to learn domain-agnostic action embeddings, which
capture general-purpose structure that informs the system how to act given the
current dialog context, and are then specialized to a specific domain. We show,
on a set of simulated domains, that this approach learns with significantly less
user interaction, reducing the number of dialogs required to learn by 35%, and
reaches a higher level of proficiency than training separate policies for each
domain.
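The abstract does not spell out the architecture, so the following is only a minimal sketch of one way a shared embedding space could feed domain-specific action choices. Everything in it is an assumption made for illustration: the class name `SharedEmbeddingPolicy`, the single linear layer with `tanh`, the per-domain decoder matrices, and the requirement that every domain uses the same state dimensionality. It is not the authors' implementation and it omits training entirely.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array."""
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()


class SharedEmbeddingPolicy:
    """Hypothetical policy: shared state-to-embedding map, per-domain action heads."""

    def __init__(self, state_dim, embed_dim, actions_per_domain, seed=0):
        rng = np.random.default_rng(seed)
        # Domain-agnostic part (assumed): shared by all domains, so data from
        # every domain would update the same parameters during training.
        self.w_shared = rng.normal(scale=0.1, size=(embed_dim, state_dim))
        # Domain-specific part (assumed): one small decoder per domain that
        # specializes the shared embedding to that domain's action set.
        self.domain_heads = {
            name: rng.normal(scale=0.1, size=(n_actions, embed_dim))
            for name, n_actions in actions_per_domain.items()
        }

    def action_probs(self, domain, state):
        """Return a probability distribution over the given domain's actions."""
        embedding = np.tanh(self.w_shared @ state)
        logits = self.domain_heads[domain] @ embedding
        return softmax(logits)

    def act(self, domain, state, rng):
        """Sample an action index for the given domain."""
        probs = self.action_probs(domain, state)
        return int(rng.choice(len(probs), p=probs))


# Usage with two made-up domains that have different numbers of actions.
policy = SharedEmbeddingPolicy(
    state_dim=4, embed_dim=3,
    actions_per_domain={"restaurant": 5, "hotel": 7},
)
state = np.ones(4)
restaurant_probs = policy.action_probs("restaurant", state)
hotel_action = policy.act("hotel", state, np.random.default_rng(1))
```

In this sketch only the small per-domain heads are tied to a single domain, which is one plausible reason that sharing could reduce the number of dialogs needed per domain. Whether the paper structures the sharing this way is not stated in the abstract.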
Related papers
- Zero-Shot Generalizable End-to-End Task-Oriented Dialog System using Context Summarization and Domain Schema [2.7178968279054936]
State-of-the-art approaches in task-oriented dialog systems formulate the problem as a conditional sequence generation task.
This requires labeled training data for each new domain or task.
We introduce a novel Zero-Shot generalizable end-to-end Task-oriented Dialog system, ZS-ToD.
arXiv Detail & Related papers (2023-03-28T18:56:31Z)
- PoE: a Panel of Experts for Generalized Automatic Dialogue Assessment [58.46761798403072]
A model-based automatic dialogue evaluation metric (ADEM) is expected to perform well across multiple domains.
Despite significant progress, an ADEM that works well in one domain does not necessarily generalize to another.
We propose a Panel of Experts (PoE) network that consists of a shared transformer encoder and a collection of lightweight adapters.
arXiv Detail & Related papers (2022-12-18T02:26:50Z)
- Knowledge-grounded Dialog State Tracking [12.585986197627477]
We propose to perform dialog state tracking grounded on knowledge encoded externally.
We query relevant knowledge of various forms based on the dialog context.
We demonstrate superior performance of our proposed method over strong baselines.
arXiv Detail & Related papers (2022-10-13T01:34:08Z)
- Graph Neural Network Policies and Imitation Learning for Multi-Domain Task-Oriented Dialogues [0.716879432974126]
Task-oriented dialogue systems are designed to achieve specific goals while conversing with humans.
In practice, they may have to handle several domains and tasks simultaneously.
We show that structured policies based on graph neural networks combined with different degrees of imitation learning can effectively handle multi-domain dialogues.
arXiv Detail & Related papers (2022-10-11T08:29:10Z)
- Manual-Guided Dialogue for Flexible Conversational Agents [84.46598430403886]
How to build and use dialogue data efficiently, and how to deploy models in different domains at scale, can be critical issues in building a task-oriented dialogue system.
We propose a novel manual-guided dialogue scheme, where the agent learns the tasks from both dialogue and manuals.
Our proposed scheme reduces the dependence of dialogue models on fine-grained domain ontology, and makes them more flexible to adapt to various domains.
arXiv Detail & Related papers (2022-08-16T08:21:12Z)
- "Think Before You Speak": Improving Multi-Action Dialog Policy by Planning Single-Action Dialogs [33.78889030078026]
Multi-action dialog policy (MADP) generates multiple atomic dialog actions per turn.
We propose Planning Enhanced Dialog Policy (PEDP), a novel multi-task learning framework that learns single-action dialog dynamics.
Our fully supervised learning-based method achieves a solid task success rate of 90.6%, improving 3% compared to the state-of-the-art methods.
arXiv Detail & Related papers (2022-04-25T07:55:53Z)
- Meta Dialogue Policy Learning [58.045067703675095]
We propose Deep Transferable Q-Network (DTQN) to utilize shareable low-level signals between domains.
We decompose the state and action representation space into feature subspaces corresponding to these low-level components.
In experiments, our model outperforms baseline models in terms of both success rate and dialogue efficiency.
arXiv Detail & Related papers (2020-06-03T23:53:06Z)
- UniConv: A Unified Conversational Neural Architecture for Multi-domain Task-oriented Dialogues [101.96097419995556]
"UniConv" is a novel unified neural architecture for end-to-end conversational systems in task-oriented dialogues.
We conduct comprehensive experiments in dialogue state tracking, context-to-text, and end-to-end settings on the MultiWOZ2.1 benchmark.
arXiv Detail & Related papers (2020-04-29T16:28:22Z)
- Learning Dialog Policies from Weak Demonstrations [32.149932955715705]
Building upon Deep Q-learning from Demonstrations (DQfD), we leverage dialog data to guide the agent to successfully respond to a user's requests.
We make progressively fewer assumptions about the data needed, using labeled, reduced-labeled, and even unlabeled data.
Experiments in a challenging multi-domain dialog system framework validate our approaches, and achieve high success rates even when trained on out-of-domain data.
arXiv Detail & Related papers (2020-04-23T10:22:16Z)
- Dynamic Fusion Network for Multi-Domain End-to-end Task-Oriented Dialog [70.79442700890843]
We propose a novel Dynamic Fusion Network (DF-Net) which automatically exploits the relevance between the target domain and each domain.
With little training data, we show its transferability by outperforming the prior best model by 13.9% on average.
arXiv Detail & Related papers (2020-04-23T08:17:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.