Dynamic Dialogue Policy Transformer for Continual Reinforcement Learning
- URL: http://arxiv.org/abs/2204.05928v1
- Date: Tue, 12 Apr 2022 16:30:40 GMT
- Title: Dynamic Dialogue Policy Transformer for Continual Reinforcement Learning
- Authors: Christian Geishauser, Carel van Niekerk, Nurul Lubis, Michael Heck,
Hsien-Chin Lin, Shutong Feng, Milica Gašić
- Abstract summary: Continual learning is one of the key components of human learning and a necessary requirement of artificial intelligence.
We provide a framework with training protocols, baseline models and suitable metrics for assessing continual learning models.
We propose the dynamic dialogue policy transformer (DDPT), a novel dynamic architecture that can integrate new knowledge seamlessly.
- Score: 2.580163308334609
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Continual learning is one of the key components of human learning and a
necessary requirement of artificial intelligence. As dialogue can potentially
span infinitely many topics and tasks, a task-oriented dialogue system must
have the capability to continually learn, dynamically adapting to new
challenges while preserving the knowledge it already acquired. Despite the
importance, continual reinforcement learning of the dialogue policy has
remained largely unaddressed. The lack of a framework with training protocols,
baseline models and suitable metrics, has so far hindered research in this
direction. In this work we fill precisely this gap, enabling research in
dialogue policy optimisation to go from static to dynamic learning. We provide
a continual learning algorithm, baseline architectures and metrics for
assessing continual learning models. Moreover, we propose the dynamic dialogue
policy transformer (DDPT), a novel dynamic architecture that can integrate new
knowledge seamlessly, is capable of handling large state spaces, and obtains
significant zero-shot performance when exposed to unseen domains, without
any growth in network parameter size.
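The abstract's central claim, zero-shot transfer to unseen domains with no parameter growth, can be illustrated with a toy sketch: if every state feature and every action is represented by an embedding of its natural-language description, the policy's trainable weights are independent of how many domains exist. This is a minimal illustration of that idea, not the authors' implementation; `embed` is a hypothetical stand-in for a pretrained language-model encoder, and the feature/action names are invented.

```python
import numpy as np

DIM = 16
rng = np.random.default_rng(0)


def embed(text, dim=DIM):
    # Toy stand-in for a pretrained encoder: a deterministic
    # pseudo-random vector derived from the description string.
    local = np.random.default_rng(abs(hash(text)) % (2**32))
    return local.standard_normal(dim) / np.sqrt(dim)


class DescriptionPolicy:
    """Sketch of a description-embedding dialogue policy.

    The only trainable parameters are a fixed-size matrix W, so adding
    a new domain adds zero parameters: new slots and actions are
    handled through their description embeddings alone.
    """

    def __init__(self, dim=DIM):
        self.W = rng.standard_normal((dim, dim)) / np.sqrt(dim)

    def act(self, state_feats, actions):
        # Pool the state: weight each feature's description embedding
        # by its current value, then project with the shared weights.
        pooled = np.stack(
            [embed(name) * value for name, value in state_feats.items()]
        ).mean(axis=0)
        h = np.tanh(self.W @ pooled)
        # Score each candidate action by similarity between the state
        # representation and the action's description embedding.
        scores = np.array([embed(a) @ h for a in actions])
        return actions[int(np.argmax(scores))]


policy = DescriptionPolicy()
# Seen domain (hypothetical MultiWOZ-style names):
policy.act({"hotel-price": 1.0, "hotel-area": 0.5},
           ["inform-hotel-price", "request-hotel-stars"])
# Unseen domain works with the very same weights:
policy.act({"train-destination": 1.0}, ["request-train-day"])
```

The design point being sketched: because actions are selected by similarity in embedding space rather than by a fixed output head, the action set can change per turn and per domain while `W` stays the same size.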
Related papers
- The Future of Continual Learning in the Era of Foundation Models: Three Key Directions [3.805777835466912]
We argue that continual learning remains essential for three key reasons. We argue it is continual compositionality that will mark the rebirth of continual learning. The future of AI will not be defined by a single static model but by an ecosystem of continually evolving and interacting models.
arXiv Detail & Related papers (2025-06-03T19:06:41Z)
- SAGE: Steering and Refining Dialog Generation with State-Action Augmentation [9.95917154889491]
We present a novel approach called SAGE that uses latent variables to control long-horizon behavior in dialogue generation.
At the core of our method is the State-Action Chain (SAC), which augments standard language model fine-tuning.
We show that models trained with this approach demonstrate improved performance in emotional intelligence metrics.
arXiv Detail & Related papers (2025-03-04T22:45:24Z)
- Opportunities and Challenges in Neural Dialog Tutoring [54.07241332881601]
We rigorously analyze various generative language models on two dialog tutoring datasets for language learning.
We find that although current approaches can model tutoring in constrained learning scenarios, they perform poorly in less constrained scenarios.
Our human quality evaluation shows that both models and ground-truth annotations exhibit low performance in terms of equitable tutoring.
arXiv Detail & Related papers (2023-01-24T11:00:17Z)
- A Simple But Effective Approach to n-shot Task-Oriented Dialogue Augmentation [32.43362825854633]
We introduce a framework that creates synthetic task-oriented dialogues in a fully automatic manner.
Our framework uses the simple idea that each turn-pair in a task-oriented dialogue has a certain function.
We observe significant improvements in the fine-tuning scenarios in several domains.
arXiv Detail & Related papers (2021-02-27T18:55:12Z)
- Continual Learning in Task-Oriented Dialogue Systems [49.35627673523519]
Continual learning in task-oriented dialogue systems can allow us to add new domains and functionalities through time without incurring the high cost of a whole system retraining.
We propose a continual learning benchmark for task-oriented dialogue systems with 37 domains to be learned continuously in four settings.
arXiv Detail & Related papers (2020-12-31T08:44:25Z)
- Rethinking Supervised Learning and Reinforcement Learning in Task-Oriented Dialogue Systems [58.724629408229205]
We demonstrate how traditional supervised learning and a simulator-free adversarial learning method can be used to achieve performance comparable to state-of-the-art RL-based methods.
Our main goal is not to beat reinforcement learning with supervised learning, but to demonstrate the value of rethinking the role of reinforcement learning and supervised learning in optimizing task-oriented dialogue systems.
arXiv Detail & Related papers (2020-09-21T12:04:18Z)
- Importance Weighted Policy Learning and Adaptation [89.46467771037054]
We study a complementary approach which is conceptually simple, general, modular and built on top of recent improvements in off-policy learning.
The framework is inspired by ideas from the probabilistic inference literature and combines robust off-policy learning with a behavior prior.
Our approach achieves competitive adaptation performance on hold-out tasks compared to meta reinforcement learning baselines and can scale to complex sparse-reward scenarios.
arXiv Detail & Related papers (2020-09-10T14:16:58Z)
- Dialog Policy Learning for Joint Clarification and Active Learning Queries [24.420113907842147]
We train a hierarchical dialog policy to jointly perform both clarification and active learning.
We show that jointly learning dialog policies for clarification and active learning is more effective than the use of static dialog policies for one or both of these functions.
arXiv Detail & Related papers (2020-06-09T18:53:21Z)
- Meta Dialogue Policy Learning [58.045067703675095]
We propose Deep Transferable Q-Network (DTQN) to utilize shareable low-level signals between domains.
We decompose the state and action representation space into feature subspaces corresponding to these low-level components.
In experiments, our model outperforms baseline models in terms of both success rate and dialogue efficiency.
arXiv Detail & Related papers (2020-06-03T23:53:06Z)
- Recent Advances and Challenges in Task-oriented Dialog System [63.82055978899631]
Task-oriented dialog systems are attracting more and more attention in academic and industrial communities.
We discuss three critical topics for task-oriented dialog systems: (1) improving data efficiency to facilitate dialog modeling in low-resource settings, (2) modeling multi-turn dynamics for dialog policy learning, and (3) integrating domain knowledge into the dialog model.
arXiv Detail & Related papers (2020-03-17T01:34:56Z)
- Learning from Easy to Complex: Adaptive Multi-curricula Learning for Neural Dialogue Generation [40.49175137775255]
Current state-of-the-art neural dialogue systems are mainly data-driven and are trained on human-generated responses.
We propose an adaptive multi-curricula learning framework to schedule a committee of the organized curricula.
arXiv Detail & Related papers (2020-03-02T03:09:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.