Rethinking Supervised Learning and Reinforcement Learning in
Task-Oriented Dialogue Systems
- URL: http://arxiv.org/abs/2009.09781v1
- Date: Mon, 21 Sep 2020 12:04:18 GMT
- Title: Rethinking Supervised Learning and Reinforcement Learning in
Task-Oriented Dialogue Systems
- Authors: Ziming Li and Julia Kiseleva and Maarten de Rijke
- Abstract summary: We demonstrate how traditional supervised learning and a simulator-free adversarial learning method can be used to achieve performance comparable to state-of-the-art RL-based methods.
Our main goal is not to beat reinforcement learning with supervised learning, but to demonstrate the value of rethinking the role of reinforcement learning and supervised learning in optimizing task-oriented dialogue systems.
- Score: 58.724629408229205
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dialogue policy learning for task-oriented dialogue systems has enjoyed great
progress recently mostly through employing reinforcement learning methods.
However, these approaches have become very sophisticated, and it is time to
re-evaluate them. Are we really making progress in developing dialogue agents
based only on reinforcement learning? We demonstrate how (1) traditional
supervised learning together with (2) a simulator-free adversarial learning
method can be used to achieve performance comparable to state-of-the-art
RL-based methods.
First, we introduce a simple dialogue action decoder to predict the appropriate
actions. Then, the traditional multi-label classification solution for dialogue
policy learning is extended by adding dense layers to improve dialogue agent
performance. Finally, we employ the Gumbel-Softmax estimator to alternately
train the dialogue agent and the dialogue reward model without
using reinforcement learning. Based on our extensive experiments, we conclude
that the proposed methods achieve more stable and higher performance with less
effort, avoiding, for example, the domain knowledge required to design a user
simulator and the intractable parameter tuning of reinforcement learning. Our
main goal is not to beat reinforcement learning with supervised learning, but
to demonstrate the value of rethinking the role of reinforcement learning and
supervised learning in optimizing task-oriented dialogue systems.
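
To make the first ingredient concrete, below is a minimal sketch of a multi-label dialogue action classifier with a few added dense layers, trained by plain supervised learning on annotated dialogue data. It is written in PyTorch; the state dimension, number of actions, and hidden sizes are illustrative placeholders, not the paper's configuration.

```python
import torch
import torch.nn as nn

class DialoguePolicyMLC(nn.Module):
    """Multi-label classifier over dialogue actions.

    A minimal sketch: the dialogue state vector is mapped through dense
    layers (the "added dense layers" mentioned in the abstract) to one
    independent logit per dialogue action. Sizes are illustrative only.
    """

    def __init__(self, state_dim: int = 549, hidden_dim: int = 256,
                 num_actions: int = 300):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),   # extra dense layer
            nn.ReLU(),
            nn.Linear(hidden_dim, num_actions),  # one logit per action
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)  # raw logits; apply sigmoid for probabilities


def supervised_step(policy, optimizer, states, action_labels):
    """One supervised update on (state, multi-hot action label) pairs
    taken from an annotated dialogue corpus."""
    logits = policy(states)
    loss = nn.functional.binary_cross_entropy_with_logits(logits, action_labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```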
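The second ingredient, simulator-free adversarial learning, alternates between updating a dialogue reward model (a discriminator over expert versus generated actions) and updating the dialogue agent through that reward model. Because the Gumbel-Softmax estimator makes the sampled action differentiable, the agent can be optimized by ordinary backpropagation rather than policy-gradient RL. The sketch below illustrates this alternation under simplifying assumptions (one categorical action per turn, an illustrative discriminator architecture); it is not the paper's exact objective.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Assumed discriminator that scores a (state, action-vector) pair.
    Architecture and sizes are illustrative, not the paper's."""

    def __init__(self, state_dim: int = 549, num_actions: int = 300,
                 hidden_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + num_actions, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))  # real/fake logit


def adversarial_step(policy, reward_model, policy_opt, reward_opt,
                     states, expert_actions, tau: float = 1.0):
    """One alternating update; treats the agent's output as a single
    categorical choice for simplicity."""
    # (1) Reward model update: separate expert actions from generated ones.
    with torch.no_grad():
        fake_actions = F.gumbel_softmax(policy(states), tau=tau, hard=True)
    real_logit = reward_model(states, expert_actions)
    fake_logit = reward_model(states, fake_actions)
    d_loss = (F.binary_cross_entropy_with_logits(real_logit, torch.ones_like(real_logit))
              + F.binary_cross_entropy_with_logits(fake_logit, torch.zeros_like(fake_logit)))
    reward_opt.zero_grad()
    d_loss.backward()
    reward_opt.step()

    # (2) Policy update: gradients flow through the Gumbel-Softmax sample,
    # so no policy-gradient RL is needed.
    actions = F.gumbel_softmax(policy(states), tau=tau, hard=True)
    gen_logit = reward_model(states, actions)
    g_loss = F.binary_cross_entropy_with_logits(gen_logit, torch.ones_like(gen_logit))
    policy_opt.zero_grad()
    g_loss.backward()
    policy_opt.step()
    return d_loss.item(), g_loss.item()
```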
Related papers
- Scheduled Curiosity-Deep Dyna-Q: Efficient Exploration for Dialog Policy Learning [4.110108749051657]
Training task-oriented dialog agents based on reinforcement learning is time-consuming and requires a large number of interactions with real users.
We propose Scheduled Curiosity-Deep Dyna-Q (SC-DDQ), a curiosity-driven curriculum learning framework based on a state-of-the-art model-based reinforcement learning dialog model, Deep Dyna-Q (DDQ).
Our results show that by introducing scheduled learning and curiosity, the new framework leads to a significant improvement over DDQ and Deep Q-learning (DQN).
arXiv Detail & Related papers (2024-01-31T06:13:28Z)
- RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar to, but potentially even more practical than, those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal and enables the algorithm to learn behaviors that improve over a potentially suboptimal human expert.
arXiv Detail & Related papers (2023-11-21T21:05:21Z)
- Self-Explanation Prompting Improves Dialogue Understanding in Large Language Models [52.24756457516834]
We propose a novel "Self-Explanation" prompting strategy to enhance the comprehension abilities of Large Language Models (LLMs).
This task-agnostic approach requires the model to analyze each dialogue utterance before task execution, thereby improving performance across various dialogue-centric tasks.
Experimental results from six benchmark datasets confirm that our method consistently outperforms other zero-shot prompts and matches or exceeds the efficacy of few-shot prompts.
arXiv Detail & Related papers (2023-09-22T15:41:34Z)
- Structural Pre-training for Dialogue Comprehension [51.215629336320305]
We present SPIDER, Structural Pre-traIned DialoguE Reader, to capture dialogue exclusive features.
To simulate the dialogue-like features, we propose two training objectives in addition to the original LM objectives.
Experimental results on widely used dialogue benchmarks verify the effectiveness of the newly introduced self-supervised tasks.
arXiv Detail & Related papers (2021-05-23T15:16:54Z)
- Continual Learning in Task-Oriented Dialogue Systems [49.35627673523519]
Continual learning in task-oriented dialogue systems can allow us to add new domains and functionalities through time without incurring the high cost of a whole system retraining.
We propose a continual learning benchmark for task-oriented dialogue systems with 37 domains to be learned continuously in four settings.
arXiv Detail & Related papers (2020-12-31T08:44:25Z)
- Automatic Curriculum Learning With Over-repetition Penalty for Dialogue Policy Learning [8.744026064255337]
We propose a novel framework, Automatic Curriculum Learning-based Deep Q-Network (ACL-DQN), to realize automatic curriculum learning for dialogue policy learning.
The teacher model arranges a meaningful ordered curriculum and automatically adjusts it by monitoring the learning progress of the dialogue agent.
Experiments show that ACL-DQN improves the effectiveness and stability of dialogue tasks by a statistically significant margin.
arXiv Detail & Related papers (2020-12-28T02:44:49Z)
- Adaptive Dialog Policy Learning with Hindsight and User Modeling [10.088347529930129]
We develop the algorithm LHUA that, for the first time, enables dialog agents to adaptively learn with hindsight from both simulated and real users.
Experimental results suggest that, in success rate and policy quality, LHUA outperforms competitive baselines from the literature.
arXiv Detail & Related papers (2020-05-07T07:43:43Z)
- Guided Dialog Policy Learning without Adversarial Learning in the Loop [103.20723982440788]
A number of adversarial learning methods have been proposed to learn the reward function together with the dialogue policy.
We propose to decompose the adversarial training into two steps.
First, we train the discriminator with an auxiliary dialogue generator and then incorporate the derived reward model into a common RL method to guide dialogue policy learning; a sketch of this two-step scheme appears after this list.
arXiv Detail & Related papers (2020-04-07T11:03:17Z)
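
For the last entry above (Guided Dialog Policy Learning without Adversarial Learning in the Loop), the referenced two-step decomposition can be pictured roughly as follows. This is a hedged, schematic sketch: the helper objects and their methods (`auxiliary_generator.sample`, `discriminator.update`, `policy.rollout`, `policy.rl_update`, `reward_model.score`) are hypothetical names used for illustration; only the two-step structure comes from the summary above.

```python
def learn_reward_model(discriminator, auxiliary_generator, expert_dialogues, steps):
    """Step 1: train the discriminator offline against an auxiliary dialogue
    generator instead of against the policy being optimized."""
    for _ in range(steps):
        fake_dialogues = auxiliary_generator.sample(len(expert_dialogues))
        discriminator.update(real=expert_dialogues, fake=fake_dialogues)
    return discriminator  # now usable as a fixed, derived reward model


def learn_policy(policy, reward_model, env, episodes):
    """Step 2: plug the frozen reward model into a standard RL loop;
    no adversarial updates happen while the policy trains."""
    for _ in range(episodes):
        trajectory = policy.rollout(env)
        rewards = [reward_model.score(state, action) for state, action in trajectory]
        policy.rl_update(trajectory, rewards)
    return policy
```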