Automatic Curriculum Learning With Over-repetition Penalty for Dialogue
Policy Learning
- URL: http://arxiv.org/abs/2012.14072v1
- Date: Mon, 28 Dec 2020 02:44:49 GMT
- Title: Automatic Curriculum Learning With Over-repetition Penalty for Dialogue
Policy Learning
- Authors: Yangyang Zhao, Zhenyu Wang and Zhenhua Huang
- Abstract summary: We propose a novel framework, Automatic Curriculum Learning-based Deep Q-Network (ACL-DQN), which replaces random goal sampling with a teacher policy model to realize automatic curriculum learning for the dialogue policy.
The teacher model arranges a meaningful ordered curriculum and automatically adjusts it by monitoring the learning progress of the dialogue agent.
Experiments show that the ACL-DQN improves the effectiveness and stability of dialogue tasks by a statistically significant margin.
- Score: 8.744026064255337
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dialogue policy learning based on reinforcement learning is difficult to
apply with real users to train dialogue agents from scratch because of the high
cost. User simulators, which choose random user goals for the dialogue agent to
train on, have been considered an affordable substitute for real users.
However, this random sampling method ignores how humans learn, making
the learned dialogue policy inefficient and unstable. We propose a novel
framework, Automatic Curriculum Learning-based Deep Q-Network (ACL-DQN), which
replaces the traditional random sampling method with a teacher policy model to
realize automatic curriculum learning for the dialogue policy. The teacher
model arranges a meaningful ordered curriculum and automatically adjusts it by
monitoring the learning progress of the dialogue agent and the over-repetition
penalty, without requiring any prior knowledge. The agent's learning progress
reflects the match between its current ability and the difficulty of the
sampled goals, improving sample efficiency, while the
over-repetition penalty guarantees sampling diversity. Experiments show that
the ACL-DQN improves the effectiveness and stability of dialogue
tasks by a statistically significant margin. Furthermore, the framework can
be further improved by equipping it with different curriculum schedules, which
demonstrates that the framework has strong generalizability.
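As a concrete illustration of the mechanism the abstract describes, the sketch below scores each user goal by the agent's recent learning progress and subtracts an over-repetition penalty proportional to how often that goal has been sampled. All names (`CurriculumTeacher`, `window`, `penalty`) and the exact scoring rule are hypothetical assumptions for illustration, not the paper's implementation.

```python
import random
from collections import deque

class CurriculumTeacher:
    """Illustrative teacher that samples user goals for the dialogue agent.

    Assumption-laden sketch: learning progress is estimated as the absolute
    change in recent episode rewards per goal, and an over-repetition penalty
    proportional to each goal's sampling frequency keeps the curriculum diverse.
    """

    def __init__(self, num_goals, window=10, penalty=0.5):
        self.num_goals = num_goals
        self.penalty = penalty  # strength of the over-repetition penalty
        self.rewards = [deque(maxlen=window) for _ in range(num_goals)]
        self.counts = [0] * num_goals  # how often each goal was sampled

    def learning_progress(self, g):
        """Absolute slope of recent rewards: high while the agent is improving."""
        r = list(self.rewards[g])
        if len(r) < 2:
            return 1.0  # unexplored goals get an optimistic score
        half = len(r) // 2
        older = sum(r[:half]) / half
        recent = sum(r[half:]) / (len(r) - half)
        return abs(recent - older)

    def sample_goal(self):
        total = sum(self.counts) or 1
        scores = [
            self.learning_progress(g)
            - self.penalty * (self.counts[g] / total)  # over-repetition penalty
            for g in range(self.num_goals)
        ]
        best = max(scores)
        g = random.choice([i for i, s in enumerate(scores) if s == best])
        self.counts[g] += 1
        return g

    def update(self, g, reward):
        """Report the agent's episode reward on goal g back to the teacher."""
        self.rewards[g].append(reward)
```

In use, the training loop would call `sample_goal()` before each dialogue episode and `update()` with the episode outcome afterwards; the penalty term prevents the teacher from collapsing onto a single high-progress goal.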
Related papers
- Plug-and-Play Policy Planner for Large Language Model Powered Dialogue
Agents [121.46051697742608]
We introduce a new dialogue policy planning paradigm to strategize dialogue problems with a tunable language model plug-in named PPDPP.
Specifically, we develop a novel training framework to facilitate supervised fine-tuning over available human-annotated data.
PPDPP consistently and substantially outperforms existing approaches on three different proactive dialogue applications.
arXiv Detail & Related papers (2023-11-01T03:20:16Z)
- JoTR: A Joint Transformer and Reinforcement Learning Framework for
Dialog Policy Learning [53.83063435640911]
Dialogue policy learning (DPL) is a crucial component of dialogue modelling.
We introduce a novel framework, JoTR, to generate flexible dialogue actions.
Unlike traditional methods, JoTR formulates a word-level policy that allows for a more dynamic and adaptable dialogue action generation.
arXiv Detail & Related papers (2023-09-01T03:19:53Z)
- CHAI: A CHatbot AI for Task-Oriented Dialogue with Offline Reinforcement
Learning [85.3987745097806]
Offline reinforcement learning can be used to train dialogue agents entirely using static datasets collected from human speakers.
Experiments show that recently developed offline RL methods can be combined with language models to yield realistic dialogue agents.
arXiv Detail & Related papers (2022-04-18T17:43:21Z)
- DialAug: Mixing up Dialogue Contexts in Contrastive Learning for Robust
Conversational Modeling [3.3578533367912025]
We propose a framework that incorporates augmented versions of a dialogue context into the learning objective.
We show that our proposed augmentation method outperforms previous data augmentation approaches.
arXiv Detail & Related papers (2022-04-15T23:39:41Z)
- Towards Robust Online Dialogue Response Generation [62.99904593650087]
We argue that this can be caused by a discrepancy between training and real-world testing.
We propose a hierarchical sampling-based method consisting of both utterance-level sampling and semi-utterance-level sampling.
arXiv Detail & Related papers (2022-03-07T06:51:41Z)
- WeaSuL: Weakly Supervised Dialogue Policy Learning: Reward Estimation
for Multi-turn Dialogue [17.663449579168297]
We simulate a dialogue in which an agent and a user (modelled similarly to the agent, with a supervised learning objective) interact with each other.
The agent uses dynamic blocking to generate ranked diverse responses and exploration-exploitation to select among the Top-K responses.
Empirical studies on two benchmarks indicate that our model significantly outperforms baselines in response quality and leads to more successful conversations.
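The Top-K selection step described above can be pictured as a simple epsilon-greedy choice over a ranked candidate list; the sketch below is a hypothetical illustration (the function name and `epsilon` parameter are assumptions), not WeaSuL's actual selection code.

```python
import random

def select_response(ranked_responses, k=5, epsilon=0.1):
    """Epsilon-greedy exploration-exploitation over the Top-K ranked responses.

    Usually exploits the top-ranked candidate; with probability epsilon,
    explores a lower-ranked candidate from the Top-K.
    """
    top_k = ranked_responses[:k]
    if random.random() < epsilon and len(top_k) > 1:
        return random.choice(top_k[1:])  # explore a lower-ranked candidate
    return top_k[0]                      # exploit the best-ranked one
```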
arXiv Detail & Related papers (2021-08-01T08:00:45Z)
- Structural Pre-training for Dialogue Comprehension [51.215629336320305]
We present SPIDER, Structural Pre-traIned DialoguE Reader, to capture dialogue exclusive features.
To simulate the dialogue-like features, we propose two training objectives in addition to the original LM objectives.
Experimental results on widely used dialogue benchmarks verify the effectiveness of the newly introduced self-supervised tasks.
arXiv Detail & Related papers (2021-05-23T15:16:54Z)
- Rethinking Supervised Learning and Reinforcement Learning in
Task-Oriented Dialogue Systems [58.724629408229205]
We demonstrate how traditional supervised learning and a simulator-free adversarial learning method can be used to achieve performance comparable to state-of-the-art RL-based methods.
Our main goal is not to beat reinforcement learning with supervised learning, but to demonstrate the value of rethinking the role of reinforcement learning and supervised learning in optimizing task-oriented dialogue systems.
arXiv Detail & Related papers (2020-09-21T12:04:18Z)
- Adaptive Dialog Policy Learning with Hindsight and User Modeling [10.088347529930129]
We develop algorithm LHUA that, for the first time, enables dialog agents to adaptively learn with hindsight from both simulated and real users.
Experimental results suggest that, in success rate and policy quality, LHUA outperforms competitive baselines from the literature.
arXiv Detail & Related papers (2020-05-07T07:43:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.