JoTR: A Joint Transformer and Reinforcement Learning Framework for
Dialog Policy Learning
- URL: http://arxiv.org/abs/2309.00230v1
- Date: Fri, 1 Sep 2023 03:19:53 GMT
- Title: JoTR: A Joint Transformer and Reinforcement Learning Framework for
Dialog Policy Learning
- Authors: Wai-Chung Kwan, Huimin Wang, Hongru Wang, Zezhong Wang, Xian Wu,
Yefeng Zheng, Kam-Fai Wong
- Abstract summary: Dialogue policy learning (DPL) is a crucial component of dialogue modelling.
We introduce a novel framework, JoTR, to generate flexible dialogue actions.
Unlike traditional methods, JoTR formulates a word-level policy that allows for a more dynamic and adaptable dialogue action generation.
- Score: 53.83063435640911
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Dialogue policy learning (DPL) is a crucial component of dialogue modelling.
Its primary role is to determine the appropriate abstract response, commonly
referred to as the "dialogue action". Traditional DPL methodologies have
treated this as a sequential decision problem, using pre-defined action
candidates extracted from a corpus. However, these incomplete candidates can
significantly limit the diversity of responses and pose challenges when dealing
with edge cases, which are scenarios that occur only at extreme operating
parameters. To address these limitations, we introduce a novel framework, JoTR.
This framework is unique as it leverages a text-to-text Transformer-based model
to generate flexible dialogue actions. Unlike traditional methods, JoTR
formulates a word-level policy that allows for a more dynamic and adaptable
dialogue action generation, without the need for any action templates. This
setting enhances the diversity of responses and improves the system's ability
to handle edge cases effectively. In addition, JoTR employs reinforcement
learning with a reward-shaping mechanism to efficiently finetune the word-level
dialogue policy, which allows the model to learn from its interactions,
improving its performance over time. An extensive evaluation, conducted with
both user simulators and human evaluators, shows that JoTR achieves
state-of-the-art performance on two benchmark dialogue modelling tasks.
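The core loop the abstract describes, a word-level policy fine-tuned by reinforcement learning with a shaped reward, can be sketched roughly as follows. This is a minimal illustration, not JoTR's implementation: the toy vocabulary, the tabular logits (standing in for a Transformer's output layer), and the shaping heuristic are all invented for the example.

```python
import math
import random

random.seed(0)

# Toy vocabulary of dialogue-action tokens (hypothetical; JoTR generates
# actions over a Transformer's full subword vocabulary).
VOCAB = ["inform", "request", "confirm", "<eos>"]

# Word-level policy: one logit per token, a stand-in for model logits.
logits = {tok: 0.0 for tok in VOCAB}

def softmax(scores):
    m = max(scores.values())
    exps = {k: math.exp(v - m) for k, v in scores.items()}
    z = sum(exps.values())
    return {k: v / z for k, v in exps.items()}

def sample_action(max_len=4):
    """Sample a dialogue action word by word until <eos>."""
    tokens = []
    for _ in range(max_len):
        probs = softmax(logits)
        tok = random.choices(list(probs), weights=list(probs.values()))[0]
        tokens.append(tok)
        if tok == "<eos>":
            break
    return tokens

def shaped_reward(tokens):
    """Sparse task reward plus a dense shaping term.

    The shaping heuristic here (a bonus for 'inform' tokens) is invented
    for illustration; the paper's actual reward-shaping mechanism differs.
    """
    task_reward = 1.0 if tokens[-1] == "<eos>" else -1.0
    shaping = 0.5 * sum(t == "inform" for t in tokens)
    return task_reward + shaping

def reinforce_step(lr=0.1):
    """One REINFORCE update on the word-level policy."""
    tokens = sample_action()
    r = shaped_reward(tokens)
    probs = softmax(logits)
    for tok in VOCAB:
        # grad of sum_t log pi(a_t) w.r.t. this token's logit:
        # count(tok in sequence) - T * pi(tok)
        grad = tokens.count(tok) - len(tokens) * probs[tok]
        logits[tok] += lr * r * grad
    return r
```

Repeating `reinforce_step` over simulated dialogues shifts probability mass toward token sequences that earn higher shaped reward, which is the intuition behind fine-tuning a word-level policy without pre-defined action templates.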
Related papers
- Plug-and-Play Policy Planner for Large Language Model Powered Dialogue
Agents [121.46051697742608]
We introduce a new dialogue policy planning paradigm to strategize dialogue problems with a tunable language model plug-in named PPDPP.
Specifically, we develop a novel training framework to facilitate supervised fine-tuning over available human-annotated data.
PPDPP consistently and substantially outperforms existing approaches on three different proactive dialogue applications.
arXiv Detail & Related papers (2023-11-01T03:20:16Z)
- DiactTOD: Learning Generalizable Latent Dialogue Acts for Controllable
Task-Oriented Dialogue Systems [15.087619144902776]
We present a novel end-to-end latent dialogue act model (DiactTOD) that represents dialogue acts in a latent space.
When pre-trained on a large corpus, DiactTOD is able to predict and control dialogue acts to generate controllable responses.
arXiv Detail & Related papers (2023-08-01T23:29:16Z)
- Why Guided Dialog Policy Learning performs well? Understanding the role
of adversarial learning and its alternative [0.44267358790081573]
In recent years, reinforcement learning has emerged as a promising option for dialog policy learning (DPL).
One way to estimate rewards from collected data is to train the reward estimator and dialog policy simultaneously using adversarial learning (AL).
This paper identifies the role of AL in DPL through detailed analyses of the objective functions of dialog policy and reward estimator.
We propose a method that eliminates AL from reward estimation and DPL while retaining its advantages.
arXiv Detail & Related papers (2023-07-13T12:29:29Z)
- "Think Before You Speak": Improving Multi-Action Dialog Policy by
Planning Single-Action Dialogs [33.78889030078026]
Multi-action dialog policy (MADP) generates multiple atomic dialog actions per turn.
We propose Planning Enhanced Dialog Policy (PEDP), a novel multi-task learning framework that learns single-action dialog dynamics.
Our fully supervised learning-based method achieves a solid task success rate of 90.6%, a 3% improvement over state-of-the-art methods.
arXiv Detail & Related papers (2022-04-25T07:55:53Z)
- GALAXY: A Generative Pre-trained Model for Task-Oriented Dialog with
Semi-Supervised Learning and Explicit Policy Injection [36.77204909711832]
We propose a novel pre-trained dialog model that explicitly learns dialog policy from limited labeled dialogs and large-scale unlabeled dialog corpora.
Specifically, we introduce a dialog act prediction task for policy optimization during pre-training and employ a consistency regularization term to refine the learned representation.
Empirical results show that GALAXY substantially improves the performance of task-oriented dialog systems.
arXiv Detail & Related papers (2021-11-29T15:24:36Z)
- Learning an Effective Context-Response Matching Model with
Self-Supervised Tasks for Retrieval-based Dialogues [88.73739515457116]
We introduce four self-supervised tasks including next session prediction, utterance restoration, incoherence detection and consistency discrimination.
We jointly train the PLM-based response selection model with these auxiliary tasks in a multi-task manner.
Experiment results indicate that the proposed auxiliary self-supervised tasks bring significant improvement for multi-turn response selection.
arXiv Detail & Related papers (2020-09-14T08:44:46Z)
- Controlling Dialogue Generation with Semantic Exemplars [55.460082747572734]
We present an Exemplar-based Dialogue Generation model, EDGE, that uses the semantic frames present in exemplar responses to guide generation.
We show that controlling dialogue generation based on the semantic frames of exemplars, rather than words in the exemplar itself, improves the coherence of generated responses.
arXiv Detail & Related papers (2020-08-20T17:02:37Z)
- Modelling Hierarchical Structure between Dialogue Policy and Natural
Language Generator with Option Framework for Task-oriented Dialogue System [49.39150449455407]
HDNO is an option framework that models latent dialogue acts, avoiding the need to hand-design specific dialogue act representations.
We test HDNO on MultiWOZ 2.0 and MultiWOZ 2.1, multi-domain dialogue datasets, comparing against a word-level E2E model trained with RL, LaRL, and HDSA.
arXiv Detail & Related papers (2020-06-11T20:55:28Z)
- Dialogue-Based Relation Extraction [53.2896545819799]
We present the first human-annotated dialogue-based relation extraction (RE) dataset DialogRE.
We argue that speaker-related information plays a critical role in the proposed task, based on an analysis of similarities and differences between dialogue-based and traditional RE tasks.
Experimental results demonstrate that a speaker-aware extension on the best-performing model leads to gains in both the standard and conversational evaluation settings.
arXiv Detail & Related papers (2020-04-17T03:51:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.