Dialog Action-Aware Transformer for Dialog Policy Learning
- URL: http://arxiv.org/abs/2309.02240v1
- Date: Tue, 5 Sep 2023 13:47:25 GMT
- Title: Dialog Action-Aware Transformer for Dialog Policy Learning
- Authors: Huimin Wang, Wai-Chung Kwan, Kam-Fai Wong
- Abstract summary: We propose to make full use of the plain text knowledge from the pre-trained language model to accelerate the RL agent's learning speed.
Specifically, we design a dialog action-aware transformer encoder (DaTrans) which integrates a new fine-tuning procedure named masked last action task.
DaTrans is further optimized in an RL setting with ongoing interactions and evolves through exploration in the dialog action space toward maximizing long-term accumulated rewards.
- Score: 22.262659702998892
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent works usually address dialog policy learning (DPL) by training a
reinforcement learning (RL) agent to determine the best dialog action. However,
existing works on deep RL require a large volume of agent-user interactions to
achieve acceptable performance. In this paper, we propose to make full use of
the plain text knowledge from the pre-trained language model to accelerate the
RL agent's learning speed. Specifically, we design a dialog action-aware
transformer encoder (DaTrans), which integrates a new fine-tuning procedure
named the masked last action task to encourage DaTrans to be dialog-aware and to
distill action-specific features. Then, DaTrans is further optimized in an RL
setting with ongoing interactions and evolves through exploration in the dialog
action space toward maximizing long-term accumulated rewards. The effectiveness
and efficiency of the proposed model are demonstrated with both simulator
evaluation and human evaluation.
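The masked last action task described above can be illustrated with a small sketch. This is not the authors' code: it only shows, under assumed representations (dialog actions as strings, a BERT-style `[MASK]` token), how a training example for predicting the final dialog action might be constructed from a dialog trajectory.

```python
# Illustrative sketch of the "masked last action task" (assumed
# representation, not the paper's implementation): given a dialog
# trajectory of actions, mask the final action and use it as the
# prediction target, analogous to BERT-style masked language modeling.

MASK = "[MASK]"

def make_masked_example(actions):
    """Replace the last dialog action with a mask token.

    Returns (masked_sequence, target_action): the encoder would consume
    the masked sequence and be trained to recover the target action.
    """
    if not actions:
        raise ValueError("need at least one action in the trajectory")
    masked_sequence = actions[:-1] + [MASK]
    target_action = actions[-1]
    return masked_sequence, target_action

# Example trajectory with hypothetical dialog acts:
trajectory = ["greet", "request(area)", "inform(food)", "offer(restaurant)"]
masked, label = make_masked_example(trajectory)
# masked -> ["greet", "request(area)", "inform(food)", "[MASK]"]
# label  -> "offer(restaurant)"
```

Training on such examples would push the encoder to summarize the preceding turns into features predictive of the next dialog action, which is the action-awareness the abstract describes.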
Related papers
- Enabling Real-Time Conversations with Minimal Training Costs [61.80370154101649]
This paper presents a new duplex decoding approach that enhances large language models with duplex ability, requiring minimal training.
Experimental results indicate that our proposed method significantly enhances the naturalness and human-likeness of user-AI interactions with minimal training costs.
arXiv Detail & Related papers (2024-09-18T06:27:26Z) - DialCLIP: Empowering CLIP as Multi-Modal Dialog Retriever [83.33209603041013]
We propose a parameter-efficient prompt-tuning method named DialCLIP for multi-modal dialog retrieval.
Our approach introduces a multi-modal context generator to learn context features which are distilled into prompts within the pre-trained vision-language model CLIP.
To facilitate various types of retrieval, we also design multiple experts to learn mappings from CLIP outputs to multi-modal representation space.
arXiv Detail & Related papers (2024-01-02T07:40:12Z) - Zero-Shot Goal-Directed Dialogue via RL on Imagined Conversations [70.7884839812069]
Large language models (LLMs) have emerged as powerful and general solutions to many natural language tasks.
However, many of the most important applications of language generation are interactive, where an agent has to talk to a person to reach a desired outcome.
In this work, we explore a new method for adapting LLMs with RL for such goal-directed dialogue.
arXiv Detail & Related papers (2023-11-09T18:45:16Z) - JoTR: A Joint Transformer and Reinforcement Learning Framework for
Dialog Policy Learning [53.83063435640911]
Dialogue policy learning (DPL) is a crucial component of dialogue modelling.
We introduce a novel framework, JoTR, to generate flexible dialogue actions.
Unlike traditional methods, JoTR formulates a word-level policy that allows for a more dynamic and adaptable dialogue action generation.
arXiv Detail & Related papers (2023-09-01T03:19:53Z) - GODEL: Large-Scale Pre-Training for Goal-Directed Dialog [119.1397031992088]
We introduce GODEL, a large pre-trained language model for dialog.
We show that GODEL outperforms state-of-the-art pre-trained dialog models in few-shot fine-tuning setups.
A novel feature of our evaluation methodology is the introduction of a notion of utility that assesses the usefulness of responses.
arXiv Detail & Related papers (2022-06-22T18:19:32Z) - "Think Before You Speak": Improving Multi-Action Dialog Policy by
Planning Single-Action Dialogs [33.78889030078026]
Multi-action dialog policy (MADP) generates multiple atomic dialog actions per turn.
We propose Planning Enhanced Dialog Policy (PEDP), a novel multi-task learning framework that learns single-action dialog dynamics.
Our fully supervised learning-based method achieves a task success rate of 90.6%, improving by 3% over state-of-the-art methods.
arXiv Detail & Related papers (2022-04-25T07:55:53Z) - GALAXY: A Generative Pre-trained Model for Task-Oriented Dialog with
Semi-Supervised Learning and Explicit Policy Injection [36.77204909711832]
We propose a novel pre-trained dialog model that explicitly learns dialog policy from limited labeled dialogs and large-scale unlabeled dialog corpora.
Specifically, we introduce a dialog act prediction task for policy optimization during pre-training and employ a consistency regularization term to refine the learned representation.
Empirical results show that GALAXY substantially improves the performance of task-oriented dialog systems.
arXiv Detail & Related papers (2021-11-29T15:24:36Z) - High-Quality Diversification for Task-Oriented Dialogue Systems [18.455916009255485]
Training DRL agents with diverse dialogue trajectories prepares them well for rare user requests and unseen situations.
One effective diversification method is to let the agent interact with a diverse set of learned user models.
We propose a novel dialogue diversification method for task-oriented dialogue systems trained in simulators.
arXiv Detail & Related papers (2021-06-02T02:10:07Z) - LAVA: Latent Action Spaces via Variational Auto-encoding for Dialogue
Policy Optimization [2.78632567955797]
Reinforcement learning can enable task-oriented dialogue systems to steer the conversation towards successful task completion.
In an end-to-end setting, a response can be constructed in a word-level sequential decision making process with the entire system vocabulary as action space.
Current approaches use an uninformed prior for training and optimize the latent distribution solely on the context.
It is therefore unclear whether the latent representation truly encodes the characteristics of different actions.
arXiv Detail & Related papers (2020-11-18T16:23:30Z) - Multi-Agent Task-Oriented Dialog Policy Learning with Role-Aware Reward
Decomposition [64.06167416127386]
We propose Multi-Agent Dialog Policy Learning, which regards both the system and the user as the dialog agents.
Two agents interact with each other and are jointly learned simultaneously.
Results show that our method can successfully build a system policy and a user policy simultaneously.
arXiv Detail & Related papers (2020-04-08T04:51:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.