LAVA: Latent Action Spaces via Variational Auto-encoding for Dialogue
Policy Optimization
- URL: http://arxiv.org/abs/2011.09378v1
- Date: Wed, 18 Nov 2020 16:23:30 GMT
- Title: LAVA: Latent Action Spaces via Variational Auto-encoding for Dialogue
Policy Optimization
- Authors: Nurul Lubis, Christian Geishauser, Michael Heck, Hsien-chin Lin, Marco
Moresi, Carel van Niekerk and Milica Gašić
- Abstract summary: Reinforcement learning can enable task-oriented dialogue systems to steer the conversation towards successful task completion.
In an end-to-end setting, a response can be constructed in a word-level sequential decision-making process with the entire system vocabulary as the action space.
Current approaches use an uninformed prior for training and optimize the latent distribution solely on the context.
It is therefore unclear whether the latent representation truly encodes the characteristics of different actions.
- Score: 2.78632567955797
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reinforcement learning (RL) can enable task-oriented dialogue systems to
steer the conversation towards successful task completion. In an end-to-end
setting, a response can be constructed in a word-level sequential decision-making
process with the entire system vocabulary as the action space. Policies
trained in such a fashion do not require expert-defined action spaces, but they
have to deal with large action spaces and long trajectories, making RL
impractical. Using the latent space of a variational model as action space
alleviates this problem. However, current approaches use an uninformed prior
for training and optimize the latent distribution solely on the context. It is
therefore unclear whether the latent representation truly encodes the
characteristics of different actions. In this paper, we explore three ways of
leveraging an auxiliary task to shape the latent variable distribution: via
pre-training, to obtain an informed prior, and via multitask learning. We
choose response auto-encoding as the auxiliary task, as this captures the
generative factors of dialogue responses while requiring low computational cost
and neither additional data nor labels. Our approach yields more
action-characterized latent representations that support end-to-end dialogue
policy optimization, and it achieves state-of-the-art success rates. These
results warrant more widespread use of RL in end-to-end dialogue models.
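To make the approach concrete, the following is a minimal sketch of response auto-encoding with a latent action space, written in PyTorch with illustrative sizes and names; it is an assumption-laden reading of the abstract, not the authors' released implementation.

```python
# Illustrative sketch (not the authors' code): a response variational
# auto-encoder whose latent z serves as a compact action space for
# dialogue policy optimization. All sizes and names are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResponseVAE(nn.Module):
    def __init__(self, vocab_size=1000, emb_dim=64, hid_dim=128, z_dim=10):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.to_mu = nn.Linear(hid_dim, z_dim)
        self.to_logvar = nn.Linear(hid_dim, z_dim)
        self.decoder = nn.GRU(emb_dim, z_dim, batch_first=True)
        self.out = nn.Linear(z_dim, vocab_size)

    def forward(self, response_ids):
        x = self.emb(response_ids)
        _, h = self.encoder(x)                         # h: (1, B, hid_dim)
        mu, logvar = self.to_mu(h[-1]), self.to_logvar(h[-1])
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        # Teacher-forced reconstruction of the response, conditioned on z.
        dec_out, _ = self.decoder(x, z.unsqueeze(0))
        return self.out(dec_out), mu, logvar

def auto_encoding_loss(model, response_ids, prior_mu=None, prior_logvar=None):
    # ELBO for response auto-encoding. When prior_mu/prior_logvar are given,
    # the KL targets an informed Gaussian prior (e.g., obtained by
    # pre-training) instead of the uninformed N(0, I).
    logits, mu, logvar = model(response_ids)
    rec = F.cross_entropy(logits[:, :-1].reshape(-1, logits.size(-1)),
                          response_ids[:, 1:].reshape(-1))
    if prior_mu is None:
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()
    else:
        kl = 0.5 * ((prior_logvar - logvar - 1
                     + (logvar.exp() + (mu - prior_mu).pow(2))
                       / prior_logvar.exp()).sum(-1)).mean()
    return rec + kl
```

Under this reading, pre-training would fit the VAE with the auto-encoding loss alone, the informed-prior variant would reuse the pre-trained distribution as the prior of the context-conditioned model, and the multitask variant would add this loss to the context-to-response objective; RL then acts on the low-dimensional z instead of word-level actions.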
Related papers
- Generative Context Distillation [48.91617280112579]
Generative Context Distillation (GCD) is a lightweight prompt internalization method that employs a joint training approach.
We demonstrate that our approach effectively internalizes complex prompts across various agent-based application scenarios.
arXiv Detail & Related papers (2024-11-24T17:32:20Z)
- Unsupervised Extraction of Dialogue Policies from Conversations [3.102576158218633]
We show how Large Language Models can be instrumental in extracting dialogue policies from datasets.
We then propose a novel method for generating dialogue policies utilizing a controllable and interpretable graph-based methodology.
arXiv Detail & Related papers (2024-06-21T14:57:25Z)
- Zero-Shot Goal-Directed Dialogue via RL on Imagined Conversations [70.7884839812069]
Large language models (LLMs) have emerged as powerful and general solutions to many natural language tasks.
However, many of the most important applications of language generation are interactive, where an agent has to talk to a person to reach a desired outcome.
In this work, we explore a new method for adapting LLMs with RL for such goal-directed dialogue.
arXiv Detail & Related papers (2023-11-09T18:45:16Z)
- Self-Explanation Prompting Improves Dialogue Understanding in Large Language Models [52.24756457516834]
We propose a novel "Self-Explanation" prompting strategy to enhance the comprehension abilities of Large Language Models (LLMs).
This task-agnostic approach requires the model to analyze each dialogue utterance before task execution, thereby improving performance across various dialogue-centric tasks.
Experimental results from six benchmark datasets confirm that our method consistently outperforms other zero-shot prompts and matches or exceeds the efficacy of few-shot prompts.
arXiv Detail & Related papers (2023-09-22T15:41:34Z)
- Dialog Action-Aware Transformer for Dialog Policy Learning [22.262659702998892]
We propose to make full use of the plain text knowledge from the pre-trained language model to accelerate the RL agent's learning speed.
Specifically, we design a dialog action-aware transformer encoder (DaTrans), which integrates a new fine-tuning procedure named the masked last action task.
DaTrans is further optimized in an RL setting with ongoing interactions and evolves through exploration in the dialog action space toward maximizing long-term accumulated rewards.
arXiv Detail & Related papers (2023-09-05T13:47:25Z)
- JoTR: A Joint Transformer and Reinforcement Learning Framework for Dialog Policy Learning [53.83063435640911]
Dialogue policy learning (DPL) is a crucial component of dialogue modelling.
We introduce a novel framework, JoTR, to generate flexible dialogue actions.
Unlike traditional methods, JoTR formulates a word-level policy that allows for more dynamic and adaptable dialogue action generation.
arXiv Detail & Related papers (2023-09-01T03:19:53Z)
- DiactTOD: Learning Generalizable Latent Dialogue Acts for Controllable Task-Oriented Dialogue Systems [15.087619144902776]
We present a novel end-to-end latent dialogue act model (DiactTOD) that represents dialogue acts in a latent space.
When pre-trained on a large corpus, DiactTOD is able to predict and control dialogue acts to generate controllable responses.
arXiv Detail & Related papers (2023-08-01T23:29:16Z)
- CHAI: A CHatbot AI for Task-Oriented Dialogue with Offline Reinforcement Learning [85.3987745097806]
Offline reinforcement learning can be used to train dialogue agents entirely from static datasets collected from human speakers.
Experiments show that recently developed offline RL methods can be combined with language models to yield realistic dialogue agents.
arXiv Detail & Related papers (2022-04-18T17:43:21Z)
- Generalizable and Explainable Dialogue Generation via Explicit Action Learning [33.688270031454095]
Conditioned response generation serves as an effective approach to optimize task completion and language quality.
Latent action learning is introduced to map each utterance to a latent representation. However, this approach is prone to over-dependence on the training data, which restricts its generalization capability.
Our proposed approach outperforms latent action baselines on MultiWOZ, a benchmark multi-domain dataset.
arXiv Detail & Related papers (2020-10-08T04:37:22Z)
- Learning an Effective Context-Response Matching Model with Self-Supervised Tasks for Retrieval-based Dialogues [88.73739515457116]
We introduce four self-supervised tasks: next-session prediction, utterance restoration, incoherence detection, and consistency discrimination.
We jointly train the PLM-based response selection model with these auxiliary tasks in a multi-task manner.
Experiment results indicate that the proposed auxiliary self-supervised tasks bring significant improvement for multi-turn response selection.
arXiv Detail & Related papers (2020-09-14T08:44:46Z)
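As a generic illustration of the joint training described in the last entry, a multi-task step can be pictured as a weighted sum of the matching loss and the auxiliary self-supervised losses. The sketch below is hypothetical: the model methods, batch keys, and weighting are stand-ins, not the paper's API.

```python
# Hypothetical multi-task training step: the four auxiliary self-supervised
# losses are added to the primary context-response matching loss. Method
# names, batch keys, and the weight are illustrative assumptions.
import torch

def multitask_step(model, batch, optimizer, aux_weight=0.5):
    # Primary objective: does this response match this context?
    main_loss = model.matching_loss(batch["context"], batch["response"],
                                    batch["label"])
    # Auxiliary self-supervised objectives on the same dialogue batch.
    aux_losses = torch.stack([
        model.next_session_prediction_loss(batch),
        model.utterance_restoration_loss(batch),
        model.incoherence_detection_loss(batch),
        model.consistency_discrimination_loss(batch),
    ])
    loss = main_loss + aux_weight * aux_losses.mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```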
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.