Taming Continuous Posteriors for Latent Variational Dialogue Policies
- URL: http://arxiv.org/abs/2205.07633v1
- Date: Mon, 16 May 2022 12:50:32 GMT
- Title: Taming Continuous Posteriors for Latent Variational Dialogue Policies
- Authors: Marin Vlastelica, Patrick Ernst, Gyuri Szarvas
- Abstract summary: We revisit Gaussian variational posteriors for latent-action RL and show that they can yield even better performance than categoricals.
We achieve this by simplifying the training procedure and by proposing ways to regularize the latent dialogue policy.
- Score: 1.0312968200748118
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Utilizing amortized variational inference for latent-action reinforcement
learning (RL) has been shown to be an effective approach in Task-oriented
Dialogue (ToD) systems for optimizing dialogue success. Until now, categorical
posteriors have been argued to be one of the main drivers of performance. In
this work we revisit Gaussian variational posteriors for latent-action RL and
show that they can yield even better performance than categoricals. We achieve
this by simplifying the training procedure and by proposing ways to regularize
the latent dialogue policy so that it retains good response coherence. Using
continuous latent representations, our model achieves a state-of-the-art
dialogue success rate on the MultiWOZ benchmark, and also compares well to
categorical latent methods in response coherence.
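The abstract centers on a continuous (Gaussian) variational posterior over latent actions, trained with a regularizer that keeps the latent dialogue policy coherent. Below is a minimal, hypothetical PyTorch sketch of that general idea, using the reparameterization trick and a KL term toward a standard-normal prior; the module names, dimensions, and loss weighting are illustrative assumptions, not the authors' actual implementation.

```python
# Hypothetical sketch of a Gaussian latent-action dialogue policy.
# Not the authors' implementation: names, sizes, and loss weights are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GaussianLatentPolicy(nn.Module):
    def __init__(self, ctx_dim=256, latent_dim=32, vocab_size=1000):
        super().__init__()
        self.encoder = nn.Linear(ctx_dim, 2 * latent_dim)           # posterior q(z | context)
        self.decoder = nn.Linear(latent_dim + ctx_dim, vocab_size)  # response head

    def forward(self, ctx):
        mu, logvar = self.encoder(ctx).chunk(2, dim=-1)
        std = torch.exp(0.5 * logvar)
        z = mu + std * torch.randn_like(std)                 # reparameterization trick
        logits = self.decoder(torch.cat([z, ctx], dim=-1))
        # KL(q(z|ctx) || N(0, I)) regularizes the latent dialogue policy
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=-1)
        return logits, kl

# Illustrative training step: reconstruction (response coherence) + weighted KL term.
policy = GaussianLatentPolicy()
ctx = torch.randn(4, 256)               # stand-in for encoded dialogue contexts
targets = torch.randint(0, 1000, (4,))  # stand-in next-token targets
logits, kl = policy(ctx)
loss = F.cross_entropy(logits, targets) + 0.1 * kl.mean()
loss.backward()
```

In this sketch, the KL weight plays the role of the regularization the abstract alludes to: a larger weight pulls the Gaussian posterior toward the prior, trading response fidelity for a smoother latent action space that is easier to optimize with RL.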
Related papers
- Improving Dialogue Agents by Decomposing One Global Explicit Annotation with Local Implicit Multimodal Feedback [71.55265615594669]
We describe an approach for aligning an LLM-based dialogue agent based on global (i.e., dialogue-level) rewards, while also taking into account naturally-occurring multimodal signals.
We run quantitative and qualitative human studies to evaluate the performance of our GELI approach, and find that it shows consistent improvements across various conversational metrics compared to baseline methods.
arXiv Detail & Related papers (2024-03-17T20:21:26Z) - Enhancing Large Language Model Induced Task-Oriented Dialogue Systems
Through Look-Forward Motivated Goals [76.69419538047813]
ProToD approach anticipates the future dialogue actions and incorporates the goal-oriented reward signal to enhance ToD systems.
We present a novel evaluation method that assesses ToD systems based on goal-driven dialogue simulations.
Empirical experiments conducted on the MultiWOZ 2.1 dataset demonstrate that our model can achieve superior performance using only 10% of the data.
arXiv Detail & Related papers (2023-09-16T10:56:00Z) - JoTR: A Joint Transformer and Reinforcement Learning Framework for
Dialog Policy Learning [53.83063435640911]
Dialogue policy learning (DPL) is a crucial component of dialogue modelling.
We introduce a novel framework, JoTR, to generate flexible dialogue actions.
Unlike traditional methods, JoTR formulates a word-level policy that allows for more dynamic and adaptable dialogue action generation.
arXiv Detail & Related papers (2023-09-01T03:19:53Z) - Deep RL with Hierarchical Action Exploration for Dialogue Generation [0.0]
This paper presents theoretical analysis and experiments revealing that the performance of the dialogue policy is positively correlated with the sampling size.
We introduce a novel dual-granularity Q-function that explores the most promising response category to intervene in the sampling process.
Our algorithm exhibits both explainability and controllability and generates responses with higher expected rewards.
arXiv Detail & Related papers (2023-03-22T09:29:22Z) - Towards Robust Online Dialogue Response Generation [62.99904593650087]
We argue that this can be caused by a discrepancy between training and real-world testing.
We propose a hierarchical sampling-based method consisting of both utterance-level sampling and semi-utterance-level sampling.
arXiv Detail & Related papers (2022-03-07T06:51:41Z) - Imperfect also Deserves Reward: Multi-Level and Sequential Reward
Modeling for Better Dialog Management [17.168214640974337]
For task-oriented dialog systems, training a Reinforcement Learning-based Dialog Management module suffers from low sample efficiency and slow convergence due to the sparse rewards in RL.
We propose a multi-level reward modeling approach that factorizes the reward into a three-level hierarchy: domain, act, and slot (a minimal sketch of such a factorized reward follows this list).
arXiv Detail & Related papers (2021-04-10T12:20:23Z) - SUMBT+LaRL: Effective Multi-domain End-to-end Neural Task-oriented
Dialog System [6.73550057218157]
We present an effective multi-domain end-to-end trainable neural dialog system SUMBT+LaRL.
Specifically, SUMBT+ estimates user acts as well as dialog belief states, while LaRL models the latent system action space and generates responses.
Our model achieved a new state-of-the-art success rate of 85.4% on corpus-based evaluation, and a comparable success rate of 81.4% on simulator-based evaluation.
arXiv Detail & Related papers (2020-09-22T11:02:21Z) - Modelling Hierarchical Structure between Dialogue Policy and Natural
Language Generator with Option Framework for Task-oriented Dialogue System [49.39150449455407]
HDNO is an option framework that induces latent dialogue acts, avoiding the need to hand-design specific dialogue act representations.
We test HDNO on MultiWOZ 2.0 and MultiWOZ 2.1, two multi-domain dialogue datasets, comparing against a word-level E2E model trained with RL, LaRL, and HDSA.
arXiv Detail & Related papers (2020-06-11T20:55:28Z) - Semi-Supervised Dialogue Policy Learning via Stochastic Reward
Estimation [33.688270031454095]
Reward learning can learn from the state-action pairs of an optimal policy to provide turn-by-turn rewards, but this requires complete state-action annotations of human-to-human dialogues.
We propose a novel reward learning approach for semi-supervised policy learning.
arXiv Detail & Related papers (2020-05-09T06:28:44Z) - Guided Dialog Policy Learning without Adversarial Learning in the Loop [103.20723982440788]
A number of adversarial learning methods have been proposed to learn the reward function together with the dialogue policy.
We propose to decompose the adversarial training into two steps.
First, we train the discriminator with an auxiliary dialogue generator and then incorporate a derived reward model into a common RL method to guide the dialogue policy learning.
arXiv Detail & Related papers (2020-04-07T11:03:17Z)
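For the "Imperfect also Deserves Reward" entry above, which factorizes the dialogue reward into a domain / act / slot hierarchy, here is a minimal, hypothetical Python sketch of such a factorized reward. The level weights, matching rules, and the DialogAct structure are illustrative assumptions, not the paper's actual reward model; the point is only that an imperfect action can still earn partial credit at each level instead of a single sparse success signal.

```python
# Hypothetical sketch of a multi-level (domain / act / slot) dialogue reward.
# Weights and matching rules are illustrative assumptions, not the paper's model.
from dataclasses import dataclass

@dataclass
class DialogAct:
    domain: str          # e.g. "restaurant"
    act: str             # e.g. "inform"
    slots: frozenset     # e.g. {"area", "food"}

def multilevel_reward(pred: DialogAct, gold: DialogAct,
                      w_domain=0.2, w_act=0.3, w_slot=0.5) -> float:
    """Give partial credit at the domain, act, and slot levels."""
    r_domain = float(pred.domain == gold.domain)
    r_act = float(pred.act == gold.act)
    # Slot-level credit: fraction of gold slots the prediction covers.
    r_slot = len(pred.slots & gold.slots) / max(len(gold.slots), 1)
    return w_domain * r_domain + w_act * r_act + w_slot * r_slot

# Usage: an imperfect prediction still earns partial reward.
gold = DialogAct("restaurant", "inform", frozenset({"area", "food"}))
pred = DialogAct("restaurant", "inform", frozenset({"area"}))
print(multilevel_reward(pred, gold))  # 0.2 + 0.3 + 0.5 * 0.5 = 0.75
```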