Joint System-Wise Optimization for Pipeline Goal-Oriented Dialog System
- URL: http://arxiv.org/abs/2106.04835v1
- Date: Wed, 9 Jun 2021 06:44:57 GMT
- Title: Joint System-Wise Optimization for Pipeline Goal-Oriented Dialog System
- Authors: Zichuan Lin, Jing Huang, Bowen Zhou, Xiaodong He, Tengyu Ma
- Abstract summary: We propose new joint system-wise optimization techniques for the pipeline dialog system.
First, we propose a new data augmentation approach which automates the labeling process for NLU training.
Second, we propose a novel policy parameterization with Poisson distribution that enables better exploration and offers a way to compute policy gradient.
- Score: 76.22810715401147
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent work (Takanobu et al., 2020) proposed system-wise evaluation of
dialog systems and found that improvement on individual components (e.g., NLU,
policy) in prior work may not necessarily bring benefit to pipeline systems in
system-wise evaluation. To improve the system-wise performance, in this paper,
we propose new joint system-wise optimization techniques for the pipeline
dialog system. First, we propose a new data augmentation approach which
automates the labeling process for NLU training. Second, we propose a novel
stochastic policy parameterization with Poisson distribution that enables
better exploration and offers a principled way to compute policy gradient.
Third, we propose a reward bonus to help the policy explore successful dialogs. Our
approaches outperform the competitive pipeline systems from Takanobu et al.
(2020) by large margins: 12% in success rate in automatic system-wise evaluation
and 16% in success rate in human evaluation on the standard multi-domain
benchmark dataset MultiWOZ 2.1. They also outperform the recent state-of-the-art
end-to-end trained model from DSTC9.
Related papers
- Improving Dialogue Agents by Decomposing One Global Explicit Annotation with Local Implicit Multimodal Feedback [71.55265615594669]
We describe an approach for aligning an LLM-based dialogue agent based on global (i.e., dialogue-level) rewards, while also taking into account naturally-occurring multimodal signals.
We run quantitative and qualitative human studies to evaluate the performance of our GELI approach, and find that it shows consistent improvements across various conversational metrics compared to baseline methods.
arXiv Detail & Related papers (2024-03-17T20:21:26Z)
- Enhancing End-to-End Multi-Task Dialogue Systems: A Study on Intrinsic Motivation Reinforcement Learning Algorithms for Improved Training and Adaptability [1.0985060632689174]
Investigating intrinsic motivation reinforcement learning algorithms is the goal of this study.
We adapt techniques for random network distillation and curiosity-driven reinforcement learning to measure the frequency of state visits.
Experimental results on MultiWOZ, a heterogeneous dataset, show that intrinsic motivation-based dialogue systems outperform policies that depend on extrinsic incentives.
arXiv Detail & Related papers (2024-01-31T18:03:39Z)
- Enhancing Large Language Model Induced Task-Oriented Dialogue Systems Through Look-Forward Motivated Goals [76.69419538047813]
The ProToD approach anticipates future dialogue actions and incorporates a goal-oriented reward signal to enhance ToD systems.
We present a novel evaluation method that assesses ToD systems based on goal-driven dialogue simulations.
Empirical experiments conducted on the MultiWoZ 2.1 dataset demonstrate that our model can achieve superior performance using only 10% of the data.
arXiv Detail & Related papers (2023-09-16T10:56:00Z)
- Two-pass Decoding and Cross-adaptation Based System Combination of End-to-end Conformer and Hybrid TDNN ASR Systems [61.90743116707422]
This paper investigates multi-pass rescoring and cross adaptation based system combination approaches for hybrid TDNN and Conformer E2E ASR systems.
The best combined system obtained using multi-pass rescoring produced statistically significant word error rate (WER) reductions of 2.5% to 3.9% absolute (22.5% to 28.9% relative) over the stand alone Conformer system on the NIST Hub5'00, Rt03 and Rt02 evaluation data.
arXiv Detail & Related papers (2022-06-23T10:17:13Z)
- What are the best systems? New perspectives on NLP Benchmarking [10.27421161397197]
We propose a new procedure to rank systems based on their performance across different tasks.
Motivated by the social choice theory, the final system ordering is obtained through aggregating the rankings induced by each task.
We show that our method yields different conclusions on state-of-the-art systems than the mean-aggregation procedure.
arXiv Detail & Related papers (2022-02-08T11:44:20Z)
- DORA: Toward Policy Optimization for Task-oriented Dialogue System with Efficient Context [3.962145079528281]
We propose a multi-domain task-oriented dialogue system, called Dialogue System with Optimizing a Recurrent Action Policy using Efficient Context (DORA).
DORA is clearly optimized during both SL and RL steps by using an explicit system action policy that considers an efficient context instead of the entire dialogue history.
DORA improved the success rate by 6.6 points on MultiWOZ 2.0 and by 10.9 points on MultiWOZ 2.1.
arXiv Detail & Related papers (2021-07-07T15:24:27Z)
- SUMBT+LaRL: Effective Multi-domain End-to-end Neural Task-oriented Dialog System [6.73550057218157]
We present an effective multi-domain end-to-end trainable neural dialog system SUMBT+LaRL.
Specifically, the SUMBT+ estimates user-acts as well as dialog belief states, and the LaRL models latent system action spaces and generates responses.
Our model achieved the new state-of-the-art success rate of 85.4% on corpus-based evaluation, and a comparable success rate of 81.40% on simulator-based evaluation.
arXiv Detail & Related papers (2020-09-22T11:02:21Z)
- Modelling Hierarchical Structure between Dialogue Policy and Natural Language Generator with Option Framework for Task-oriented Dialogue System [49.39150449455407]
HDNO is an option framework for designing latent dialogue acts to avoid designing specific dialogue act representations.
We test HDNO on MultiWoz 2.0 and MultiWoz 2.1, two multi-domain dialogue datasets, in comparison with word-level E2E model trained with RL, LaRL and HDSA.
arXiv Detail & Related papers (2020-06-11T20:55:28Z)
- Single-step deep reinforcement learning for open-loop control of laminar and turbulent flows [0.0]
This research gauges the ability of deep reinforcement learning (DRL) techniques to assist the optimization and control of fluid mechanical systems.
It uses a novel, "degenerate" version of the proximal policy optimization (PPO) algorithm that trains a neural network to optimize the system only once per learning episode.
arXiv Detail & Related papers (2020-06-04T16:11:26Z)
- PONE: A Novel Automatic Evaluation Metric for Open-Domain Generative Dialogue Systems [48.99561874529323]
There are three kinds of automatic methods to evaluate the open-domain generative dialogue systems.
Due to the lack of systematic comparison, it is not clear which kind of metric is more effective.
We propose a novel and feasible learning-based metric that can significantly improve the correlation with human judgments.
arXiv Detail & Related papers (2020-04-06T04:36:33Z)