Improving Interaction Quality Estimation with BiLSTMs and the Impact on
Dialogue Policy Learning
- URL: http://arxiv.org/abs/2001.07615v1
- Date: Tue, 21 Jan 2020 15:39:12 GMT
- Authors: Stefan Ultes
- Abstract summary: We propose a novel reward based on user satisfaction estimation.
We show that it outperforms all previous estimators while learning temporal dependencies implicitly.
We show that applying this model results in higher estimated satisfaction, similar task success rates and a higher robustness to noise.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning suitable and well-performing dialogue behaviour in statistical
spoken dialogue systems has been a focus of research for many years. While
most work which is based on reinforcement learning employs an objective measure
like task success for modelling the reward signal, we use a reward based on
user satisfaction estimation. We propose a novel estimator and show that it
outperforms all previous estimators while learning temporal dependencies
implicitly. Furthermore, we apply this novel user satisfaction estimation model
live in simulated experiments where the satisfaction estimation model is
trained on one domain and applied in many other domains which cover a similar
task. We show that applying this model results in higher estimated
satisfaction, similar task success rates and a higher robustness to noise.
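The satisfaction-based reward described above can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the 1-5 interaction-quality scale follows the common IQ annotation scheme, while the per-turn penalty and scaling constant are assumptions.

```python
def satisfaction_reward(iq_estimates, turn_penalty=-1, scale=20):
    """Turn-level rewards from a sequence of per-turn interaction-quality
    estimates (assumed to lie on a 1..5 scale).

    Each turn incurs a small penalty to encourage short dialogues; the
    final estimated IQ is mapped onto a terminal reward, replacing a
    binary task-success bonus.
    """
    rewards = [turn_penalty] * len(iq_estimates)
    # terminal reward proportional to the final estimated satisfaction:
    # map 1..5 onto 0..scale
    rewards[-1] += scale * (iq_estimates[-1] - 1) / 4
    return rewards
```

A dialogue whose estimated quality rises to the maximum, e.g. `satisfaction_reward([3, 4, 5])`, receives the full terminal bonus minus the accumulated turn penalties.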
Related papers
- CAUSE: Counterfactual Assessment of User Satisfaction Estimation in Task-Oriented Dialogue Systems [60.27663010453209]
We leverage large language models (LLMs) to generate satisfaction-aware counterfactual dialogues.
We gather human annotations to ensure the reliability of the generated samples.
Our results shed light on the need for data augmentation approaches for user satisfaction estimation in TOD systems.
arXiv Detail & Related papers (2024-03-27T23:45:31Z)
- MultiPA: A Multi-task Speech Pronunciation Assessment Model for Open Response Scenarios [26.852744399985475]
Pronunciation assessment models enable users to practice language skills in a manner similar to real-life communication.
We propose MultiPA, a Multitask Pronunciation Assessment model that provides sentence-level accuracy, fluency, prosody, and word-level accuracy assessment for open responses.
arXiv Detail & Related papers (2023-08-24T01:24:09Z)
- Unlocking the Potential of User Feedback: Leveraging Large Language Model as User Simulator to Enhance Dialogue System [65.93577256431125]
We propose an alternative approach called User-Guided Response Optimization (UGRO), which combines an LLM with a smaller task-oriented dialogue model.
This approach uses LLM as annotation-free user simulator to assess dialogue responses, combining them with smaller fine-tuned end-to-end TOD models.
Our approach outperforms previous state-of-the-art (SOTA) results.
arXiv Detail & Related papers (2023-06-16T13:04:56Z)
- Modeling User Satisfaction Dynamics in Dialogue via Hawkes Process [17.477718698071424]
We propose a new estimator that treats user satisfaction across turns as an event sequence and employs a Hawkes process to effectively model the dynamics in this sequence.
Experimental results on four benchmark dialogue datasets demonstrate that ASAP can substantially outperform state-of-the-art baseline estimators.
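The core of such an estimator is the Hawkes-process conditional intensity, in which past events excite the process and the effect decays over time. A minimal sketch, with illustrative parameter values; treating satisfaction changes as the event sequence is the modelling assumption described above.

```python
import math

def hawkes_intensity(t, event_times, mu=0.5, alpha=0.8, beta=1.0):
    """Conditional intensity of a Hawkes process:

        lambda(t) = mu + sum_{t_i < t} alpha * exp(-beta * (t - t_i))

    Past events (here: turns where user satisfaction changed) raise
    the intensity; the excitation decays exponentially with rate beta.
    """
    return mu + sum(alpha * math.exp(-beta * (t - ti))
                    for ti in event_times if ti < t)
```

With no past events the intensity equals the baseline `mu`; each recent event adds an exponentially decaying contribution.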
arXiv Detail & Related papers (2023-05-21T23:04:14Z)
- An Information-Theoretic Approach for Estimating Scenario Generalization in Crowd Motion Prediction [27.10815774845461]
We propose a novel scoring method, which characterizes generalization of models trained on source crowd scenarios and applied to target crowd scenarios.
The Interaction component aims to characterize the difficulty of scenario domains, while the diversity of a scenario domain is captured in the Diversity score.
Our experimental results validate the efficacy of the proposed method on several simulated and real-world (source,target) generalization tasks.
arXiv Detail & Related papers (2022-11-02T01:39:30Z)
- SURF: Semi-supervised Reward Learning with Data Augmentation for Feedback-efficient Preference-based Reinforcement Learning [168.89470249446023]
We present SURF, a semi-supervised reward learning framework that utilizes a large amount of unlabeled samples with data augmentation.
In order to leverage unlabeled samples for reward learning, we infer pseudo-labels of the unlabeled samples based on the confidence of the preference predictor.
Our experiments demonstrate that our approach significantly improves the feedback-efficiency of the preference-based method on a variety of locomotion and robotic manipulation tasks.
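The confidence-based pseudo-labelling step can be sketched as follows. The `predictor` interface and the threshold value are assumptions for illustration, not SURF's actual API: only unlabeled pairs on which the preference predictor is sufficiently confident receive a pseudo-label.

```python
def pseudo_label(pairs, predictor, threshold=0.9):
    """Assign preference pseudo-labels to unlabeled trajectory-segment pairs.

    predictor(a, b) is assumed to return P(a preferred over b). A pair is
    kept only when the predictor is confident in either direction; all
    low-confidence pairs are discarded.
    """
    labeled = []
    for a, b in pairs:
        p = predictor(a, b)
        if p >= threshold:
            labeled.append((a, b, 1))      # confident: a preferred
        elif p <= 1 - threshold:
            labeled.append((a, b, 0))      # confident: b preferred
        # otherwise the pair is dropped from the training set
    return labeled
```

The kept pairs are then added to the preference-learning dataset alongside the human-labeled ones.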
arXiv Detail & Related papers (2022-03-18T16:50:38Z)
- Towards Automatic Evaluation of Dialog Systems: A Model-Free Off-Policy Evaluation Approach [84.02388020258141]
We propose a new framework named ENIGMA for estimating human evaluation scores based on off-policy evaluation in reinforcement learning.
ENIGMA only requires a handful of pre-collected experience data, and therefore does not involve human interaction with the target policy during the evaluation.
Our experiments show that ENIGMA significantly outperforms existing methods in terms of correlation with human evaluation scores.
arXiv Detail & Related papers (2021-02-20T03:29:20Z)
- Is the User Enjoying the Conversation? A Case Study on the Impact on the Reward Function [0.0]
We adopt deep neural networks that use distributed semantic representation learning for estimating user satisfaction in conversations.
We show that the proposed hierarchical network outperforms state-of-the-art quality estimators.
Applying these networks to infer the reward function in a Partially Observable Markov Decision Process yields a great improvement in the task success rate.
arXiv Detail & Related papers (2021-01-13T11:13:07Z)
- On the model-based stochastic value gradient for continuous reinforcement learning [50.085645237597056]
We show that simple model-based agents can outperform state-of-the-art model-free agents in terms of both sample-efficiency and final reward.
Our findings suggest that model-based policy evaluation deserves closer attention.
arXiv Detail & Related papers (2020-08-28T17:58:29Z)
- Human Trajectory Forecasting in Crowds: A Deep Learning Perspective [89.4600982169]
We present an in-depth analysis of existing deep learning-based methods for modelling social interactions.
We propose two knowledge-based data-driven methods to effectively capture these social interactions.
We develop a large scale interaction-centric benchmark TrajNet++, a significant yet missing component in the field of human trajectory forecasting.
arXiv Detail & Related papers (2020-07-07T17:19:56Z)
- Sample-Efficient Model-based Actor-Critic for an Interactive Dialogue Task [27.896714528986855]
We present a model-based reinforcement learning approach for an interactive dialogue task.
We build on commonly used actor-critic methods, adding an environment model and planner that augment the learning agent.
Our results show that, on a simulation that mimics the interactive task, our algorithm requires 70 times fewer samples than a commonly used model-free baseline.
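Augmenting a learning agent with an environment model and planner can be sketched in Dyna style: each real transition trains both the value function and a model, and the planner replays model-generated transitions for extra updates. The tabular Q-learning setting and all names below are illustrative assumptions, not the paper's actual architecture.

```python
import random

def dyna_updates(q, model, real_transition, n_planning=70, alpha=0.1, gamma=0.95):
    """One real TD update plus n_planning simulated updates (Dyna-style).

    q: dict mapping (state, action) -> value
    model: dict caching observed (state, action) -> (reward, next_state, next_actions)
    real_transition: (state, action, reward, next_state, next_actions)
    """
    def td_update(s, a, r, s2, acts):
        best = max(q.get((s2, a2), 0.0) for a2 in acts) if acts else 0.0
        q[(s, a)] = q.get((s, a), 0.0) + alpha * (r + gamma * best - q.get((s, a), 0.0))

    # learn from the real experience and store it in the environment model
    s, a, r, s2, acts = real_transition
    td_update(s, a, r, s2, acts)
    model[(s, a)] = (r, s2, acts)

    # planner: replay transitions sampled from the learned model
    for _ in range(n_planning):
        (ms, ma), (mr, ms2, macts) = random.choice(list(model.items()))
        td_update(ms, ma, mr, ms2, macts)
    return q
```

Each real interaction thus triggers many cheap simulated updates, which is the mechanism behind the large reduction in required samples.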
arXiv Detail & Related papers (2020-04-28T17:00:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.