Reward-Driven Interaction: Enhancing Proactive Dialogue Agents through User Satisfaction Prediction
- URL: http://arxiv.org/abs/2505.18731v1
- Date: Sat, 24 May 2025 15:01:30 GMT
- Title: Reward-Driven Interaction: Enhancing Proactive Dialogue Agents through User Satisfaction Prediction
- Authors: Wei Shen, Xiaonan He, Chuheng Zhang, Xuyun Zhang, Xiaolong Xu, Wanchun Dou
- Abstract summary: We propose two auxiliary tasks to improve the representation learning of user utterances and sessions that enhance user satisfaction prediction. The proposed method is evaluated on DuerOS, demonstrating significant improvements in the accuracy of error recognition on rare user utterances and long-tailed domains.
- Score: 22.105598216923706
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reward-driven proactive dialogue agents require precise estimation of user satisfaction as an intrinsic reward signal to determine optimal interaction strategies. Specifically, this framework triggers clarification questions when it detects potential user dissatisfaction during interactions in an industrial dialogue system. Traditional approaches typically train a neural network on weak labels generated by a simple model trained on user actions after the current turn. However, existing methods suffer from two critical limitations in real-world scenarios: (1) Noisy Reward Supervision: dependence on weak labels derived from post-hoc user actions introduces bias, in particular failing to capture satisfaction signals in ASR-error-induced utterances; (2) Long-Tail Feedback Sparsity: the power-law distribution of user queries causes reward prediction accuracy to drop in low-frequency domains. The noise in the weak labels and the power-law distribution of user utterances make it hard for the model to learn good representations of user utterances and sessions. To address these limitations, we propose two auxiliary tasks that improve the representation learning of user utterances and sessions and thereby enhance user satisfaction prediction. The first is a contrastive self-supervised learning task, which helps the model learn representations of rare user utterances and identify ASR errors. The second is a domain-intent classification task, which helps the model learn representations of user sessions from long-tailed domains and improves its performance on those domains. The proposed method is evaluated on DuerOS, demonstrating significant improvements in the accuracy of error recognition on rare user utterances and long-tailed domains.
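The paper does not publish its training code, but the first auxiliary task it describes, contrastive self-supervised learning over utterance representations, can be sketched with a standard InfoNCE-style loss: the embedding of an utterance is pulled toward an augmented view of itself and pushed away from other utterances in the batch. This is a minimal stdlib-only sketch; the function names, the combined-loss weighting, and the temperature value are all hypothetical, not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE contrastive loss: the anchor utterance embedding should be
    closer to its augmented positive view than to any in-batch negative."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    m = max(logits)  # subtract the max for numerical stability
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    return -(logits[0] - log_denom)

# In a multi-task setup like the one the abstract describes, this term would
# be combined with the main satisfaction loss and the domain-intent
# classification loss, e.g. (weights alpha, beta are hypothetical):
#   total = satisfaction_loss + alpha * contrastive_loss + beta * domain_loss
```

A perfectly matched positive with orthogonal negatives drives the loss toward zero, while a negative indistinguishable from the positive pushes it toward log of the candidate count, which is what gives the encoder a gradient to separate rare or ASR-corrupted utterances from their neighbors.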
Related papers
- Reinforced Interactive Continual Learning via Real-time Noisy Human Feedback [59.768119380109084]
This paper introduces an interactive continual learning paradigm where AI models dynamically learn new skills from real-time human feedback. We propose RiCL, a Reinforced interactive Continual Learning framework leveraging Large Language Models (LLMs). Our RiCL approach substantially outperforms existing combinations of state-of-the-art online continual learning and noisy-label learning methods.
arXiv Detail & Related papers (2025-05-15T03:22:03Z) - Multi-agents based User Values Mining for Recommendation [52.26100802380767]
We propose a zero-shot multi-LLM collaborative framework for effective and accurate user value extraction. We apply text summarization techniques to condense item content while preserving essential meaning. To mitigate hallucinations, we introduce two specialized agent roles: evaluators and supervisors.
arXiv Detail & Related papers (2025-05-02T04:01:31Z) - Reasoning LLMs for User-Aware Multimodal Conversational Agents [3.533721662684487]
Personalization in social robotics is critical for fostering effective human-robot interactions. This paper proposes a novel framework called USER-LLM R1 for a user-aware conversational agent. Our approach integrates chain-of-thought (CoT) reasoning models to iteratively infer user preferences and vision-language models.
arXiv Detail & Related papers (2025-04-02T13:00:17Z) - CAUSE: Counterfactual Assessment of User Satisfaction Estimation in Task-Oriented Dialogue Systems [60.27663010453209]
We leverage large language models (LLMs) to generate satisfaction-aware counterfactual dialogues.
We gather human annotations to ensure the reliability of the generated samples.
Our results shed light on the need for data augmentation approaches for user satisfaction estimation in TOD systems.
arXiv Detail & Related papers (2024-03-27T23:45:31Z) - Toward Practical Automatic Speech Recognition and Post-Processing: a Call for Explainable Error Benchmark Guideline [12.197453599489963]
We propose the development of an Error Explainable Benchmark (EEB) dataset.
This dataset, while considering both speech- and text-level, enables a granular understanding of the model's shortcomings.
Our proposition provides a structured pathway for a more 'real-world-centric' evaluation, allowing for the detection and rectification of nuanced system weaknesses.
arXiv Detail & Related papers (2024-01-26T03:42:45Z) - SeGA: Preference-Aware Self-Contrastive Learning with Prompts for Anomalous User Detection on Twitter [14.483830120541894]
We propose SeGA, preference-aware self-contrastive learning for anomalous user detection.
SeGA uses large language models to summarize user preferences via posts.
We empirically validate the effectiveness of the model design and pre-training strategies.
arXiv Detail & Related papers (2023-12-17T05:35:28Z) - Decoding the Silent Majority: Inducing Belief Augmented Social Graph with Large Language Model for Response Forecasting [74.68371461260946]
SocialSense is a framework that induces a belief-centered graph on top of an existing social network, along with graph-based propagation to capture social dynamics.
Our method surpasses existing state-of-the-art in experimental evaluations for both zero-shot and supervised settings.
arXiv Detail & Related papers (2023-10-20T06:17:02Z) - Revealing User Familiarity Bias in Task-Oriented Dialogue via Interactive Evaluation [17.41434948048325]
We conduct an interactive user study to unveil how vulnerable TOD systems are to realistic scenarios.
Our study reveals that conversations in open-goal settings lead to catastrophic failures of the system.
We discover a novel "pretending" behavior, in which the system pretends to handle the user requests even though they are beyond the system's capabilities.
arXiv Detail & Related papers (2023-05-23T09:24:53Z) - A Transformer-Based User Satisfaction Prediction for Proactive Interaction Mechanism in DuerOS [12.060990859604681]
We propose a proactive interaction mechanism where the system predicts the user satisfaction with the candidate response before giving it to the user.
If the user is not likely to be satisfied according to the prediction, the system will ask the user a suitable question to determine the real intent of the user.
We deploy and evaluate our model on DuerOS, observing a 19% relative improvement in the accuracy of user satisfaction prediction and a 2.3% relative improvement in user experience.
arXiv Detail & Related papers (2022-12-05T09:17:49Z) - Distantly-Supervised Named Entity Recognition with Noise-Robust Learning and Language Model Augmented Self-Training [66.80558875393565]
We study the problem of training named entity recognition (NER) models using only distantly-labeled data.
We propose a noise-robust learning scheme comprised of a new loss function and a noisy label removal step.
Our method achieves superior performance, outperforming existing distantly-supervised NER models by significant margins.
arXiv Detail & Related papers (2021-09-10T17:19:56Z) - Joint Contextual Modeling for ASR Correction and Language Understanding [60.230013453699975]
We propose multi-task neural approaches to perform contextual language correction on ASR outputs jointly with language understanding (LU).
We show that the error rates of off-the-shelf ASR and downstream LU systems can be reduced significantly, by 14% relative, with joint models trained using small amounts of in-domain data.
arXiv Detail & Related papers (2020-01-28T22:09:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.