Chitchat as Interference: Adding User Backstories to Task-Oriented Dialogues
- URL: http://arxiv.org/abs/2402.15248v3
- Date: Fri, 28 Jun 2024 10:27:11 GMT
- Title: Chitchat as Interference: Adding User Backstories to Task-Oriented Dialogues
- Authors: Armand Stricker, Patrick Paroubek
- Abstract summary: We use few-shot prompting with Llama-2-70B to enhance the MultiWOZ dataset with user backstories.
We test two models: one trained solely on TODs and another trained on TODs with a preliminary chitchat interaction.
Our dataset can be effectively used for training purposes, enabling a system to consistently acknowledge the user's backstory.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: During task-oriented dialogues (TODs), human users naturally introduce chitchat that is beyond the immediate scope of the task, interfering with the flow of the conversation. To address this issue without the need for expensive manual data creation, we use few-shot prompting with Llama-2-70B to enhance the MultiWOZ dataset with user backstories, a typical example of chitchat interference in TODs. We assess the impact of this addition by testing two models: one trained solely on TODs and another trained on TODs with a preliminary chitchat interaction. Our analysis demonstrates that our enhanced dataset poses a challenge for these systems. Moreover, we demonstrate that our dataset can be effectively used for training purposes, enabling a system to consistently acknowledge the user's backstory while also successfully moving the task forward in the same turn, as confirmed by human evaluation. These findings highlight the benefits of generating novel chitchat-TOD scenarios to test TOD systems more thoroughly and improve their resilience to natural user interferences.
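As a rough illustration of the few-shot prompting idea mentioned in the abstract, the sketch below assembles a prompt that asks a language model to prepend a short backstory to a task-oriented user turn. The instruction wording, the demonstrations, and the names `Example`, `FEW_SHOT`, and `build_prompt` are all hypothetical; they are not the authors' prompt and are not drawn from MultiWOZ. No model is called here; the resulting string would be passed to whatever generation interface is in use.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Example:
    """One hand-written demonstration: a task request and a backstory-augmented rewrite."""
    request: str
    rewritten: str


# Hypothetical demonstrations, written for this sketch only.
FEW_SHOT = [
    Example(
        request="I need a cheap hotel in the north of town.",
        rewritten=(
            "My sister is getting married next week and money is tight. "
            "I need a cheap hotel in the north of town."
        ),
    ),
    Example(
        request="Can you book a table for two at an Italian restaurant?",
        rewritten=(
            "It's our anniversary and we had our first date over pasta. "
            "Can you book a table for two at an Italian restaurant?"
        ),
    ),
]

# Hypothetical instruction text, not the wording used in the paper.
INSTRUCTION = (
    "Rewrite the user's request so that it opens with a short personal "
    "backstory. Keep the original request unchanged at the end."
)


def build_prompt(user_turn: str, examples: Sequence[Example] = FEW_SHOT) -> str:
    """Concatenate the instruction, the demonstrations, and the new turn to rewrite."""
    parts = [INSTRUCTION, ""]
    for ex in examples:
        parts += [f"Request: {ex.request}", f"Rewritten: {ex.rewritten}", ""]
    # The prompt ends with an open slot for the model to complete.
    parts += [f"Request: {user_turn}", "Rewritten:"]
    return "\n".join(parts)


if __name__ == "__main__":
    print(build_prompt("I'm looking for a train to Cambridge on Friday."))
```

Keeping the original request verbatim at the end of each demonstration is one way to preserve the task annotations of the source dialogue while only the chitchat portion is new; whether the paper's pipeline does this is not stated in the abstract.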
Related papers
- Proactive User Information Acquisition via Chats on User-Favored Topics [3.6698472838681893]
This study proposes the PIVOT task, designed to advance the technical foundation for these systems.
We found that even recent large language models (LLMs) show a low success rate in the PIVOT task.
We developed a simple but effective system for this task by incorporating insights obtained through the analysis of this dataset.
arXiv Detail & Related papers (2025-04-10T12:32:16Z) - REALTALK: A 21-Day Real-World Dataset for Long-Term Conversation [51.97224538045096]
We introduce REALTALK, a 21-day corpus of authentic messaging app dialogues.
We compare EI attributes and persona consistency to understand the challenges posed by real-world dialogues.
Our findings reveal that models struggle to simulate a user solely from dialogue history, while fine-tuning on specific user chats improves persona emulation.
arXiv Detail & Related papers (2025-02-18T20:29:01Z) - Simulating User Agents for Embodied Conversational-AI [9.402740034754455]
We build a large language model (LLM)-based user agent that can simulate user behavior during interactions with an embodied agent.
We evaluate our user agent's ability to generate human-like behaviors by comparing its simulated dialogues with the TEACh dataset.
arXiv Detail & Related papers (2024-10-31T00:56:08Z) - CAUSE: Counterfactual Assessment of User Satisfaction Estimation in Task-Oriented Dialogue Systems [60.27663010453209]
We leverage large language models (LLMs) to generate satisfaction-aware counterfactual dialogues.
We gather human annotations to ensure the reliability of the generated samples.
Our results shed light on the need for data augmentation approaches for user satisfaction estimation in TOD systems.
arXiv Detail & Related papers (2024-03-27T23:45:31Z) - Enhancing Large Language Model Induced Task-Oriented Dialogue Systems Through Look-Forward Motivated Goals [76.69419538047813]
The ProToD approach anticipates future dialogue actions and incorporates a goal-oriented reward signal to enhance ToD systems.
We present a novel evaluation method that assesses ToD systems based on goal-driven dialogue simulations.
Empirical experiments conducted on the MultiWoZ 2.1 dataset demonstrate that our model can achieve superior performance using only 10% of the data.
arXiv Detail & Related papers (2023-09-16T10:56:00Z) - From Chatter to Matter: Addressing Critical Steps of Emotion Recognition Learning in Task-oriented Dialogue [6.918298428336528]
We propose a framework that turns a chit-chat ERC model into a task-oriented one.
We use dialogue states as auxiliary features to incorporate key information from the goal of the user.
Our framework yields significant improvements for a range of chit-chat ERC models on EmoWOZ.
arXiv Detail & Related papers (2023-08-24T08:46:30Z) - Is MultiWOZ a Solved Task? An Interactive TOD Evaluation Framework with User Simulator [37.590563896382456]
We propose an interactive evaluation framework for Task-Oriented Dialogue (TOD) systems.
We first build a goal-oriented user simulator based on pre-trained models and then use the user simulator to interact with the dialogue system to generate dialogues.
Experimental results show that RL-based TOD systems trained by our proposed user simulator can achieve nearly 98% inform and success rates.
arXiv Detail & Related papers (2022-10-26T07:41:32Z) - MCP: Self-supervised Pre-training for Personalized Chatbots with Multi-level Contrastive Sampling [18.40883902610959]
We propose a self-supervised learning framework for capturing better representations from users' dialogue history for personalized chatbots.
Specifically, we apply contrastive sampling methods to leverage the supervised signals hidden in user dialog history.
Experimental results on two real-world datasets show a significant improvement in our proposed model MCP compared with the existing methods.
arXiv Detail & Related papers (2022-10-17T05:16:23Z) - Information Extraction and Human-Robot Dialogue towards Real-life Tasks: A Baseline Study with the MobileCS Dataset [52.22314870976088]
The SereTOD challenge releases the MobileCS dataset, which consists of real-world dialog transcripts between real users and customer-service staff from China Mobile.
Based on the MobileCS dataset, the SereTOD challenge has two tasks, not only evaluating the construction of the dialogue system itself, but also examining information extraction from dialog transcripts.
This paper mainly presents a baseline study of the two tasks with the MobileCS dataset.
arXiv Detail & Related papers (2022-09-27T15:30:43Z) - Interactive Evaluation of Dialog Track at DSTC9 [8.2208199207543]
The Interactive Evaluation of Dialog Track was introduced at the 9th Dialog System Technology Challenge.
This paper provides an overview of the track, including the methodology and results.
arXiv Detail & Related papers (2022-07-28T22:54:04Z) - Duplex Conversation: Towards Human-like Interaction in Spoken Dialogue System [120.70726465994781]
A multimodal spoken dialogue system enables telephone-based agents to interact with customers like a human.
We deploy Duplex Conversation to Alibaba intelligent customer service and share lessons learned in production.
Online A/B experiments show that the proposed system can significantly reduce response latency by 50%.
arXiv Detail & Related papers (2022-05-30T12:41:23Z) - KETOD: Knowledge-Enriched Task-Oriented Dialogue [77.59814785157877]
Existing studies in dialogue system research mostly treat task-oriented dialogue and chit-chat as separate domains.
We investigate how task-oriented dialogue and knowledge-grounded chit-chat can be effectively integrated into a single model.
arXiv Detail & Related papers (2022-05-11T16:01:03Z) - User Satisfaction Estimation with Sequential Dialogue Act Modeling in Goal-oriented Conversational Systems [65.88679683468143]
We propose a novel framework, namely USDA, to incorporate the sequential dynamics of dialogue acts for predicting user satisfaction.
USDA incorporates the sequential transitions of both content and act features in the dialogue to predict the user satisfaction.
Experimental results on four benchmark goal-oriented dialogue datasets show that the proposed method substantially and consistently outperforms existing methods on USE.
arXiv Detail & Related papers (2022-02-07T02:50:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.