Are LLMs Robust for Spoken Dialogues?
- URL: http://arxiv.org/abs/2401.02297v1
- Date: Thu, 4 Jan 2024 14:36:38 GMT
- Title: Are LLMs Robust for Spoken Dialogues?
- Authors: Seyed Mahed Mousavi, Gabriel Roccabruna, Simone Alghisi, Massimo
Rizzoli, Mirco Ravanelli, Giuseppe Riccardi
- Abstract summary: Large Pre-Trained Language Models have demonstrated state-of-the-art performance in different downstream tasks.
Most of the publicly available datasets and benchmarks on task-oriented dialogues focus on written conversations.
We have evaluated the performance of LLMs for spoken task-oriented dialogues on the DSTC11 test sets.
- Score: 10.855403629160921
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Large Pre-Trained Language Models have demonstrated state-of-the-art
performance in different downstream tasks, including dialogue state tracking
and end-to-end response generation. Nevertheless, most of the publicly
available datasets and benchmarks on task-oriented dialogues focus on written
conversations. Consequently, the robustness of the developed models to spoken
interactions is unknown. In this work, we have evaluated the performance of
LLMs for spoken task-oriented dialogues on the DSTC11 test sets. Due to the
lack of proper spoken dialogue datasets, we have automatically transcribed a
development set of spoken dialogues with a state-of-the-art ASR engine. We have
characterized the ASR-error types and their distributions and simulated these
errors in a large dataset of dialogues. We report the intrinsic (perplexity)
and extrinsic (human evaluation) performance of fine-tuned GPT-2 and T5 models
in two subtasks of response generation and dialogue state tracking,
respectively. The results show that LLMs are not robust to spoken noise by
default, however, fine-tuning/training such models on a proper dataset of
spoken TODs can result in a more robust performance.
Related papers
- Enhancing Dialogue State Tracking Models through LLM-backed User-Agents Simulation [12.93942316816741]
GPT-4 is used to simulate the user and agent interaction, generating thousands of annotated dialogues with DST labels.
A two-stage fine-tuning on LLaMA 2 is performed on the generated data and the real data for the DST prediction.
Our approach is also capable of adapting to the dynamic demands in real-world scenarios, generating dialogues in new domains swiftly.
arXiv Detail & Related papers (2024-05-17T07:00:05Z) - SpokenWOZ: A Large-Scale Speech-Text Benchmark for Spoken Task-Oriented
Dialogue Agents [72.42049370297849]
SpokenWOZ is a large-scale speech-text dataset for spoken TOD.
Cross-turn slot and reasoning slot detection are new challenges for SpokenWOZ.
arXiv Detail & Related papers (2023-05-22T13:47:51Z) - In-Context Learning for Few-Shot Dialogue State Tracking [55.91832381893181]
We propose an in-context (IC) learning framework for few-shot dialogue state tracking (DST)
A large pre-trained language model (LM) takes a test instance and a few annotated examples as input, and directly decodes the dialogue states without any parameter updates.
This makes the LM more flexible and scalable compared to prior few-shot DST work when adapting to new domains and scenarios.
arXiv Detail & Related papers (2022-03-16T11:58:24Z) - Dialogue Summaries as Dialogue States (DS2), Template-Guided
Summarization for Few-shot Dialogue State Tracking [16.07100713414678]
Few-shot dialogue state tracking (DST) is a realistic solution to this problem.
We propose to reformulate dialogue state tracking as a dialogue summarization problem.
arXiv Detail & Related papers (2022-03-03T07:54:09Z) - TOD-DA: Towards Boosting the Robustness of Task-oriented Dialogue
Modeling on Spoken Conversations [24.245354500835465]
We propose a novel model-agnostic data augmentation paradigm to boost the robustness of task-oriented dialogue modeling on spoken conversations.
Our approach ranked first in both tasks of DSTC10 Track2, a benchmark for task-oriented dialogue modeling on spoken conversations.
arXiv Detail & Related papers (2021-12-23T10:04:25Z) - "How Robust r u?": Evaluating Task-Oriented Dialogue Systems on Spoken
Conversations [87.95711406978157]
This work presents a new benchmark on spoken task-oriented conversations.
We study multi-domain dialogue state tracking and knowledge-grounded dialogue modeling.
Our data set enables speech-based benchmarking of task-oriented dialogue systems.
arXiv Detail & Related papers (2021-09-28T04:51:04Z) - Video-Grounded Dialogues with Pretrained Generation Language Models [88.15419265622748]
We leverage the power of pre-trained language models for improving video-grounded dialogue.
We propose a framework by formulating sequence-to-grounded dialogue tasks as a sequence-to-grounded task.
Our framework allows fine-tuning language models to capture dependencies across multiple modalities.
arXiv Detail & Related papers (2020-06-27T08:24:26Z) - Paraphrase Augmented Task-Oriented Dialog Generation [68.1790912977053]
We propose a paraphrase augmented response generation (PARG) framework that jointly trains a paraphrase model and a response generation model.
We also design a method to automatically construct paraphrase training data set based on dialog state and dialog act labels.
arXiv Detail & Related papers (2020-04-16T05:12:36Z) - TOD-BERT: Pre-trained Natural Language Understanding for Task-Oriented
Dialogue [113.45485470103762]
In this work, we unify nine human-human and multi-turn task-oriented dialogue datasets for language modeling.
To better model dialogue behavior during pre-training, we incorporate user and system tokens into the masked language modeling.
arXiv Detail & Related papers (2020-04-15T04:09:05Z) - Interview: A Large-Scale Open-Source Corpus of Media Dialog [11.28504775964698]
We introduce 'Interview': a large-scale (105K conversations) media dialog dataset collected from news interview transcripts.
Compared to existing large-scale proxies for conversational data, language models trained on our dataset exhibit better zero-shot out-of-domain performance.
'Interview' contains speaker role annotations for each turn, facilitating the development of engaging, responsive dialog systems.
arXiv Detail & Related papers (2020-04-07T02:44:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.