Related papers: Adapting Text-based Dialogue State Tracker for Spoken Dialogues

Adapting Text-based Dialogue State Tracker for Spoken Dialogues

URL: http://arxiv.org/abs/2308.15053v3
Date: Tue, 9 Jan 2024 08:27:48 GMT
Title: Adapting Text-based Dialogue State Tracker for Spoken Dialogues
Authors: Jaeseok Yoon, Seunghyun Hwang, Ran Han, Jeonguk Bang, Kee-Eung Kim
Abstract summary: We describe our engineering effort in building a highly successful model that participated in the speech-aware dialogue systems technology challenge track in DSTC11. Our model consists of three major modules: (1) automatic speech recognition error correction to bridge the gap between the spoken and the text utterances, (2) text-based dialogue system (D3ST) for estimating the slots and values using slot descriptions, and (3) post-processing for recovering the error of the estimated slot value.
Score: 20.139351605832665
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Although there have been remarkable advances in dialogue systems through the dialogue systems technology competition (DSTC), it remains one of the key challenges to building a robust task-oriented dialogue system with a speech interface. Most of the progress has been made for text-based dialogue systems since there are abundant datasets with written corpora while those with spoken dialogues are very scarce. However, as can be seen from voice assistant systems such as Siri and Alexa, it is of practical importance to transfer the success to spoken dialogues. In this paper, we describe our engineering effort in building a highly successful model that participated in the speech-aware dialogue systems technology challenge track in DSTC11. Our model consists of three major modules: (1) automatic speech recognition error correction to bridge the gap between the spoken and the text utterances, (2) text-based dialogue system (D3ST) for estimating the slots and values using slot descriptions, and (3) post-processing for recovering the error of the estimated slot value. Our experiments show that it is important to use an explicit automatic speech recognition error correction module, post-processing, and data augmentation to adapt a text-based dialogue state tracker for spoken dialogue corpora.

Related papers

WavChat: A Survey of Spoken Dialogue Models [66.82775211793547]
Recent advancements in spoken dialogue models, exemplified by systems like GPT-4o, have captured significant attention in the speech domain. These advanced spoken dialogue models not only comprehend audio, music, and other speech-related features, but also capture stylistic and timbral characteristics in speech. Despite the progress in spoken dialogue systems, there is a lack of comprehensive surveys that systematically organize and analyze these systems.
arXiv Detail & Related papers (2024-11-15T04:16:45Z)
Let's Go Real Talk: Spoken Dialogue Model for Face-to-Face Conversation [55.043492250775294]
We introduce a novel Face-to-Face spoken dialogue model. It processes audio-visual speech from user input and generates audio-visual speech as the response. We also introduce MultiDialog, the first large-scale multimodal spoken dialogue corpus.
arXiv Detail & Related papers (2024-06-12T04:48:36Z)
Are cascade dialogue state tracking models speaking out of turn in spoken dialogues? [1.786898113631979]
This paper proposes a comprehensive analysis of the errors of state of the art systems in complex settings such as Dialogue State Tracking. Based on spoken MultiWoz, we identify that errors on non-categorical slots' values are essential to address in order to bridge the gap between spoken and chat-based dialogue systems.
arXiv Detail & Related papers (2023-11-03T08:45:22Z)
TOD-DA: Towards Boosting the Robustness of Task-oriented Dialogue Modeling on Spoken Conversations [24.245354500835465]
We propose a novel model-agnostic data augmentation paradigm to boost the robustness of task-oriented dialogue modeling on spoken conversations. Our approach ranked first in both tasks of DSTC10 Track2, a benchmark for task-oriented dialogue modeling on spoken conversations.
arXiv Detail & Related papers (2021-12-23T10:04:25Z)
UniDS: A Unified Dialogue System for Chit-Chat and Task-oriented Dialogues [59.499965460525694]
We propose a unified dialogue system (UniDS) with the two aforementioned skills. We design a unified dialogue data schema, compatible for both chit-chat and task-oriented dialogues. We train UniDS with mixed dialogue data from a pretrained chit-chat dialogue model.
arXiv Detail & Related papers (2021-10-15T11:56:47Z)
"How Robust r u?": Evaluating Task-Oriented Dialogue Systems on Spoken Conversations [87.95711406978157]
This work presents a new benchmark on spoken task-oriented conversations. We study multi-domain dialogue state tracking and knowledge-grounded dialogue modeling. Our data set enables speech-based benchmarking of task-oriented dialogue systems.
arXiv Detail & Related papers (2021-09-28T04:51:04Z)
Rethinking Dialogue State Tracking with Reasoning [76.0991910623001]
This paper proposes to track dialogue states gradually with reasoning over dialogue turns with the help of the back-end data. Empirical results demonstrate that our method significantly outperforms the state-of-the-art methods by 38.6% in terms of joint belief accuracy for MultiWOZ 2.1.
arXiv Detail & Related papers (2020-05-27T02:05:33Z)
TOD-BERT: Pre-trained Natural Language Understanding for Task-Oriented Dialogue [113.45485470103762]
In this work, we unify nine human-human and multi-turn task-oriented dialogue datasets for language modeling. To better model dialogue behavior during pre-training, we incorporate user and system tokens into the masked language modeling.
arXiv Detail & Related papers (2020-04-15T04:09:05Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.