OLISIA: a Cascade System for Spoken Dialogue State Tracking
- URL: http://arxiv.org/abs/2304.11073v3
- Date: Thu, 31 Aug 2023 08:51:33 GMT
- Title: OLISIA: a Cascade System for Spoken Dialogue State Tracking
- Authors: Léo Jacqmin, Lucas Druart (LIA), Yannick Estève (LIA), Benoît
Favre, Lina Maria Rojas-Barahona, Valentin Vielzeuf
- Abstract summary: OLISIA is a cascade system which integrates an Automatic Speech Recognition (ASR) model and a Dialogue State Tracking (DST) model.
We introduce several adaptations in the ASR and DST modules to improve integration and robustness to spoken conversations.
We conduct an in-depth analysis of the results and find that normalizing the ASR outputs, adapting the DST inputs through data augmentation, and increasing the size of the pre-trained models all play an important role in reducing the performance discrepancy between written and spoken conversations.
- Score: 1.6655682083533425
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Though Dialogue State Tracking (DST) is a core component of spoken
dialogue systems, recent work on this task mostly deals with chat corpora,
disregarding the discrepancies between spoken and written language. In this
paper, we propose OLISIA, a cascade system which integrates an Automatic Speech
Recognition (ASR) model and a DST model. We introduce several adaptations in
the ASR and DST modules to improve integration and robustness to spoken
conversations. With these adaptations, our system ranked first in DSTC11 Track
3, a benchmark to evaluate spoken DST. We conduct an in-depth analysis of the
results and find that normalizing the ASR outputs, adapting the DST inputs
through data augmentation, and increasing the size of the pre-trained models
all play an important role in reducing the performance discrepancy between
written and spoken conversations.
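To make the cascade concrete, here is a minimal sketch of the pipeline described in the abstract: an ASR step, a normalization step on its output, and a text-based DST model over the running dialogue history. The model calls are stubbed out and all names are illustrative assumptions; the paper's actual components are large pre-trained ASR and DST models not reproduced here.

```python
# Minimal sketch of a cascade spoken-DST pipeline (ASR -> normalization -> DST).
# The model calls are stubbed; in the paper both components are large
# pre-trained models, which are assumptions not reproduced here.
import re


def asr_transcribe(audio_path: str) -> str:
    """Placeholder for an ASR model; returns a raw (possibly noisy) transcript."""
    return "i 'd like a taxi to the cambridge belfry at seven fifteen p m"


def normalize_transcript(text: str) -> str:
    """Normalize ASR output so it looks closer to written training data."""
    text = re.sub(r"\s+'", "'", text)      # re-attach split contractions
    text = re.sub(r"\bp m\b", "pm", text)  # collapse spelled-out tokens
    return text.strip()


def track_state(history: list[str], utterance: str) -> dict[str, str]:
    """Placeholder for a text-based DST model run over the dialogue history."""
    return {"taxi-destination": "cambridge belfry", "taxi-leaveat": "7:15 pm"}


def cascade_turn(history: list[str], audio_path: str) -> dict[str, str]:
    transcript = normalize_transcript(asr_transcribe(audio_path))
    return track_state(history + [transcript], transcript)


if __name__ == "__main__":
    print(cascade_turn([], "user_turn_001.wav"))
```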
Related papers
- WavChat: A Survey of Spoken Dialogue Models [66.82775211793547]
Recent advancements in spoken dialogue models, exemplified by systems like GPT-4o, have captured significant attention in the speech domain.
These advanced spoken dialogue models not only comprehend audio, music, and other speech-related features, but also capture stylistic and timbral characteristics in speech.
Despite the progress in spoken dialogue systems, there is a lack of comprehensive surveys that systematically organize and analyze these systems.
arXiv Detail & Related papers (2024-11-15T04:16:45Z) - Injecting linguistic knowledge into BERT for Dialogue State Tracking [60.42231674887294]
This paper proposes a method that extracts linguistic knowledge via an unsupervised framework.
We then utilize this knowledge to augment BERT's performance and interpretability in Dialogue State Tracking (DST) tasks.
We benchmark this framework on various DST tasks and observe a notable improvement in accuracy.
arXiv Detail & Related papers (2023-11-27T08:38:42Z) - S3-DST: Structured Open-Domain Dialogue Segmentation and State Tracking
in the Era of LLMs [22.319211779438934]
The advent of Large Language Model (LLM)-based chat systems has introduced many real-world intricacies in open-domain dialogues.
We propose joint dialogue segmentation and state tracking per segment in open-domain dialogue systems.
We evaluate S3-DST on a proprietary anonymized open-domain dialogue dataset, as well as publicly available DST and segmentation datasets.
arXiv Detail & Related papers (2023-09-16T00:59:23Z) - DiariST: Streaming Speech Translation with Speaker Diarization [53.595990270899414]
We propose DiariST, the first streaming speech translation (ST) and speaker diarization (SD) solution.
It is built upon a neural transducer-based streaming ST system and integrates token-level serialized output training and t-vectors.
Our system achieves a strong ST and SD capability compared to offline systems based on Whisper, while performing streaming inference for overlapping speech.
arXiv Detail & Related papers (2023-09-14T19:33:27Z) - Adapting Text-based Dialogue State Tracker for Spoken Dialogues [20.139351605832665]
We describe our engineering effort in building a highly successful model that participated in the speech-aware dialogue systems technology challenge track in DSTC11.
Our model consists of three major modules: (1) automatic speech recognition error correction to bridge the gap between spoken and written utterances, (2) a text-based dialogue state tracker (D3ST) that estimates slots and values using slot descriptions, and (3) post-processing to recover errors in the estimated slot values.
arXiv Detail & Related papers (2023-08-29T06:27:58Z) - KILDST: Effective Knowledge-Integrated Learning for Dialogue State
Tracking using Gazetteer and Speaker Information [3.342637296393915]
Dialogue State Tracking (DST) is a core research topic in dialogue systems and has received much attention.
As a step toward conversational AI that extracts and recommends information from dialogues between users, it is necessary to define a new problem that deals with such user-to-user dialogues.
We introduce a new task, DST from dialogue between users about scheduling an event (DST-S).
The DST-S task is much more challenging since it requires the model to understand and track the dialogue between users and to understand who suggested the schedule and who agreed to the proposed schedule.
arXiv Detail & Related papers (2023-01-18T07:11:56Z) - In-Context Learning for Few-Shot Dialogue State Tracking [55.91832381893181]
We propose an in-context (IC) learning framework for few-shot dialogue state tracking (DST).
A large pre-trained language model (LM) takes a test instance and a few annotated examples as input, and directly decodes the dialogue states without any parameter updates (a minimal prompt sketch appears after this list).
This makes the LM more flexible and scalable compared to prior few-shot DST work when adapting to new domains and scenarios.
arXiv Detail & Related papers (2022-03-16T11:58:24Z) - Adapting Document-Grounded Dialog Systems to Spoken Conversations using
Data Augmentation and a Noisy Channel Model [46.93744191416991]
This paper summarizes our submission to Task 2 of the 10th Dialog System Technology Challenge (DSTC10), "Knowledge-grounded Task-oriented Dialogue Modeling on Spoken Conversations".
Similar to the previous year's iteration, the task consists of three subtasks: detecting whether a turn is knowledge seeking, selecting the relevant knowledge document and finally generating a grounded response.
Our best system ranked 1st in the automatic evaluation and 3rd in the human evaluation of the challenge.
arXiv Detail & Related papers (2021-12-16T12:51:52Z) - Improving Longer-range Dialogue State Tracking [22.606650177804966]
Dialogue state tracking (DST) is a pivotal component in task-oriented dialogue systems.
In this paper, we aim to improve the overall performance of DST with a special focus on handling longer dialogues.
arXiv Detail & Related papers (2021-02-27T02:44:28Z) - Joint Contextual Modeling for ASR Correction and Language Understanding [60.230013453699975]
We propose multi-task neural approaches to perform contextual language correction on ASR outputs jointly with language understanding (LU).
We show that the error rates of off-the-shelf ASR and downstream LU systems can be reduced significantly, by 14% relative, with joint models trained using small amounts of in-domain data.
arXiv Detail & Related papers (2020-01-28T22:09:25Z)
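As referenced in the in-context learning entry above, here is a minimal prompt sketch for few-shot DST with a frozen language model: a few annotated examples are serialized in front of the test dialogue, and the model decodes the state as a continuation. The state serialization, prompt layout, and the `complete` stub are assumptions for illustration, not the authors' exact format.

```python
# Illustrative sketch of in-context learning for few-shot DST: a handful of
# annotated dialogues are serialized into a prompt and a frozen LM decodes the
# dialogue state for the test turn. The LM call is stubbed; the prompt layout
# and state serialization are assumptions, not the paper's exact format.
from dataclasses import dataclass


@dataclass
class Example:
    history: list[str]     # dialogue turns so far, e.g. "user: ..."
    state: dict[str, str]  # gold slot-value pairs for the last turn


def serialize_state(state: dict[str, str]) -> str:
    return "; ".join(f"{slot}={value}" for slot, value in sorted(state.items()))


def build_prompt(examples: list[Example], test_history: list[str]) -> str:
    parts = []
    for ex in examples:
        parts.append("\n".join(ex.history))
        parts.append(f"state: {serialize_state(ex.state)}")
    parts.append("\n".join(test_history))
    parts.append("state:")  # the LM is expected to continue from here
    return "\n".join(parts)


def complete(prompt: str) -> str:
    """Placeholder for a frozen pre-trained LM; no parameters are updated."""
    return "hotel-area=north; hotel-stars=4"


if __name__ == "__main__":
    demo = [Example(["user: i need a cheap hotel in the east"],
                    {"hotel-area": "east", "hotel-pricerange": "cheap"})]
    prompt = build_prompt(demo, ["user: find me a 4 star hotel in the north"])
    print(complete(prompt))  # decoded dialogue state for the test turn
```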