On the Use of Audio to Improve Dialogue Policies
- URL: http://arxiv.org/abs/2410.13385v1
- Date: Thu, 17 Oct 2024 09:37:20 GMT
- Title: On the Use of Audio to Improve Dialogue Policies
- Authors: Daniel Roncel, Federico Costa, Javier Hernando
- Abstract summary: We propose new architectures to add audio information by combining speech and text embeddings.
Experiments show that audio embedding-aware dialogue policies outperform text-based ones.
- Score: 9.35212661749004
- Abstract: With the significant progress of speech technologies, spoken goal-oriented dialogue systems are becoming increasingly popular. One of the main modules of a dialogue system is typically the dialogue policy, which is responsible for determining system actions. This component usually relies only on audio transcriptions, being strongly dependent on their quality and ignoring very important extralinguistic information embedded in the user's speech. In this paper, we propose new architectures to add audio information by combining speech and text embeddings using a Double Multi-Head Attention component. Our experiments show that audio embedding-aware dialogue policies outperform text-based ones, particularly in noisy transcription scenarios, and that how text and audio embeddings are combined is crucial to improve performance. We obtained a 9.8% relative improvement in the User Request Score compared to an only-text-based dialogue system on the DSTC2 dataset.
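The abstract describes fusing speech and text embeddings with a Double Multi-Head Attention component before the dialogue policy decides the next system action. The paper's exact layer sizes and training details are not given here, so the following is only a minimal numpy sketch of the general idea: a first multi-head attention stage pools each modality's embedding sequence into a single vector, and a second stage attends over the two pooled vectors to produce the fused representation. All shapes, the random "learned" queries, and the head count are illustrative assumptions, not the authors' configuration.

```python
import numpy as np

np.random.seed(0)  # reproducible illustration

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(query, keys, values, num_heads=4):
    """Scaled dot-product attention, splitting the feature dim into heads.

    query: (1, d), keys/values: (T, d) -> returns (1, d).
    """
    d = query.shape[-1]
    assert d % num_heads == 0
    dh = d // num_heads
    out = []
    for h in range(num_heads):
        q = query[..., h * dh:(h + 1) * dh]
        k = keys[..., h * dh:(h + 1) * dh]
        v = values[..., h * dh:(h + 1) * dh]
        scores = q @ k.T / np.sqrt(dh)      # (1, T) attention logits
        out.append(softmax(scores) @ v)     # (1, dh) weighted sum per head
    return np.concatenate(out, axis=-1)     # (1, d)

# Hypothetical sequence lengths and embedding size.
T_text, T_audio, d = 12, 50, 64
text_emb = np.random.randn(T_text, d)    # e.g. token embeddings of the ASR transcript
audio_emb = np.random.randn(T_audio, d)  # e.g. frame-level speech embeddings

# Stage 1: attentive pooling of each modality with a (here random) learned query.
q_text, q_audio = np.random.randn(1, d), np.random.randn(1, d)
text_vec = multi_head_attention(q_text, text_emb, text_emb)
audio_vec = multi_head_attention(q_audio, audio_emb, audio_emb)

# Stage 2: attend over the two pooled modality vectors to weight and fuse them.
modalities = np.concatenate([text_vec, audio_vec], axis=0)  # (2, d)
fused = multi_head_attention(np.random.randn(1, d), modalities, modalities)
# `fused` would then feed the dialogue policy classifier.
```

The second stage lets the model learn, per turn, how much to trust the text versus the audio channel, which matches the paper's finding that *how* the embeddings are combined is crucial, especially under noisy transcriptions.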
Related papers
- Adapting Text-based Dialogue State Tracker for Spoken Dialogues [20.139351605832665]
We describe our engineering effort in building a highly successful model that participated in the speech-aware dialogue systems technology challenge track in DSTC11.
Our model consists of three major modules: (1) automatic speech recognition error correction to bridge the gap between the spoken and the text utterances, (2) text-based dialogue system (D3ST) for estimating the slots and values using slot descriptions, and (3) post-processing for recovering the error of the estimated slot value.
arXiv Detail & Related papers (2023-08-29T06:27:58Z) - FCC: Fusing Conversation History and Candidate Provenance for Contextual Response Ranking in Dialogue Systems [53.89014188309486]
We present a flexible neural framework that can integrate contextual information from multiple channels.
We evaluate our model on the MSDialog dataset widely used for evaluating conversational response ranking tasks.
arXiv Detail & Related papers (2023-03-31T23:58:28Z) - A Benchmark for Understanding and Generating Dialogue between Characters in Stories [75.29466820496913]
We present the first study to explore whether machines can understand and generate dialogue in stories.
We propose two new tasks including Masked Dialogue Generation and Dialogue Speaker Recognition.
We show the difficulty of the proposed tasks by testing existing models with automatic and manual evaluation on DialStory.
arXiv Detail & Related papers (2022-09-18T10:19:04Z) - End-to-end Spoken Conversational Question Answering: Task, Dataset and Model [92.18621726802726]
In spoken question answering, the systems are designed to answer questions from contiguous text spans within the related speech transcripts.
We propose a new Spoken Conversational Question Answering task (SCQA), aiming at enabling the systems to model complex dialogue flows.
Our main objective is to build a system that handles conversational questions based on audio recordings, and to explore the feasibility of providing additional cues from different modalities to aid the system in information gathering.
arXiv Detail & Related papers (2022-04-29T17:56:59Z) - Attentive Contextual Carryover for Multi-Turn End-to-End Spoken Language Understanding [14.157311972146692]
We propose a contextual E2E SLU model architecture that uses a multi-head attention mechanism over encoded previous utterances and dialogue acts.
Our method reduces average word and semantic error rates by 10.8% and 12.6%, respectively.
arXiv Detail & Related papers (2021-12-13T15:49:36Z) - UniDS: A Unified Dialogue System for Chit-Chat and Task-oriented Dialogues [59.499965460525694]
We propose a unified dialogue system (UniDS) with the two aforementioned skills.
We design a unified dialogue data schema compatible with both chit-chat and task-oriented dialogues.
We train UniDS on mixed dialogue data, starting from a pretrained chit-chat dialogue model.
arXiv Detail & Related papers (2021-10-15T11:56:47Z) - "How Robust r u?": Evaluating Task-Oriented Dialogue Systems on Spoken Conversations [87.95711406978157]
This work presents a new benchmark on spoken task-oriented conversations.
We study multi-domain dialogue state tracking and knowledge-grounded dialogue modeling.
Our data set enables speech-based benchmarking of task-oriented dialogue systems.
arXiv Detail & Related papers (2021-09-28T04:51:04Z) - Hierarchical Summarization for Longform Spoken Dialog [1.995792341399967]
Despite the pervasiveness of spoken dialog, automated speech understanding and high-quality information extraction remain markedly poor.
Compared to understanding text, auditory communication poses many additional challenges such as speaker disfluencies, informal prose styles, and lack of structure.
We propose a two-stage ASR and text summarization pipeline, along with a set of semantic segmentation and merging algorithms, to address these speech modeling challenges.
arXiv Detail & Related papers (2021-08-21T23:31:31Z) - Integrating Dialog History into End-to-End Spoken Language Understanding Systems [37.08876551722831]
We investigate the importance of dialog history and how it can be effectively integrated into end-to-end spoken language understanding systems.
While processing a spoken utterance, our proposed RNN transducer (RNN-T) based SLU model has access to its dialog history in the form of decoded transcripts and SLU labels of previous turns.
We evaluate our approach on a recently released spoken dialog data set, the HarperValleyBank corpus.
arXiv Detail & Related papers (2021-08-18T22:24:11Z) - Domain State Tracking for a Simplified Dialogue System [3.962145079528281]
We present DoTS, a task-oriented dialogue system that uses a simplified input context instead of the entire dialogue history.
DoTS improves the inform rate and success rate by 1.09 points and 1.24 points, respectively, compared to the previous state-of-the-art model on MultiWOZ.
arXiv Detail & Related papers (2021-03-11T13:00:54Z) - Rethinking Dialogue State Tracking with Reasoning [76.0991910623001]
This paper proposes to track dialogue states gradually, reasoning over dialogue turns with the help of back-end data.
Empirical results demonstrate that our method significantly outperforms the state-of-the-art methods by 38.6% in terms of joint belief accuracy for MultiWOZ 2.1.
arXiv Detail & Related papers (2020-05-27T02:05:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.