End-to-end Spoken Conversational Question Answering: Task, Dataset and
Model
- URL: http://arxiv.org/abs/2204.14272v1
- Date: Fri, 29 Apr 2022 17:56:59 GMT
- Title: End-to-end Spoken Conversational Question Answering: Task, Dataset and
Model
- Authors: Chenyu You, Nuo Chen, Fenglin Liu, Shen Ge, Xian Wu, Yuexian Zou
- Abstract summary: In spoken question answering, the systems are designed to answer questions from contiguous text spans within the related speech transcripts.
We propose a new Spoken Conversational Question Answering task (SCQA), aiming at enabling the systems to model complex dialogue flows.
Our main objective is to build the system to deal with conversational questions based on the audio recordings, and to explore the plausibility of providing more cues from different modalities with systems in information gathering.
- Score: 92.18621726802726
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In spoken question answering, the systems are designed to answer questions
from contiguous text spans within the related speech transcripts. However, the
most natural way that humans seek or test their knowledge is via human
conversations. Therefore, we propose a new Spoken Conversational Question
Answering task (SCQA), aiming at enabling the systems to model complex dialogue
flows given the speech documents. In this task, our main objective is to build
the system to deal with conversational questions based on the audio recordings,
and to explore the plausibility of providing more cues from different
modalities with systems in information gathering. To this end, instead of
directly adopting automatically generated speech transcripts with highly noisy
data, we propose a novel unified data distillation approach, DDNet, which
effectively ingests cross-modal information to achieve fine-grained
representations of the speech and language modalities. Moreover, we propose a
simple and novel mechanism, termed Dual Attention, by encouraging better
alignments between audio and text to ease the process of knowledge transfer. To
evaluate the capacity of SCQA systems in a dialogue-style interaction, we
assemble a Spoken Conversational Question Answering (Spoken-CoQA) dataset with
more than 40k question-answer pairs from 4k conversations. The performance of
the existing state-of-the-art methods significantly degrades on our dataset,
hence demonstrating the necessity of cross-modal information integration. Our
experimental results demonstrate that our proposed method achieves superior
performance in spoken conversational question answering tasks.
Related papers
- WavChat: A Survey of Spoken Dialogue Models [66.82775211793547] (2024-11-15T04:16:45Z)
Recent advancements in spoken dialogue models, exemplified by systems like GPT-4o, have captured significant attention in the speech domain.
These advanced spoken dialogue models not only comprehend audio, music, and other speech-related features, but also capture stylistic and timbral characteristics in speech.
Despite the progress in spoken dialogue systems, there is a lack of comprehensive surveys that systematically organize and analyze these systems.
- FCC: Fusing Conversation History and Candidate Provenance for Contextual Response Ranking in Dialogue Systems [53.89014188309486] (2023-03-31T23:58:28Z)
We present a flexible neural framework that can integrate contextual information from multiple channels.
We evaluate our model on the MSDialog dataset widely used for evaluating conversational response ranking tasks.
- KETOD: Knowledge-Enriched Task-Oriented Dialogue [77.59814785157877] (2022-05-11T16:01:03Z)
Existing studies in dialogue system research mostly treat task-oriented dialogue and chit-chat as separate domains.
We investigate how task-oriented dialogue and knowledge-grounded chit-chat can be effectively integrated into a single model.
- "How Robust r u?": Evaluating Task-Oriented Dialogue Systems on Spoken Conversations [87.95711406978157] (2021-09-28T04:51:04Z)
This work presents a new benchmark on spoken task-oriented conversations.
We study multi-domain dialogue state tracking and knowledge-grounded dialogue modeling.
Our data set enables speech-based benchmarking of task-oriented dialogue systems.
- Integrating Dialog History into End-to-End Spoken Language Understanding Systems [37.08876551722831] (2021-08-18T22:24:11Z)
We investigate the importance of dialog history and how it can be effectively integrated into end-to-end spoken language understanding systems.
While processing a spoken utterance, our proposed RNN transducer (RNN-T) based SLU model has access to its dialog history in the form of decoded transcripts and SLU labels of previous turns.
We evaluate our approach on a recently released spoken dialog data set, the HarperValleyBank corpus.
- Self-supervised Dialogue Learning for Spoken Conversational Question Answering [29.545937716796082] (2021-06-04T00:09:38Z)
In spoken conversational question answering (SCQA), the answer to the corresponding question is generated by retrieving and then analyzing a fixed spoken document, including multi-part conversations.
We introduce a self-supervised learning approach, including incoherence discrimination, insertion detection, and question prediction, to explicitly capture the coreference resolution and dialogue coherence.
Our proposed method provides more coherent, meaningful, and appropriate responses, yielding superior performance gains compared to the original pre-trained language models.
- BERT-CoQAC: BERT-based Conversational Question Answering in Context [10.811729691130349] (2021-04-23T03:05:17Z)
We introduce a framework based on a publicly available pre-trained language model called BERT for incorporating history turns into the system.
Experiment results revealed that our framework is comparable in performance with the state-of-the-art models on the QuAC leaderboard.
- Towards Data Distillation for End-to-end Spoken Conversational Question Answering [65.124088336738] (2020-10-18T05:53:39Z)
We propose a new Spoken Conversational Question Answering task (SCQA).
SCQA aims at enabling QA systems to model complex dialogue flows given the speech utterances and text corpora.
Our main objective is to build a QA system to deal with conversational questions both in spoken and text forms.
This list is automatically generated from the titles and abstracts of the papers in this site.