SpeechDPR: End-to-End Spoken Passage Retrieval for Open-Domain Spoken Question Answering
- URL: http://arxiv.org/abs/2401.13463v3
- Date: Sat, 24 Aug 2024 20:28:38 GMT
- Title: SpeechDPR: End-to-End Spoken Passage Retrieval for Open-Domain Spoken Question Answering
- Authors: Chyi-Jiunn Lin, Guan-Ting Lin, Yung-Sung Chuang, Wei-Lun Wu, Shang-Wen Li, Abdelrahman Mohamed, Hung-yi Lee, Lin-shan Lee
- Abstract summary: Spoken Question Answering (SQA) is essential for machines to reply to a user's question by finding the answer span within a given spoken passage.
This paper proposes the first known end-to-end framework, Speech Dense Passage Retriever (SpeechDPR), for the retrieval component of this problem.
SpeechDPR learns a sentence-level semantic representation by distilling knowledge from a cascading model of unsupervised ASR (UASR) and text dense retriever (TDR).
- Score: 76.4510005602893
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Spoken Question Answering (SQA) is essential for machines to reply to a user's question by finding the answer span within a given spoken passage. SQA has previously been achieved without ASR in order to avoid recognition errors and Out-of-Vocabulary (OOV) problems. However, the real-world problem of open-domain SQA (openSQA), in which the machine must additionally retrieve passages that possibly contain the answer from a spoken archive, had never been considered. This paper proposes the first known end-to-end framework, Speech Dense Passage Retriever (SpeechDPR), for the retrieval component of the openSQA problem. SpeechDPR learns a sentence-level semantic representation by distilling knowledge from a cascading model of unsupervised ASR (UASR) and text dense retriever (TDR); no manually transcribed speech data is needed. Initial experiments showed performance comparable to that of the cascading UASR-TDR model, and significantly better performance when the UASR was poor, verifying that this approach is more robust to speech recognition errors.
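The abstract describes the training recipe only at a high level. Below is a minimal PyTorch sketch of the general pattern it implies: a student speech encoder is distilled toward sentence embeddings produced by the UASR-plus-TDR teacher cascade, and retrieval then scores passages by inner product with the question embedding. The function names, embedding shapes, and exact loss form (L1 plus cosine distance) are illustrative assumptions, not the authors' released implementation.

```python
# Sketch of SpeechDPR-style training (assumptions, not the paper's code).
# Teacher: cascade of unsupervised ASR (UASR) + text dense retriever (TDR),
# which maps speech to a sentence-level embedding via its transcript.
# Student: a speech encoder trained to match the teacher's embeddings,
# so no manually transcribed speech is required.
import torch
import torch.nn.functional as F

def distillation_loss(student_emb: torch.Tensor,
                      teacher_emb: torch.Tensor) -> torch.Tensor:
    # L1 distance plus cosine distance between student (speech) and
    # teacher (cascade) embeddings; the paper's exact objective may differ.
    l1 = F.l1_loss(student_emb, teacher_emb)
    cos = 1.0 - F.cosine_similarity(student_emb, teacher_emb, dim=-1).mean()
    return l1 + cos

def retrieve(question_emb: torch.Tensor,
             passage_embs: torch.Tensor, k: int = 5):
    # Standard dense retrieval at inference: rank passages by inner
    # product with the (spoken) question embedding.
    scores = passage_embs @ question_emb  # (num_passages,)
    return torch.topk(scores, k=min(k, passage_embs.size(0)))
```

Since the distilled encoder embeds speech directly, retrieval never consults an ASR transcript at inference time, which is consistent with the abstract's observation that the approach stays robust when UASR quality is poor.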
Related papers
- A Multimodal Dense Retrieval Approach for Speech-Based Open-Domain Question Answering [16.613985687431818]
Passage retrieval is a key task in speech-based open-domain QA.
We propose an end-to-end trained multimodal dense retriever that can work directly on spoken questions.
arXiv Detail & Related papers (2024-09-20T13:15:53Z)
- Phrase Retrieval for Open-Domain Conversational Question Answering with Conversational Dependency Modeling via Contrastive Learning [54.55643652781891]
Open-Domain Conversational Question Answering (ODConvQA) aims at answering questions through a multi-turn conversation.
We propose a method to directly predict answers with a phrase retrieval scheme for a sequence of words.
arXiv Detail & Related papers (2023-06-07T09:46:38Z)
- On the Impact of Speech Recognition Errors in Passage Retrieval for Spoken Question Answering [13.013751306590303]
We study the robustness of lexical and dense retrievers against questions with synthetic ASR noise.
We create a new dataset with questions voiced by human users, and use their transcriptions to show that retrieval performance degrades further under natural ASR noise than under synthetic ASR noise; a minimal noise-injection sketch follows this entry.
arXiv Detail & Related papers (2022-09-26T18:29:36Z)
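The entry above names the evaluation idea (synthetic versus natural ASR noise) but not a corruption model. As a purely illustrative sketch, one simple way to inject synthetic ASR-like noise into written questions before retrieval is random word deletion and character scrambling at a target error rate; this perturbation scheme is an assumption, not the paper's procedure.

```python
# Illustrative synthetic ASR-noise injection (not the paper's exact method).
import random

def add_synthetic_asr_noise(question: str, wer: float = 0.15,
                            seed: int = 0) -> str:
    # Corrupt roughly `wer` of the words, either deleting them or
    # scrambling their characters to mimic a misrecognition.
    rng = random.Random(seed)
    noisy = []
    for word in question.split():
        if rng.random() < wer:
            if rng.random() < 0.5:
                continue  # deletion error
            chars = list(word)
            rng.shuffle(chars)  # crude substitution error
            noisy.append("".join(chars))
        else:
            noisy.append(word)
    return " ".join(noisy)

print(add_synthetic_asr_noise("what is the capital of france", wer=0.3))
```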
- Multifaceted Improvements for Conversational Open-Domain Question Answering [54.913313912927045]
We propose a framework with Multifaceted Improvements for Conversational open-domain Question Answering (MICQA).
First, a KL-divergence-based regularization leads to better question understanding for retrieval and answer reading.
Second, an added post-ranker module pushes more relevant passages to the top placements so that they are selected for the reader, under a two-aspect constraint.
Third, a well-designed curriculum learning strategy narrows the gap between the golden-passage settings of training and inference, and encourages the reader to find the true answer without golden-passage assistance; a minimal sketch of the KL-regularization idea follows this entry.
arXiv Detail & Related papers (2022-04-01T07:54:27Z)
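MICQA's first improvement is named here but not specified. The sketch below shows the generic KL-regularization pattern under the assumption that it keeps two passage-score distributions close, for example those computed from an original question and from a history-augmented rewrite; the paper's exact pairing of distributions may differ.

```python
# Generic KL-divergence regularizer over retrieval score distributions
# (an assumed formulation, not the MICQA implementation).
import torch
import torch.nn.functional as F

def kl_regularizer(scores_a: torch.Tensor, scores_b: torch.Tensor,
                   temperature: float = 1.0) -> torch.Tensor:
    # KL(p_a || p_b), where p_* are softmax distributions over passages.
    log_p_a = F.log_softmax(scores_a / temperature, dim=-1)
    log_p_b = F.log_softmax(scores_b / temperature, dim=-1)
    return F.kl_div(log_p_b, log_p_a, log_target=True, reduction="batchmean")

# Usage (hypothetical): total = task_loss + lam * kl_regularizer(s_orig, s_rewrite)
```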
- DUAL: Textless Spoken Question Answering with Speech Discrete Unit Adaptive Learning [66.71308154398176]
Spoken Question Answering (SQA) has gained research attention and made remarkable progress in recent years.
Existing SQA methods rely on Automatic Speech Recognition (ASR) transcripts, which are time-consuming and cost-prohibitive to collect.
This work proposes an ASR transcript-free SQA framework named Discrete Unit Adaptive Learning (DUAL), which leverages unlabeled data for pre-training and is fine-tuned by the SQA downstream task.
arXiv Detail & Related papers (2022-03-09T17:46:22Z)
- An Initial Investigation of Non-Native Spoken Question-Answering [36.89541375786233]
We show that a simple text-based ELECTRA MC model trained on SQuAD2.0 transfers well to spoken question-answering tests.
One significant challenge is the lack of appropriately annotated speech corpora to train systems for this task.
Mismatches must be considered between text documents and spoken responses, and between non-native spoken grammar and written grammar.
arXiv Detail & Related papers (2021-07-09T21:59:16Z)
- Contextualized Attention-based Knowledge Transfer for Spoken Conversational Question Answering [63.72278693825945]
Spoken conversational question answering (SCQA) requires machines to model complex dialogue flow.
We propose CADNet, a novel contextualized attention-based distillation approach.
We conduct extensive experiments on the Spoken-CoQA dataset and demonstrate that our approach achieves remarkable performance.
arXiv Detail & Related papers (2020-10-21T15:17:18Z)
- Towards Data Distillation for End-to-end Spoken Conversational Question Answering [65.124088336738]
We propose a new Spoken Conversational Question Answering (SCQA) task.
SCQA aims at enabling QA systems to model complex dialogue flows given speech utterances and text corpora.
Our main objective is to build a QA system to deal with conversational questions both in spoken and text forms.
arXiv Detail & Related papers (2020-10-18T05:53:39Z)