A Multimodal Dense Retrieval Approach for Speech-Based Open-Domain Question Answering
- URL: http://arxiv.org/abs/2409.13483v1
- Date: Fri, 20 Sep 2024 13:15:53 GMT
- Title: A Multimodal Dense Retrieval Approach for Speech-Based Open-Domain Question Answering
- Authors: Georgios Sidiropoulos, Evangelos Kanoulas
- Abstract summary: Passage retrieval is a key task in speech-based open-domain QA.
We propose an end-to-end trained multimodal dense retriever that can work directly on spoken questions.
- Score: 16.613985687431818
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Speech-based open-domain question answering (QA over a large corpus of text passages with spoken questions) has emerged as an important task due to the increasing number of users interacting with QA systems via speech interfaces. Passage retrieval is a key task in speech-based open-domain QA. So far, previous work has adopted pipelines consisting of an automatic speech recognition (ASR) model that transcribes the spoken question before feeding it to a dense text retriever. Such pipelines have several limitations. The need for an ASR model limits the applicability to low-resource languages and specialized domains with no annotated speech data. Furthermore, the ASR model propagates its errors to the retriever. In this work, we try to alleviate these limitations by proposing an ASR-free, end-to-end trained multimodal dense retriever that can work directly on spoken questions. Our experimental results showed that, on shorter questions, our retriever is a promising alternative to the "ASR and Retriever" pipeline, achieving better retrieval performance in cases where ASR would have mistranscribed important words in the question or have produced a transcription with a high word error rate.
Related papers
- Learning When to Retrieve, What to Rewrite, and How to Respond in Conversational QA [16.1357049130957]
We build on the single-turn SELF-RAG framework and propose SELF-multi-RAG for conversational settings.
SELF-multi-RAG demonstrates improved capabilities over single-turn variants with respect to retrieving relevant passages.
arXiv Detail & Related papers (2024-09-23T20:05:12Z)
- SpeechDPR: End-to-End Spoken Passage Retrieval for Open-Domain Spoken Question Answering [76.4510005602893]
Spoken Question Answering (SQA) is essential for machines to reply to a user's question by finding the answer span within a given spoken passage.
This paper proposes the first known end-to-end framework, Speech Passage Retriever (SpeechDPR).
SpeechDPR learns a sentence-level semantic representation by distilling knowledge from the cascading model of unsupervised ASR (UASR) and dense text retriever (TDR).
arXiv Detail & Related papers (2024-01-24T14:08:38Z)
- Phrase Retrieval for Open-Domain Conversational Question Answering with Conversational Dependency Modeling via Contrastive Learning [54.55643652781891]
Open-Domain Conversational Question Answering (ODConvQA) aims at answering questions through a multi-turn conversation.
We propose a method to directly predict answers with a phrase retrieval scheme for a sequence of words.
arXiv Detail & Related papers (2023-06-07T09:46:38Z)
- On the Impact of Speech Recognition Errors in Passage Retrieval for Spoken Question Answering [13.013751306590303]
We study the robustness of lexical and dense retrievers against questions with synthetic ASR noise.
We create a new dataset with questions voiced by human users and use their transcriptions to show that the retrieval performance can further degrade when dealing with natural ASR noise instead of synthetic ASR noise.
arXiv Detail & Related papers (2022-09-26T18:29:36Z)
- Multifaceted Improvements for Conversational Open-Domain Question Answering [54.913313912927045]
We propose a framework with Multifaceted Improvements for Conversational open-domain Question Answering (MICQA).
First, the proposed KL-divergence-based regularization leads to better question understanding for retrieval and answer reading.
Second, the added post-ranker module can push more relevant passages to the top placements, where they are selected for the reader under two-aspect constraints.
Third, the well-designed curriculum learning strategy effectively narrows the gap between the golden passage settings of training and inference, and encourages the reader to find the true answer without the golden passage assistance.
arXiv Detail & Related papers (2022-04-01T07:54:27Z)
- DUAL: Textless Spoken Question Answering with Speech Discrete Unit Adaptive Learning [66.71308154398176]
Spoken Question Answering (SQA) has gained research attention and made remarkable progress in recent years.
Existing SQA methods rely on Automatic Speech Recognition (ASR) transcripts, which are time- and cost-prohibitive to collect.
This work proposes an ASR transcript-free SQA framework named Discrete Unit Adaptive Learning (DUAL), which leverages unlabeled data for pre-training and is fine-tuned by the SQA downstream task.
arXiv Detail & Related papers (2022-03-09T17:46:22Z)
- CONQRR: Conversational Query Rewriting for Retrieval with Reinforcement Learning [16.470428531658232]
We develop a query rewriting model CONQRR that rewrites a conversational question in context into a standalone question.
We show that CONQRR achieves state-of-the-art results on a recent open-domain CQA dataset.
arXiv Detail & Related papers (2021-12-16T01:40:30Z)
- Towards Data Distillation for End-to-end Spoken Conversational Question Answering [65.124088336738]
We propose a new Spoken Conversational Question Answering task (SCQA).
SCQA aims at enabling QA systems to model complex dialogue flows given the speech utterances and text corpora.
Our main objective is to build a QA system to deal with conversational questions both in spoken and text forms.
arXiv Detail & Related papers (2020-10-18T05:53:39Z)
- Answering Any-hop Open-domain Questions with Iterative Document Reranking [62.76025579681472]
We propose a unified QA framework to answer any-hop open-domain questions.
Our method consistently achieves performance comparable to or better than the state-of-the-art on both single-hop and multi-hop open-domain QA datasets.
arXiv Detail & Related papers (2020-09-16T04:31:38Z)