On the Impact of Speech Recognition Errors in Passage Retrieval for
Spoken Question Answering
- URL: http://arxiv.org/abs/2209.12944v1
- Date: Mon, 26 Sep 2022 18:29:36 GMT
- Authors: Georgios Sidiropoulos, Svitlana Vakulenko, and Evangelos Kanoulas
- Abstract summary: We study the robustness of lexical and dense retrievers against questions with synthetic ASR noise.
We create a new dataset with questions voiced by human users and use their transcriptions to show that the retrieval performance can further degrade when dealing with natural ASR noise instead of synthetic ASR noise.
- Score: 13.013751306590303
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Interacting with a speech interface to query a Question Answering (QA) system
is becoming increasingly popular. Typically, QA systems rely on passage
retrieval to select candidate contexts and reading comprehension to extract the
final answer. While there has been some attention to improving the reading
comprehension part of QA systems against errors that automatic speech
recognition (ASR) models introduce, the passage retrieval part remains
unexplored. However, such errors can affect the performance of passage
retrieval, leading to inferior end-to-end performance. To address this gap, we
augment two existing large-scale passage ranking and open domain QA datasets
with synthetic ASR noise and study the robustness of lexical and dense
retrievers against questions with ASR noise. Furthermore, we study the
generalizability of data augmentation techniques across different domains; with
each domain being a different language dialect or accent. Finally, we create a
new dataset with questions voiced by human users and use their transcriptions
to show that the retrieval performance can further degrade when dealing with
natural ASR noise instead of synthetic ASR noise.
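The paper builds its synthetic ASR noise with a proper TTS-and-ASR pipeline; as a toy illustration only, the sketch below injects random character-level errors into a question and scores it against a passage with a simple bag-of-words overlap (a crude stand-in for a lexical retriever such as BM25). All function names and the noise model are hypothetical, not the authors' implementation.

```python
import random


def add_synthetic_asr_noise(question, noise_rate=0.15, seed=0):
    """Crudely simulate ASR errors with random character deletions and
    substitutions (a toy proxy for a real TTS->ASR corruption pipeline)."""
    rng = random.Random(seed)
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    out = []
    for ch in question:
        r = rng.random()
        if ch.isalpha() and r < noise_rate / 2:
            continue                             # simulated deletion
        if ch.isalpha() and r < noise_rate:
            out.append(rng.choice(alphabet))     # simulated substitution
            continue
        out.append(ch)
    return "".join(out)


def lexical_score(query, passage):
    """Bag-of-words overlap between query and passage: the fraction of
    query terms that also appear in the passage."""
    q = set(query.lower().split())
    p = set(passage.lower().split())
    return len(q & p) / max(len(q), 1)


if __name__ == "__main__":
    passage = "the eiffel tower is located in paris france"
    clean = "where is the eiffel tower located"
    noisy = add_synthetic_asr_noise(clean, noise_rate=0.3)
    # Corrupted terms no longer match the passage vocabulary, so the
    # lexical score typically drops for the noisy question.
    print(clean, "->", lexical_score(clean, passage))
    print(noisy, "->", lexical_score(noisy, passage))
```

Dense retrievers embed the question instead of matching terms, which is why the paper studies the two retriever families separately under the same noise.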
Related papers
- Learning When to Retrieve, What to Rewrite, and How to Respond in Conversational QA [16.1357049130957]
We build on the single-turn SELF-RAG framework and propose SELF-multi-RAG for conversational settings.
SELF-multi-RAG demonstrates improved capabilities over single-turn variants with respect to retrieving relevant passages.
arXiv Detail & Related papers (2024-09-23T20:05:12Z)
- A Multimodal Dense Retrieval Approach for Speech-Based Open-Domain Question Answering [16.613985687431818]
Passage retrieval is a key task in speech-based open-domain QA.
We propose an end-to-end trained multimodal dense retriever that can work directly on spoken questions.
arXiv Detail & Related papers (2024-09-20T13:15:53Z)
- SpeechDPR: End-to-End Spoken Passage Retrieval for Open-Domain Spoken Question Answering [76.4510005602893]
Spoken Question Answering (SQA) is essential for machines to reply to users' questions by finding the answer span within a given spoken passage.
This paper proposes the first known end-to-end framework, Speech Passage Retriever (SpeechDPR).
SpeechDPR learns a sentence-level semantic representation by distilling knowledge from a cascading model of unsupervised ASR (UASR) and a dense text retriever (TDR).
arXiv Detail & Related papers (2024-01-24T14:08:38Z)
- On the Impact of Word Error Rate on Acoustic-Linguistic Speech Emotion Recognition: An Update for the Deep Learning Era [0.0]
We create transcripts from the original speech by applying three modern ASR systems.
For extraction and learning of acoustic speech features, we utilise openSMILE, openXBoW, DeepSpectrum, and auDeep.
We achieve state-of-the-art unweighted average recall values of 73.6% and 73.8% on the speaker-independent development and test partitions of IEMOCAP.
arXiv Detail & Related papers (2021-04-20T17:10:01Z)
- Contextualized Attention-based Knowledge Transfer for Spoken Conversational Question Answering [63.72278693825945]
Spoken conversational question answering (SCQA) requires machines to model complex dialogue flow.
We propose CADNet, a novel contextualized attention-based distillation approach.
We conduct extensive experiments on the Spoken-CoQA dataset and demonstrate that our approach achieves remarkable performance.
arXiv Detail & Related papers (2020-10-21T15:17:18Z)
- Towards Data Distillation for End-to-end Spoken Conversational Question Answering [65.124088336738]
We propose a new Spoken Conversational Question Answering (SCQA) task.
SCQA aims at enabling QA systems to model complex dialogue flows given speech utterances and text corpora.
Our main objective is to build a QA system to deal with conversational questions both in spoken and text forms.
arXiv Detail & Related papers (2020-10-18T05:53:39Z)
- Open-Retrieval Conversational Question Answering [62.11228261293487]
We introduce an open-retrieval conversational question answering (ORConvQA) setting, where we learn to retrieve evidence from a large collection before extracting answers.
We build an end-to-end system for ORConvQA, featuring a retriever, a reranker, and a reader that are all based on Transformers.
arXiv Detail & Related papers (2020-05-22T19:39:50Z)
- Multi-Stage Conversational Passage Retrieval: An Approach to Fusing Term Importance Estimation and Neural Query Rewriting [56.268862325167575]
We tackle conversational passage retrieval (ConvPR) with query reformulation integrated into a multi-stage ad-hoc IR system.
We propose two conversational query reformulation (CQR) methods: (1) term importance estimation and (2) neural query rewriting.
For the former, we expand conversational queries using important terms extracted from the conversational context with frequency-based signals.
For the latter, we reformulate conversational queries into natural, standalone, human-understandable queries with a pretrained sequence-to-sequence model.
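The first CQR method above (term importance estimation) can be sketched as frequency-based query expansion: count content terms in the conversational context and append the most frequent ones to the current turn. This is a hypothetical minimal illustration, not the authors' implementation; the stopword list and function names are invented for the example.

```python
from collections import Counter


def expand_query(current_turn, context_turns, top_k=2):
    """Append the top_k most frequent context terms (by raw frequency)
    to the current conversational turn, skipping stopwords and terms
    already present in the turn."""
    stop = {"the", "a", "is", "of", "in", "what", "how", "about", "it"}
    counts = Counter(
        tok
        for turn in context_turns
        for tok in turn.lower().split()
        if tok not in stop
    )
    current = set(current_turn.lower().split())
    important = [t for t, _ in counts.most_common(top_k) if t not in current]
    return current_turn + " " + " ".join(important) if important else current_turn


if __name__ == "__main__":
    context = ["what is the eiffel tower", "how tall is the eiffel tower"]
    print(expand_query("when was it built", context))
    # -> "when was it built eiffel tower"
```

The expanded query carries the conversation's topical terms, so an ad-hoc retriever can match it without any dialogue awareness; the neural rewriting method achieves the same goal with a generative model instead.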
arXiv Detail & Related papers (2020-05-05T14:30:20Z)
- Improving Readability for Automatic Speech Recognition Transcription [50.86019112545596]
We propose a novel NLP task called ASR post-processing for readability (APR).
APR aims to transform noisy ASR output into text that is readable for humans and downstream tasks while preserving the semantic meaning of the speaker.
We compare fine-tuned models based on several open-sourced and adapted pre-trained models with the traditional pipeline method.
arXiv Detail & Related papers (2020-04-09T09:26:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.