NoiseQA: Challenge Set Evaluation for User-Centric Question Answering
- URL: http://arxiv.org/abs/2102.08345v1
- Date: Tue, 16 Feb 2021 18:35:29 GMT
- Title: NoiseQA: Challenge Set Evaluation for User-Centric Question Answering
- Authors: Abhilasha Ravichander, Siddharth Dalmia, Maria Ryskina, Florian Metze,
Eduard Hovy, Alan W Black
- Abstract summary: We show that components in the pipeline that precede an answering engine can introduce varied and considerable sources of error.
We conclude that there is substantial room for progress before QA systems can be effectively deployed.
- Score: 68.67783808426292
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When Question-Answering (QA) systems are deployed in the real world, users
query them through a variety of interfaces, such as speaking to voice
assistants, typing questions into a search engine, or even translating
questions to languages supported by the QA system. While there has been
significant community attention devoted to identifying correct answers in
passages assuming a perfectly formed question, we show that components in the
pipeline that precede an answering engine can introduce varied and considerable
sources of error, and performance can degrade substantially based on these
upstream noise sources even for powerful pre-trained QA models. We conclude
that there is substantial room for progress before QA systems can be
effectively deployed, highlight the need for QA evaluation to expand to
consider real-world use, and hope that our findings will spur greater community
interest in the issues that arise when our systems actually need to be of
utility to humans.
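To make the failure mode concrete: below is a minimal sketch, assuming the Hugging Face transformers library, that injects keyboard-style typos into a question and queries an off-the-shelf extractive QA model with both the clean and noisy versions. The neighbor map and noise rate are toy assumptions; NoiseQA models ASR, keyboard, and translation noise far more carefully.
```python
import random
from transformers import pipeline  # assumes transformers is installed

# Toy QWERTY-neighbor map; a crude stand-in for a real keyboard noise model.
NEIGHBORS = {"a": "qs", "e": "wr", "i": "uo", "o": "ip", "n": "bm", "t": "ry"}

def add_keyboard_noise(text: str, rate: float = 0.1, seed: int = 0) -> str:
    """Randomly replace characters with an adjacent key (a crude typo model)."""
    rng = random.Random(seed)
    return "".join(
        rng.choice(NEIGHBORS[c]) if c in NEIGHBORS and rng.random() < rate else c
        for c in text
    )

qa = pipeline("question-answering",
              model="distilbert-base-cased-distilled-squad")

context = ("NoiseQA studies how interface noise from speech recognition, "
           "keyboards, and machine translation affects question answering.")
question = "What kinds of interface noise does NoiseQA study?"

# Compare the model's answer on the clean and the noised question.
for q in (question, add_keyboard_noise(question, rate=0.2)):
    pred = qa(question=q, context=context)
    print(f"{q!r} -> {pred['answer']!r} (score {pred['score']:.2f})")
```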
Related papers
- QADYNAMICS: Training Dynamics-Driven Synthetic QA Diagnostic for
Zero-Shot Commonsense Question Answering [48.25449258017601]
State-of-the-art approaches fine-tune language models on QA pairs constructed from CommonSense Knowledge Bases.
We propose QADYNAMICS, a training dynamics-driven framework for QA diagnostics and refinement.
arXiv Detail & Related papers (2023-10-17T14:27:34Z)
- Answering Unanswered Questions through Semantic Reformulations in Spoken QA [20.216161323866867]
Spoken Question Answering (QA) is a key feature of voice assistants, usually backed by multiple QA systems.
We analyze failed QA requests to identify core challenges: lexical gaps, proposition types, complex syntactic structure, and high specificity.
We propose a Semantic Question Reformulation (SURF) model offering three linguistically-grounded operations (repair, syntactic reshaping, generalization) to rewrite questions to facilitate answering.
arXiv Detail & Related papers (2023-05-27T07:19:27Z)
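SURF itself is a trained reformulation model; as a rough illustration of its three operations, the sketch below prompts a generic instruction-tuned seq2seq model (an assumption, not the authors' model) to repair, reshape, and generalize a malformed spoken question.
```python
from transformers import pipeline  # assumes transformers is installed

# Stand-in for SURF: prompt a generic instruction-tuned seq2seq model to
# mimic its three linguistically-grounded operations.
rewriter = pipeline("text2text-generation", model="google/flan-t5-base")

OPERATIONS = {
    "repair": "Fix the grammar of this question: {q}",
    "syntactic reshaping": "Rewrite this question in simpler words: {q}",
    "generalization": "Rewrite this question to be more general: {q}",
}

question = "whats the name of movie where that guy, um, fights his clone"
for name, template in OPERATIONS.items():
    out = rewriter(template.format(q=question), max_new_tokens=40)
    print(f"{name}: {out[0]['generated_text']}")
```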
- HeySQuAD: A Spoken Question Answering Dataset [2.3881849082514153]
This study presents a new large-scale community-shared SQA dataset called HeySQuAD.
Our goal is to measure the ability of machines to accurately understand noisy spoken questions and provide reliable answers.
arXiv Detail & Related papers (2023-04-26T17:15:39Z)
- On the Impact of Speech Recognition Errors in Passage Retrieval for Spoken Question Answering [13.013751306590303]
We study the robustness of lexical and dense retrievers against questions with synthetic ASR noise.
We create a new dataset with questions voiced by human users and use their transcriptions to show that the retrieval performance can further degrade when dealing with natural ASR noise instead of synthetic ASR noise.
arXiv Detail & Related papers (2022-09-26T18:29:36Z)
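A minimal sketch of the lexical-vs-dense comparison, assuming the rank-bm25 and sentence-transformers packages: it retrieves over a toy passage set with a clean question and an ASR-style corruption of it. The passages, queries, and model choice are illustrative, not the paper's setup.
```python
from rank_bm25 import BM25Okapi                               # pip install rank-bm25
from sentence_transformers import SentenceTransformer, util  # pip install sentence-transformers

passages = [
    "The Eiffel Tower is located in Paris, France.",
    "Mount Everest is the highest mountain above sea level.",
    "The Amazon is the largest rainforest on Earth.",
]

clean = "where is the eiffel tower located"
noisy = "where is the i fell tower located"  # ASR-style mis-recognition

# Lexical retrieval: BM25 over whitespace tokens.
bm25 = BM25Okapi([p.lower().split() for p in passages])

# Dense retrieval: cosine similarity of sentence embeddings.
encoder = SentenceTransformer("all-MiniLM-L6-v2")
passage_emb = encoder.encode(passages, convert_to_tensor=True)

for q in (clean, noisy):
    lex = bm25.get_scores(q.split())
    dense = util.cos_sim(encoder.encode(q, convert_to_tensor=True), passage_emb)[0]
    print(q)
    print("  BM25 top :", passages[int(lex.argmax())])
    print("  dense top:", passages[int(dense.argmax())])
```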
- Evaluation of Question Answering Systems: Complexity of judging a natural language [3.4771957347698583]
Question answering (QA) systems are among the most important and rapidly developing research topics in natural language processing (NLP).
This survey attempts to provide a systematic overview of the general framework of QA, QA paradigms, benchmark datasets, and assessment techniques for a quantitative evaluation of QA systems.
arXiv Detail & Related papers (2022-09-10T12:29:04Z)
- ProQA: Structural Prompt-based Pre-training for Unified Question Answering [84.59636806421204]
ProQA is a unified QA paradigm that solves various tasks through a single model.
It concurrently models the knowledge generalization for all QA tasks while keeping the knowledge customization for every specific QA task.
ProQA consistently boosts performance across full-data fine-tuning, few-shot learning, and zero-shot testing scenarios.
arXiv Detail & Related papers (2022-05-09T04:59:26Z)
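A sketch of what a structural prompt might look like; the key names and layout below are assumptions for illustration, not ProQA's exact format.
```python
# A minimal sketch of a structural prompt in the spirit of ProQA. The exact
# keys and ordering used by ProQA may differ; these are assumptions.
def structural_prompt(task: str, fmt: str, question: str, context: str = "") -> str:
    parts = [
        f"[Task] {task}",
        f"[Format] {fmt}",
        f"[Question] {question}",
    ]
    if context:
        parts.append(f"[Passage] {context}")
    parts.append("[Answer]")
    return " ".join(parts)

print(structural_prompt(
    task="question answering",
    fmt="extractive",
    question="Who wrote Hamlet?",
    context="Hamlet is a tragedy written by William Shakespeare.",
))
```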
- Better Retrieval May Not Lead to Better Question Answering [59.1892787017522]
A popular approach to improving end-to-end QA performance is to improve the quality of the context retrieved in the IR stage.
We show that for StrategyQA, a challenging open-domain QA dataset that requires multi-hop reasoning, this common approach is surprisingly ineffective.
arXiv Detail & Related papers (2022-05-07T16:59:38Z)
- Improving the Question Answering Quality using Answer Candidate Filtering based on Natural-Language Features [117.44028458220427]
We address the problem of how the Question Answering (QA) quality of a given system can be improved.
Our main contribution is an approach capable of identifying wrong answers provided by a QA system.
In particular, our approach has shown its potential, in many cases removing the majority of incorrect answers.
arXiv Detail & Related papers (2021-12-10T11:09:44Z)
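A hedged sketch of the idea: score each answer candidate with simple natural-language features and drop the candidates that fail. The features and rules here are illustrative stand-ins for the paper's trained filter.
```python
# Illustrative features only; the paper learns its filter from data.
def features(question: str, answer: str) -> dict:
    return {
        "is_empty": answer.strip() == "",
        "echoes_question": answer.lower() in question.lower(),
        "reasonable_length": 1 <= len(answer.split()) <= 10,
        "who_gets_named_entity": not (
            question.lower().startswith("who") and not answer[:1].isupper()
        ),
    }

def keep(question: str, answer: str) -> bool:
    """Keep a candidate only if it passes every heuristic check."""
    f = features(question, answer)
    return (not f["is_empty"] and not f["echoes_question"]
            and f["reasonable_length"] and f["who_gets_named_entity"])

question = "Who painted the Mona Lisa?"
candidates = ["Leonardo da Vinci", "the Mona Lisa", "painted", ""]
print([a for a in candidates if keep(question, a)])
# -> ['Leonardo da Vinci']
```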
- Unsupervised Question Decomposition for Question Answering [102.56966847404287]
We propose an algorithm for One-to-N Unsupervised Sequence transduction (ONUS) that learns to map one hard, multi-hop question to many simpler, single-hop sub-questions.
We show large QA improvements on HotpotQA over a strong baseline on the original, out-of-domain, and multi-hop dev sets.
arXiv Detail & Related papers (2020-02-22T19:40:35Z)
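The decompose-then-answer pipeline can be sketched as follows, assuming the Hugging Face transformers library; the decomposition below is hand-written purely for illustration, whereas ONUS learns it without supervision.
```python
from transformers import pipeline  # assumes transformers is installed

qa = pipeline("question-answering",
              model="distilbert-base-cased-distilled-squad")

context = ("The Louvre is located in Paris. "
           "Paris is the capital of France.")

# ONUS learns this mapping without supervision; the sub-questions here are
# hand-written only to show the decompose-then-answer flow.
multi_hop = "Which country's capital contains the Louvre?"
sub_questions = [
    "Which city contains the Louvre?",
    "Which country is Paris the capital of?",
]

# Answer each single-hop sub-question against the same context.
for sq in sub_questions:
    print(sq, "->", qa(question=sq, context=context)["answer"])
```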