Mitigating False-Negative Contexts in Multi-document Question Answering
with Retrieval Marginalization
- URL: http://arxiv.org/abs/2103.12235v1
- Date: Mon, 22 Mar 2021 23:44:35 GMT
- Title: Mitigating False-Negative Contexts in Multi-document Question Answering
with Retrieval Marginalization
- Authors: Ansong Ni, Matt Gardner, Pradeep Dasigi
- Abstract summary: We develop a new parameterization of set-valued retrieval that properly handles unanswerable queries.
We show that marginalizing over this set during training allows a model to mitigate false negatives in annotated supporting evidences.
On IIRC, we show that joint modeling with marginalization on alternative contexts improves model performance by 5.5 F1 points and achieves a new state-of-the-art performance of 50.6 F1.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Question Answering (QA) tasks requiring information from multiple documents
often rely on a retrieval model to identify relevant information from which the
reasoning model can derive an answer. The retrieval model is typically trained
to maximize the likelihood of the labeled supporting evidence. However, when
retrieving from large text corpora such as Wikipedia, the correct answer can
often be obtained from multiple evidence candidates, not all of them labeled as
positive, thus rendering the training signal weak and noisy. The problem is
exacerbated when the questions are unanswerable or the answers are boolean,
since the models cannot rely on lexical overlap to map answers to supporting
evidences. We develop a new parameterization of set-valued retrieval that
properly handles unanswerable queries, and we show that marginalizing over this
set during training allows a model to mitigate false negatives in annotated
supporting evidences. We test our method with two multi-document QA datasets,
IIRC and HotpotQA. On IIRC, we show that joint modeling with marginalization on
alternative contexts improves model performance by 5.5 F1 points and achieves a
new state-of-the-art performance of 50.6 F1. We also show that marginalization
results in 0.9 to 1.6 QA F1 improvement on HotpotQA in various settings.
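The training objective described above, marginalizing over a set of candidate contexts rather than maximizing the likelihood of a single labeled one, can be sketched as follows. This is a minimal illustration of the marginalization idea under simplifying assumptions (toy scores, no set-valued parameterization or unanswerable-query handling), not the authors' implementation.

```python
import math

def log_marginal_likelihood(candidate_scores, answer_log_probs):
    """Marginalize over candidate contexts:
    log p(answer) = log sum_c p(c) * p(answer | c).

    candidate_scores: unnormalized retrieval scores, one per candidate context
    answer_log_probs: log p(answer | context) for each candidate
    """
    # Normalize retrieval scores into log p(c) via log-softmax.
    log_z = math.log(sum(math.exp(s) for s in candidate_scores))
    log_p_c = [s - log_z for s in candidate_scores]
    # Stable logsumexp over log p(c) + log p(answer | c).
    terms = [lp + la for lp, la in zip(log_p_c, answer_log_probs)]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))
```

Because the loss is the negative of this marginal log-likelihood, an unlabeled context that nonetheless supports the answer still contributes probability mass, so the model is not penalized for retrieving a false-negative context.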
Related papers
- Question: How do Large Language Models perform on Question Answering tasks? Answer:
Large Language Models (LLMs) have been showing promising results for various NLP-tasks without the explicit need to be trained for these tasks by using few-shot or zero-shot prompting techniques.
We propose a comprehensive performance comparison between smaller fine-tuned models and out-of-the-box instruction-following LLMs on the Stanford Question Answering Dataset 2.0 (SQuAD2).
Our results show that smaller, fine-tuned models outperform current State-Of-The-Art (SOTA) LLMs on the fine-tuned task, but recent SOTA models are able to close this gap on the out
arXiv Detail & Related papers (2024-12-17T13:19:38Z) - NewsQs: Multi-Source Question Generation for the Inquiring Mind [59.79288644158271]
We present NewsQs, a dataset that provides question-answer pairs for multiple news documents.
To create NewsQs, we augment a traditional multi-document summarization dataset with questions automatically generated by a T5-Large model fine-tuned on FAQ-style news articles.
arXiv Detail & Related papers (2024-02-28T16:59:35Z) - A Lightweight Method to Generate Unanswerable Questions in English [18.323248259867356]
We examine a simpler data augmentation method for unanswerable question generation in English.
We perform antonym and entity swaps on answerable questions.
Compared to the prior state-of-the-art, data generated with our training-free and lightweight strategy results in better models.
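The swap strategy described above can be sketched as follows. The word lists and helper below are illustrative toys, not the paper's pipeline or vocabulary.

```python
# Toy sketch of antonym / entity swaps that turn an answerable question
# into an unanswerable variant; the swap tables are hypothetical.
ANTONYMS = {"largest": "smallest", "first": "last", "highest": "lowest"}
ENTITY_SWAPS = {"France": "Brazil", "Einstein": "Tesla"}

def make_unanswerable(question: str) -> str:
    """Replace one antonym or entity so the question no longer matches
    its supporting context, yielding an unanswerable training example."""
    tokens = question.split()
    for i, tok in enumerate(tokens):
        key = tok.strip("?,.")
        repl = ANTONYMS.get(key.lower()) or ENTITY_SWAPS.get(key)
        if repl:
            tokens[i] = tok.replace(key, repl)
            break
    return " ".join(tokens)
```

A single swap preserves the question's surface form while breaking its grounding in the passage, which is what makes the method training-free and lightweight.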
arXiv Detail & Related papers (2023-10-30T10:14:52Z) - Answering Ambiguous Questions via Iterative Prompting [84.3426020642704]
In open-domain question answering, due to the ambiguity of questions, multiple plausible answers may exist.
One approach is to directly predict all valid answers, but this can struggle with balancing relevance and diversity.
We present AmbigPrompt to address the imperfections of existing approaches to answering ambiguous questions.
arXiv Detail & Related papers (2023-07-08T04:32:17Z) - Counterfactual Variable Control for Robust and Interpretable Question
Answering [57.25261576239862]
Deep neural network based question answering (QA) models are neither robust nor explainable in many cases.
In this paper, we inspect such spurious "capability" of QA models using causal inference.
We propose a novel approach called Counterfactual Variable Control (CVC) that explicitly mitigates any shortcut correlation.
arXiv Detail & Related papers (2020-10-12T10:09:05Z) - FEQA: A Question Answering Evaluation Framework for Faithfulness
Assessment in Abstractive Summarization [34.2456005415483]
We tackle the problem of evaluating faithfulness of a generated summary given its source document.
We find that current models exhibit a trade-off between abstractiveness and faithfulness.
We propose an automatic question answering (QA) based metric for faithfulness.
arXiv Detail & Related papers (2020-05-07T21:00:08Z) - Harvesting and Refining Question-Answer Pairs for Unsupervised QA [95.9105154311491]
We introduce two approaches to improve unsupervised Question Answering (QA).
First, we harvest lexically and syntactically divergent questions from Wikipedia to automatically construct a corpus of question-answer pairs (named RefQA).
Second, we take advantage of the QA model to extract more appropriate answers, which iteratively refines data over RefQA.
arXiv Detail & Related papers (2020-05-06T15:56:06Z) - Probabilistic Assumptions Matter: Improved Models for
Distantly-Supervised Document-Level Question Answering [35.55031325165487]
We address the problem of extractive question answering using document-level distant supervision.
We show that these assumptions interact, and that different configurations provide complementary benefits.
Our approach outperforms previous state-of-the-art models by 4.3 points in F1 on TriviaQA-Wiki and 1.7 points in Rouge-L on NarrativeQA summaries.
arXiv Detail & Related papers (2020-05-05T01:08:36Z) - Robust Question Answering Through Sub-part Alignment [53.94003466761305]
We model question answering as an alignment problem.
We train our model on SQuAD v1.1 and test it on several adversarial and out-of-domain datasets.
arXiv Detail & Related papers (2020-04-30T09:10:57Z) - ManyModalQA: Modality Disambiguation and QA over Diverse Inputs [73.93607719921945]
We present a new multimodal question answering challenge, ManyModalQA, in which an agent must answer a question by considering three distinct modalities.
We collect our data by scraping Wikipedia and then utilize crowdsourcing to collect question-answer pairs.
arXiv Detail & Related papers (2020-01-22T14:39:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.