Mitigating False-Negative Contexts in Multi-document Question Answering
with Retrieval Marginalization
- URL: http://arxiv.org/abs/2103.12235v1
- Date: Mon, 22 Mar 2021 23:44:35 GMT
- Title: Mitigating False-Negative Contexts in Multi-document Question Answering
with Retrieval Marginalization
- Authors: Ansong Ni, Matt Gardner, Pradeep Dasigi
- Abstract summary: We develop a new parameterization of set-valued retrieval that properly handles unanswerable queries.
We show that marginalizing over this set during training allows a model to mitigate false negatives in annotated supporting evidences.
On IIRC, we show that joint modeling with marginalization on alternative contexts improves model performance by 5.5 F1 points and achieves a new state-of-the-art performance of 50.6 F1.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Question Answering (QA) tasks requiring information from multiple documents
often rely on a retrieval model to identify relevant information from which the
reasoning model can derive an answer. The retrieval model is typically trained
to maximize the likelihood of the labeled supporting evidence. However, when
retrieving from large text corpora such as Wikipedia, the correct answer can
often be obtained from multiple evidence candidates, not all of them labeled as
positive, thus rendering the training signal weak and noisy. The problem is
exacerbated when the questions are unanswerable or the answers are boolean,
since the models cannot rely on lexical overlap to map answers to supporting
evidences. We develop a new parameterization of set-valued retrieval that
properly handles unanswerable queries, and we show that marginalizing over this
set during training allows a model to mitigate false negatives in annotated
supporting evidences. We test our method with two multi-document QA datasets,
IIRC and HotpotQA. On IIRC, we show that joint modeling with marginalization on
alternative contexts improves model performance by 5.5 F1 points and achieves a
new state-of-the-art performance of 50.6 F1. We also show that marginalization
results in 0.9 to 1.6 QA F1 improvement on HotpotQA in various settings.
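The training objective described above, marginalizing over a set of candidate contexts rather than maximizing the likelihood of a single labeled one, can be sketched as follows. This is a minimal illustration of the marginalization idea under simplifying assumptions (toy scores, no set-valued parameterization or unanswerable-query handling), not the authors' implementation.

```python
import math

def log_marginal_likelihood(candidate_scores, answer_log_probs):
    """Marginalize over candidate contexts:
    log p(answer) = log sum_c p(c) * p(answer | c).

    candidate_scores: unnormalized retrieval scores, one per candidate context
    answer_log_probs: log p(answer | context) for each candidate
    """
    # Normalize retrieval scores into log p(c) via log-softmax.
    log_z = math.log(sum(math.exp(s) for s in candidate_scores))
    log_p_c = [s - log_z for s in candidate_scores]
    # Stable logsumexp over log p(c) + log p(answer | c).
    terms = [lp + la for lp, la in zip(log_p_c, answer_log_probs)]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))
```

Because the loss is the negative of this marginal log-likelihood, an unlabeled context that nonetheless supports the answer still contributes probability mass, so the model is not penalized for retrieving a false-negative context.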
Related papers
- Question: How do Large Language Models perform on Question Answering tasks? Answer:
Large Language Models (LLMs) have been showing promising results for various NLP-tasks without the explicit need to be trained for these tasks by using few-shot or zero-shot prompting techniques.
We propose a comprehensive performance comparison between smaller fine-tuned models and out-of-the-box instruction-following LLMs on the Stanford Question Answering Dataset 2.0 (SQuAD2).
Our results show that smaller, fine-tuned models outperform current State-Of-The-Art (SOTA) LLMs on the fine-tuned task, but recent SOTA models are able to close this gap on the out
arXiv Detail & Related papers (2024-12-17T13:19:38Z) - NewsQs: Multi-Source Question Generation for the Inquiring Mind [59.79288644158271]
We present NewsQs, a dataset that provides question-answer pairs for multiple news documents.
To create NewsQs, we augment a traditional multi-document summarization dataset with questions automatically generated by a T5-Large model fine-tuned on FAQ-style news articles.
arXiv Detail & Related papers (2024-02-28T16:59:35Z) - A Lightweight Method to Generate Unanswerable Questions in English [18.323248259867356]
We examine a simpler data augmentation method for unanswerable question generation in English.
We perform antonym and entity swaps on answerable questions.
Compared to the prior state-of-the-art, data generated with our training-free and lightweight strategy results in better models.
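The swap strategy described above can be sketched as follows. The word lists and helper below are illustrative toys, not the paper's pipeline or vocabulary.

```python
# Toy sketch of antonym / entity swaps that turn an answerable question
# into an unanswerable variant; the swap tables are hypothetical.
ANTONYMS = {"largest": "smallest", "first": "last", "highest": "lowest"}
ENTITY_SWAPS = {"France": "Brazil", "Einstein": "Tesla"}

def make_unanswerable(question: str) -> str:
    """Replace one antonym or entity so the question no longer matches
    its supporting context, yielding an unanswerable training example."""
    tokens = question.split()
    for i, tok in enumerate(tokens):
        key = tok.strip("?,.")
        repl = ANTONYMS.get(key.lower()) or ENTITY_SWAPS.get(key)
        if repl:
            tokens[i] = tok.replace(key, repl)
            break
    return " ".join(tokens)
```

A single swap preserves the question's surface form while breaking its grounding in the passage, which is what makes the method training-free and lightweight.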
arXiv Detail & Related papers (2023-10-30T10:14:52Z) - Answering Ambiguous Questions via Iterative Prompting [84.3426020642704]
In open-domain question answering, due to the ambiguity of questions, multiple plausible answers may exist.
One approach is to directly predict all valid answers, but this can struggle with balancing relevance and diversity.
We present AmbigPrompt to address the imperfections of existing approaches to answering ambiguous questions.
arXiv Detail & Related papers (2023-07-08T04:32:17Z) - Counterfactual Variable Control for Robust and Interpretable Question
Answering [57.25261576239862]
Deep neural network based question answering (QA) models are neither robust nor explainable in many cases.
In this paper, we inspect such spurious "capability" of QA models using causal inference.
We propose a novel approach called Counterfactual Variable Control (CVC) that explicitly mitigates any shortcut correlation.
arXiv Detail & Related papers (2020-10-12T10:09:05Z) - FEQA: A Question Answering Evaluation Framework for Faithfulness
Assessment in Abstractive Summarization [34.2456005415483]
We tackle the problem of evaluating faithfulness of a generated summary given its source document.
We find that current models exhibit a trade-off between abstractiveness and faithfulness.
We propose an automatic question answering (QA) based metric for faithfulness.
arXiv Detail & Related papers (2020-05-07T21:00:08Z) - Harvesting and Refining Question-Answer Pairs for Unsupervised QA [95.9105154311491]
We introduce two approaches to improve unsupervised Question Answering (QA).
First, we harvest lexically and syntactically divergent questions from Wikipedia to automatically construct a corpus of question-answer pairs (named RefQA).
Second, we take advantage of the QA model to extract more appropriate answers, which iteratively refines data over RefQA.
arXiv Detail & Related papers (2020-05-06T15:56:06Z) - Probabilistic Assumptions Matter: Improved Models for
Distantly-Supervised Document-Level Question Answering [35.55031325165487]
We address the problem of extractive question answering using document-level distant supervision.
We show that these assumptions interact, and that different configurations provide complementary benefits.
Our approach outperforms previous state-of-the-art models by 4.3 points in F1 on TriviaQA-Wiki and 1.7 points in Rouge-L on NarrativeQA summaries.
arXiv Detail & Related papers (2020-05-05T01:08:36Z) - Robust Question Answering Through Sub-part Alignment [53.94003466761305]
We model question answering as an alignment problem.
We train our model on SQuAD v1.1 and test it on several adversarial and out-of-domain datasets.
arXiv Detail & Related papers (2020-04-30T09:10:57Z) - ManyModalQA: Modality Disambiguation and QA over Diverse Inputs [73.93607719921945]
We present a new multimodal question answering challenge, ManyModalQA, in which an agent must answer a question by considering three distinct modalities.
We collect our data by scraping Wikipedia and then utilize crowdsourcing to collect question-answer pairs.
arXiv Detail & Related papers (2020-01-22T14:39:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.