Distantly-Supervised Evidence Retrieval Enables Question Answering
without Evidence Annotation
- URL: http://arxiv.org/abs/2110.04889v1
- Date: Sun, 10 Oct 2021 20:01:27 GMT
- Title: Distantly-Supervised Evidence Retrieval Enables Question Answering
without Evidence Annotation
- Authors: Chen Zhao, Chenyan Xiong, Jordan Boyd-Graber, Hal Daumé III
- Abstract summary: Open-domain question answering systems answer a question based on evidence retrieved from a large corpus.
This paper investigates whether models can learn to find evidence from a large corpus, with only distant supervision from answer labels for model training.
We introduce a novel approach (DistDR) that iteratively improves over a weak retriever by alternately finding evidence from the up-to-date model and encouraging the model to learn the most likely evidence.
- Score: 19.551623461082617
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Open-domain question answering systems answer a question based on
evidence retrieved from a large corpus. State-of-the-art neural approaches require intermediate
evidence annotations for training. However, such intermediate annotations are
expensive, and methods that rely on them cannot transfer to the more common
setting, where only question-answer pairs are available. This paper
investigates whether models can learn to find evidence from a large corpus,
with only distant supervision from answer labels for model training, thereby
incurring no additional annotation cost. We introduce a novel approach
(DistDR) that iteratively improves over a weak retriever by alternately finding
evidence from the up-to-date model and encouraging the model to learn the most
likely evidence. Without using any evidence labels, DistDR is on par with
fully-supervised state-of-the-art methods on both multi-hop and single-hop QA
benchmarks. Our analysis confirms that DistDR finds more accurate evidence over
iterations, which leads to model improvements.
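The abstract describes an iterative, hard-EM-style loop: retrieve with the current model, keep retrieved passages that contain the answer string as pseudo-evidence (the distant supervision signal), and retrain the retriever on them. Below is a minimal Python sketch of that loop; the `retriever.retrieve` / `retriever.train_on` interface, the string-match heuristic, and all parameter names are illustrative assumptions, not the authors' actual implementation.

```python
# Minimal sketch of a DistDR-style training loop (assumed interface,
# not the authors' code): alternate between finding likely evidence
# with the up-to-date retriever and retraining on that evidence.

def contains_answer(passage: str, answer: str) -> bool:
    # Distant supervision signal: a retrieved passage counts as
    # pseudo-positive evidence if the gold answer string occurs in it.
    return answer.lower() in passage.lower()

def distdr_train(retriever, qa_pairs, corpus, iterations=3, top_k=10):
    """qa_pairs: list of (question, answer) strings; no evidence labels.
    `retriever.retrieve` and `retriever.train_on` are hypothetical."""
    for _ in range(iterations):
        pseudo_labeled = []
        # E-step: retrieve with the current model and keep the
        # highest-ranked passage that contains the answer.
        for question, answer in qa_pairs:
            candidates = retriever.retrieve(question, corpus, k=top_k)
            positives = [p for p in candidates if contains_answer(p, answer)]
            if positives:
                negatives = [p for p in candidates if p not in positives]
                pseudo_labeled.append((question, positives[0], negatives))
        # M-step: train the retriever to rank the most likely evidence
        # above the other retrieved candidates.
        retriever.train_on(pseudo_labeled)
    return retriever
```

In this sketch, questions whose top-k retrievals never contain the answer are simply skipped in that round; as the retriever improves across iterations, more of them yield usable pseudo-evidence, which matches the paper's claim that evidence accuracy grows over iterations.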
Related papers
- Contrastive Learning to Improve Retrieval for Real-world Fact Checking [84.57583869042791]
We present Contrastive Fact-Checking Reranker (CFR), an improved retriever for fact-checking complex claims.
We leverage the AVeriTeC dataset, which annotates subquestions for claims with human written answers from evidence documents.
We find a 6% improvement in veracity classification accuracy on the dataset.
arXiv Detail & Related papers (2024-10-07T00:09:50Z)
- Progressive Evidence Refinement for Open-domain Multimodal Retrieval Question Answering [20.59485758381809]
Current multimodal retrieval question-answering models face two main challenges. One is that utilizing compressed evidence features as input to the model results in the loss of fine-grained information within the evidence.
We propose a two-stage framework for evidence retrieval and question-answering to alleviate these issues.
arXiv Detail & Related papers (2023-10-15T01:18:39Z)
- HOP, UNION, GENERATE: Explainable Multi-hop Reasoning without Rationale Supervision [118.0818807474809]
This work proposes a principled, probabilistic approach for training explainable multi-hop QA systems without rationale supervision.
Our approach performs multi-hop reasoning by explicitly modeling rationales as sets, enabling the model to capture interactions between documents and sentences within a document.
arXiv Detail & Related papers (2023-05-23T16:53:49Z)
- Open-domain Question Answering via Chain of Reasoning over Heterogeneous Knowledge [82.5582220249183]
We propose a novel open-domain question answering (ODQA) framework for answering single/multi-hop questions across heterogeneous knowledge sources.
Unlike previous methods that solely rely on the retriever for gathering all evidence in isolation, our intermediary performs a chain of reasoning over the retrieved set.
Our system achieves competitive performance on two ODQA datasets, OTT-QA and NQ, reasoning over tables and passages from Wikipedia.
arXiv Detail & Related papers (2022-10-22T03:21:32Z)
- GERE: Generative Evidence Retrieval for Fact Verification [57.78768817972026]
We propose GERE, the first system that retrieves evidence in a generative fashion.
The experimental results on the FEVER dataset show that GERE achieves significant improvements over the state-of-the-art baselines.
arXiv Detail & Related papers (2022-04-12T03:49:35Z)
- Making Document-Level Information Extraction Right for the Right Reasons [19.00249049142611]
Document-level information extraction is a flexible framework compatible with applications where information is not necessarily localized in a single sentence.
This work studies how to ensure that document-level neural models make correct inferences from complex text and make those inferences in an auditable way.
arXiv Detail & Related papers (2021-10-14T19:52:47Z)
- Robustifying Multi-hop QA through Pseudo-Evidentiality Training [28.584236042324896]
We study a bias problem of multi-hop question answering models: answering correctly without correct reasoning.
We propose a new approach to learn evidentiality, deciding whether an answer prediction is supported by correct evidence.
arXiv Detail & Related papers (2021-07-07T14:15:14Z)
- Weakly- and Semi-supervised Evidence Extraction [107.47661281843232]
We propose new methods to combine few evidence annotations with abundant document-level labels for the task of evidence extraction.
Our approach yields substantial gains with as few as a hundred evidence annotations.
arXiv Detail & Related papers (2020-11-03T04:05:00Z)
- Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering [61.394478670089065]
Generative models for open domain question answering have proven to be competitive, without resorting to external knowledge.
We investigate how much these models can benefit from retrieving text passages, potentially containing evidence.
We observe that the performance of this method significantly improves when increasing the number of retrieved passages.
arXiv Detail & Related papers (2020-07-02T17:44:57Z)