GERE: Generative Evidence Retrieval for Fact Verification
- URL: http://arxiv.org/abs/2204.05511v2
- Date: Thu, 14 Apr 2022 08:03:34 GMT
- Title: GERE: Generative Evidence Retrieval for Fact Verification
- Authors: Jiangui Chen, Ruqing Zhang, Jiafeng Guo, Yixing Fan, and Xueqi Cheng
- Abstract summary: We propose GERE, the first system that retrieves evidences in a generative fashion.
The experimental results on the FEVER dataset show that GERE achieves significant improvements over the state-of-the-art baselines.
- Score: 57.78768817972026
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fact verification (FV) is a challenging task which aims to verify a claim
using multiple evidential sentences from trustworthy corpora, e.g., Wikipedia.
Most existing approaches follow a three-step pipeline framework, including
document retrieval, sentence retrieval and claim verification. High-quality
evidences provided by the first two steps are the foundation of the effective
reasoning in the last step. Despite being important, high-quality evidences are
rarely studied by existing works for FV, which often adopt the off-the-shelf
models to retrieve relevant documents and sentences in an
"index-retrieve-then-rank" fashion. This classical approach has clear drawbacks
as follows: i) a large document index as well as a complicated search process
is required, leading to considerable memory and computational overhead; ii)
independent scoring paradigms fail to capture the interactions among documents
and sentences in ranking; iii) a fixed number of sentences are selected to form
the final evidence set. In this work, we propose GERE, the first system that
retrieves evidences in a generative fashion, i.e., generating the document
titles as well as evidence sentence identifiers. This enables us to mitigate
the aforementioned technical issues since: i) the memory and computational cost
is greatly reduced because the document index is eliminated and the heavy
ranking process is replaced by a light generative process; ii) the dependency
between documents and that between sentences could be captured via sequential
generation process; iii) the generative formulation allows us to dynamically
select a precise set of relevant evidences for each claim. The experimental
results on the FEVER dataset show that GERE achieves significant improvements
over the state-of-the-art baselines, with both time-efficiency and
memory-efficiency.
Related papers
- JudgeRank: Leveraging Large Language Models for Reasoning-Intensive Reranking [81.88787401178378]
We introduce JudgeRank, a novel agentic reranker that emulates human cognitive processes when assessing document relevance.
We evaluate JudgeRank on the reasoning-intensive BRIGHT benchmark, demonstrating substantial performance improvements over first-stage retrieval methods.
In addition, JudgeRank performs on par with fine-tuned state-of-the-art rerankers on the popular BEIR benchmark, validating its zero-shot generalization capability.
arXiv Detail & Related papers (2024-10-31T18:43:12Z) - Give Me More Details: Improving Fact-Checking with Latent Retrieval [58.706972228039604]
Evidence plays a crucial role in automated fact-checking.
Existing fact-checking systems either assume the evidence sentences are given or use the search snippets returned by the search engine.
We propose to incorporate full text from source documents as evidence and introduce two enriched datasets.
arXiv Detail & Related papers (2023-05-25T15:01:19Z) - Fine-Grained Distillation for Long Document Retrieval [86.39802110609062]
Long document retrieval aims to fetch query-relevant documents from a large-scale collection.
Knowledge distillation has become de facto to improve a retriever by mimicking a heterogeneous yet powerful cross-encoder.
We propose a new learning framework, fine-grained distillation (FGD), for long-document retrievers.
arXiv Detail & Related papers (2022-12-20T17:00:36Z) - Natural Logic-guided Autoregressive Multi-hop Document Retrieval for
Fact Verification [21.04611844009438]
We propose a novel retrieve-and-rerank method for multi-hop retrieval.
It consists of a retriever that jointly scores documents in the knowledge source and sentences from previously retrieved documents.
It is guided by a proof system that dynamically terminates the retrieval process if the evidence is deemed sufficient.
arXiv Detail & Related papers (2022-12-10T11:32:38Z) - Document-Level Relation Extraction with Sentences Importance Estimation
and Focusing [52.069206266557266]
Document-level relation extraction (DocRE) aims to determine the relation between two entities from a document of multiple sentences.
We propose a Sentence Estimation and Focusing (SIEF) framework for DocRE, where we design a sentence importance score and a sentence focusing loss.
Experimental results on two domains show that our SIEF not only improves overall performance, but also makes DocRE models more robust.
arXiv Detail & Related papers (2022-04-27T03:20:07Z) - Augmenting Document Representations for Dense Retrieval with
Interpolation and Perturbation [49.940525611640346]
Document Augmentation for dense Retrieval (DAR) framework augments the representations of documents with their Dense Augmentation and perturbations.
We validate the performance of DAR on retrieval tasks with two benchmark datasets, showing that the proposed DAR significantly outperforms relevant baselines on the dense retrieval of both the labeled and unlabeled documents.
arXiv Detail & Related papers (2022-03-15T09:07:38Z) - CODER: An efficient framework for improving retrieval through
COntextualized Document Embedding Reranking [11.635294568328625]
We present a framework for improving the performance of a wide class of retrieval models at minimal computational cost.
It utilizes precomputed document representations extracted by a base dense retrieval method.
It incurs a negligible computational overhead on top of any first-stage method at run time, allowing it to be easily combined with any state-of-the-art dense retrieval method.
arXiv Detail & Related papers (2021-12-16T10:25:26Z) - Improving Query Representations for Dense Retrieval with Pseudo
Relevance Feedback [29.719150565643965]
This paper proposes ANCE-PRF, a new query encoder that uses pseudo relevance feedback (PRF) to improve query representations for dense retrieval.
ANCE-PRF uses a BERT encoder that consumes the query and the top retrieved documents from a dense retrieval model, ANCE, and it learns to produce better query embeddings directly from relevance labels.
Analysis shows that the PRF encoder effectively captures the relevant and complementary information from PRF documents, while ignoring the noise with its learned attention mechanism.
arXiv Detail & Related papers (2021-08-30T18:10:26Z) - Hierarchical Evidence Set Modeling for Automated Fact Extraction and
Verification [5.836068916903788]
Hierarchical Evidence Set Modeling (HESM) is a framework to extract evidence sets and verify a claim to be supported, refuted or not enough info.
Our experimental results show that HESM outperforms 7 state-of-the-art methods for fact extraction and claim verification.
arXiv Detail & Related papers (2020-10-10T22:27:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.