FactIR: A Real-World Zero-shot Open-Domain Retrieval Benchmark for Fact-Checking
- URL: http://arxiv.org/abs/2502.06006v1
- Date: Sun, 09 Feb 2025 19:51:00 GMT
- Title: FactIR: A Real-World Zero-shot Open-Domain Retrieval Benchmark for Fact-Checking
- Authors: Venktesh V, Vinay Setty,
- Abstract summary: The field of automated fact-checking increasingly depends on retrieving web-based evidence to determine the veracity of claims in real-world scenarios.
Traditional retrieval methods may return documents that directly address claims or lean toward supporting them, but often struggle with more complex claims requiring indirect reasoning.
We present a real-world benchmark FactIR, derived from Factiverse production logs, enhanced with human annotations.
- Score: 3.1537425078180625
- License:
- Abstract: The field of automated fact-checking increasingly depends on retrieving web-based evidence to determine the veracity of claims in real-world scenarios. A significant challenge in this process is not only retrieving relevant information, but also identifying evidence that can both support and refute complex claims. Traditional retrieval methods may return documents that directly address claims or lean toward supporting them, but often struggle with more complex claims requiring indirect reasoning. While some existing benchmarks and methods target retrieval for fact-checking, a comprehensive real-world open-domain benchmark has been lacking. In this paper, we present a real-world retrieval benchmark FactIR, derived from Factiverse production logs, enhanced with human annotations. We rigorously evaluate state-of-the-art retrieval models in a zero-shot setup on FactIR and offer insights for developing practical retrieval systems for fact-checking. Code and data are available at https://github.com/factiverse/factIR.
Related papers
- Face the Facts! Evaluating RAG-based Fact-checking Pipelines in Realistic Settings [14.355271969637139]
This work lifts several constraints of current state-of-the-art pipelines for automated fact-checking based on the Retrieval-Augmented Generation paradigm.
Our goal is to benchmark, under more realistic scenarios, RAG-based methods for the generation of verdicts.
arXiv Detail & Related papers (2024-12-19T18:57:11Z) - Contrastive Learning to Improve Retrieval for Real-world Fact Checking [84.57583869042791]
We present Contrastive Fact-Checking Reranker (CFR), an improved retriever for fact-checking complex claims.
We leverage the AVeriTeC dataset, which annotates subquestions for claims with human written answers from evidence documents.
We find a 6% improvement in veracity classification accuracy on the dataset.
arXiv Detail & Related papers (2024-10-07T00:09:50Z) - Retrieval Augmented Fact Verification by Synthesizing Contrastive Arguments [23.639378586798884]
We propose retrieval augmented fact verification through the synthesis of contrasting arguments.
Our method effectively retrieves relevant documents as evidence and evaluates arguments from varying perspectives.
We demonstrate the effectiveness of our method through extensive experiments, where RAFTS can outperform GPT-based methods with a significantly smaller 7B LLM.
arXiv Detail & Related papers (2024-06-14T08:13:34Z) - Fact Checking Beyond Training Set [64.88575826304024]
We show that the retriever-reader suffers from performance deterioration when it is trained on labeled data from one domain and used in another domain.
We propose an adversarial algorithm to make the retriever component robust against distribution shift.
We then construct eight fact checking scenarios from these datasets, and compare our model to a set of strong baseline models.
arXiv Detail & Related papers (2024-03-27T15:15:14Z) - QuanTemp: A real-world open-domain benchmark for fact-checking numerical claims [4.874071145951159]
We release QuanTemp, a dataset focused exclusively on numerical claims.
We evaluate and quantify the limitations of existing solutions for the task of verifying numerical claims.
arXiv Detail & Related papers (2024-03-25T20:36:03Z) - Factcheck-Bench: Fine-Grained Evaluation Benchmark for Automatic Fact-checkers [121.53749383203792]
We present a holistic end-to-end solution for annotating the factuality of large language models (LLMs)-generated responses.
We construct an open-domain document-level factuality benchmark in three-level granularity: claim, sentence and document.
Preliminary experiments show that FacTool, FactScore and Perplexity are struggling to identify false claims.
arXiv Detail & Related papers (2023-11-15T14:41:57Z) - Complex Claim Verification with Evidence Retrieved in the Wild [73.19998942259073]
We present the first fully automated pipeline to check real-world claims by retrieving raw evidence from the web.
Our pipeline includes five components: claim decomposition, raw document retrieval, fine-grained evidence retrieval, claim-focused summarization, and veracity judgment.
arXiv Detail & Related papers (2023-05-19T17:49:19Z) - WiCE: Real-World Entailment for Claims in Wikipedia [63.234352061821625]
We propose WiCE, a new fine-grained textual entailment dataset built on natural claim and evidence pairs extracted from Wikipedia.
In addition to standard claim-level entailment, WiCE provides entailment judgments over sub-sentence units of the claim.
We show that real claims in our dataset involve challenging verification and retrieval problems that existing models fail to address.
arXiv Detail & Related papers (2023-03-02T17:45:32Z) - Generating Fact Checking Explanations [52.879658637466605]
A crucial piece of the puzzle that is still missing is to understand how to automate the most elaborate part of the process.
This paper provides the first study of how these explanations can be generated automatically based on available claim context.
Our results indicate that optimising both objectives at the same time, rather than training them separately, improves the performance of a fact checking system.
arXiv Detail & Related papers (2020-04-13T05:23:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.