A Benchmark for Open-Domain Numerical Fact-Checking Enhanced by Claim Decomposition
- URL: http://arxiv.org/abs/2510.22055v1
- Date: Fri, 24 Oct 2025 22:37:13 GMT
- Title: A Benchmark for Open-Domain Numerical Fact-Checking Enhanced by Claim Decomposition
- Authors: V Venktesh, Deepali Prabhu, Avishek Anand
- Abstract summary: QuanTemp++ is a dataset consisting of natural numerical claims, an open-domain corpus, and the corresponding relevant evidence for each claim. We characterize the retrieval performance of key claim decomposition paradigms.
- Score: 7.910984819642885
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Fact-checking numerical claims is critical because the presence of numbers lends a mirage of veracity to false claims, potentially causing catastrophic impacts on society. Prior work in automatic fact verification does not primarily focus on natural numerical claims. A typical human fact-checker first retrieves relevant evidence addressing the different numerical aspects of the claim and then reasons about it to predict the claim's veracity. The search process of a human fact-checker is thus a crucial skill that forms the foundation of the verification process, and emulating this real-world setting is essential for developing automated methods that encompass such skills. However, existing benchmarks employ heuristic claim decomposition approaches augmented with weakly supervised web search to collect evidence for verifying claims. This sometimes results in less relevant evidence and noisy sources with temporal leakage, rendering a less realistic retrieval setting for claim verification. Hence, we introduce QuanTemp++: a dataset consisting of natural numerical claims, an open-domain corpus, and the corresponding relevant evidence for each claim. The evidence is collected through a claim decomposition process that approximately emulates the approach of a human fact-checker, with veracity labels ensuring there is no temporal leakage. Given this dataset, we also characterize the retrieval performance of key claim decomposition paradigms. Finally, we observe their effect on the outcome of the verification pipeline and draw insights. The code for the data pipeline, along with a link to the data, can be found at https://github.com/VenkteshV/QuanTemp_Plus
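The decompose-retrieve-verify workflow the abstract describes can be sketched in a few lines. This is a minimal illustrative sketch, not the paper's actual pipeline: the function names, the regex-based decomposition heuristic, the word-overlap retriever, and the toy corpus are all assumptions made for illustration; QuanTemp++'s real data pipeline is in the linked repository.

```python
# Illustrative sketch of a decompose -> retrieve -> verify pipeline for
# numerical claims. All names, heuristics, and data here are hypothetical.
import re

NUM_RE = r"\d[\d,.]*\d|\d"  # crude pattern for numeric mentions


def decompose(claim: str) -> list[str]:
    """Heuristic decomposition: one sub-question per numeric mention."""
    numbers = re.findall(NUM_RE, claim)
    return [f"Is the figure {n} correct in: {claim}" for n in numbers] or [claim]


def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Toy lexical retriever: rank documents by word overlap with the query."""
    q = set(query.lower().split())
    return sorted(corpus, key=lambda d: -len(q & set(d.lower().split())))[:k]


def verify(claim: str, evidence: list[str]) -> str:
    """Placeholder verdict: supported if a claimed number appears in evidence.
    A real verifier would reason over the evidence with an NLI model or LLM."""
    nums = re.findall(NUM_RE, claim)
    supported = any(n in doc for n in nums for doc in evidence)
    return "Supported" if supported else "Not enough info"


corpus = [
    "The national unemployment rate fell to 3.7 percent in 2023.",
    "GDP grew by 2.1 percent in the last quarter.",
]
claim = "Unemployment fell to 3.7 percent."
evidence = [doc for sub in decompose(claim) for doc in retrieve(sub, corpus)]
print(verify(claim, evidence))
```

A real system replaces each step with a learned component (an LLM decomposer, a dense retriever, an entailment-based verifier); the benchmark's contribution is the open-domain corpus and leakage-free evidence against which such components can be measured.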
Related papers
- Retrieve-Refine-Calibrate: A Framework for Complex Claim Fact-Checking [32.6738019397553]
We propose a Retrieve-Refine-Calibrate (RRC) framework based on large language models (LLMs). Specifically, the framework first identifies the entities mentioned in the claim and retrieves evidence relevant to them. Then, it refines the retrieved evidence based on the claim to reduce irrelevant information. Finally, it calibrates the verification process by re-evaluating low-confidence predictions.
arXiv Detail & Related papers (2026-01-23T08:48:52Z) - Contrastive Learning to Improve Retrieval for Real-world Fact Checking [84.57583869042791]
We present Contrastive Fact-Checking Reranker (CFR), an improved retriever for fact-checking complex claims.
We leverage the AVeriTeC dataset, which annotates subquestions for claims with human written answers from evidence documents.
We find a 6% improvement in veracity classification accuracy on the dataset.
arXiv Detail & Related papers (2024-10-07T00:09:50Z) - QuanTemp: A real-world open-domain benchmark for fact-checking numerical claims [4.874071145951159]
We release QuanTemp, a dataset focused exclusively on numerical claims.
We evaluate and quantify the limitations of existing solutions for the task of verifying numerical claims.
arXiv Detail & Related papers (2024-03-25T20:36:03Z) - From Relevance to Utility: Evidence Retrieval with Feedback for Fact Verification [118.03466985807331]
We argue that, rather than relevance, for FV we need to focus on the utility that a claim verifier derives from the retrieved evidence. We introduce the feedback-based evidence retriever (FER), which optimizes the evidence retrieval process by incorporating feedback from the claim verifier.
arXiv Detail & Related papers (2023-10-18T02:59:38Z) - Complex Claim Verification with Evidence Retrieved in the Wild [73.19998942259073]
We present the first fully automated pipeline to check real-world claims by retrieving raw evidence from the web.
Our pipeline includes five components: claim decomposition, raw document retrieval, fine-grained evidence retrieval, claim-focused summarization, and veracity judgment.
arXiv Detail & Related papers (2023-05-19T17:49:19Z) - WiCE: Real-World Entailment for Claims in Wikipedia [63.234352061821625]
We propose WiCE, a new fine-grained textual entailment dataset built on natural claim and evidence pairs extracted from Wikipedia.
In addition to standard claim-level entailment, WiCE provides entailment judgments over sub-sentence units of the claim.
We show that real claims in our dataset involve challenging verification and retrieval problems that existing models fail to address.
arXiv Detail & Related papers (2023-03-02T17:45:32Z) - Implicit Temporal Reasoning for Evidence-Based Fact-Checking [14.015789447347466]
Our study demonstrates that time positively influences the claim verification process of evidence-based fact-checking.
Our findings show that the presence of temporal information and the manner in which timelines are constructed greatly influence how fact-checking models determine the relevance and supporting or refuting character of evidence documents.
arXiv Detail & Related papers (2023-02-24T10:48:03Z) - GERE: Generative Evidence Retrieval for Fact Verification [57.78768817972026]
We propose GERE, the first system that retrieves evidence in a generative fashion.
The experimental results on the FEVER dataset show that GERE achieves significant improvements over the state-of-the-art baselines.
arXiv Detail & Related papers (2022-04-12T03:49:35Z) - AmbiFC: Fact-Checking Ambiguous Claims with Evidence [57.7091560922174]
We present AmbiFC, a fact-checking dataset with 10k claims derived from real-world information needs.
We analyze disagreements arising from ambiguity when comparing claims against evidence in AmbiFC.
We develop models for predicting veracity handling this ambiguity via soft labels.
arXiv Detail & Related papers (2021-04-01T17:40:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.