Retrieve-Refine-Calibrate: A Framework for Complex Claim Fact-Checking
- URL: http://arxiv.org/abs/2601.16555v1
- Date: Fri, 23 Jan 2026 08:48:52 GMT
- Title: Retrieve-Refine-Calibrate: A Framework for Complex Claim Fact-Checking
- Authors: Mingwei Sun, Qianlong Wang, Ruifeng Xu
- Abstract summary: We propose a Retrieve-Refine-Calibrate (RRC) framework based on large language models (LLMs). Specifically, the framework first identifies the entities mentioned in the claim and retrieves evidence relevant to them. Then, it refines the retrieved evidence based on the claim to reduce irrelevant information. Finally, it calibrates the verification process by re-evaluating low-confidence predictions.
- Score: 32.6738019397553
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fact-checking aims to verify the truthfulness of a claim based on the retrieved evidence. Existing methods typically follow a decomposition paradigm, in which a claim is broken down into sub-claims that are individually verified. However, the decomposition paradigm may introduce noise to the verification process due to irrelevant entities or evidence, ultimately degrading verification accuracy. To address this problem, we propose a Retrieve-Refine-Calibrate (RRC) framework based on large language models (LLMs). Specifically, the framework first identifies the entities mentioned in the claim and retrieves evidence relevant to them. Then, it refines the retrieved evidence based on the claim to reduce irrelevant information. Finally, it calibrates the verification process by re-evaluating low-confidence predictions. Experiments on two popular fact-checking datasets (HOVER and FEVEROUS-S) demonstrate that our framework achieves superior performance compared with competitive baselines.
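The three stages of the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: every function here (`extract_entities`, `retrieve_evidence`, `refine`, `verify`) is a hypothetical placeholder standing in for an LLM or retriever call, and the calibration threshold is an assumed parameter.

```python
# Hypothetical sketch of the Retrieve-Refine-Calibrate loop described in the
# abstract. All function bodies are toy stand-ins for LLM/retriever calls.
from dataclasses import dataclass

@dataclass
class Verdict:
    label: str         # e.g. "SUPPORTED" / "REFUTED" / "NOT ENOUGH INFO"
    confidence: float  # self-reported probability in [0, 1]

def extract_entities(claim: str) -> list[str]:
    # Stand-in: the paper uses an LLM to identify entities in the claim.
    return [tok for tok in claim.split() if tok.istitle()]

def retrieve_evidence(entities: list[str]) -> list[str]:
    # Stand-in: fetch passages relevant to each entity.
    return [f"passage about {e}" for e in entities]

def refine(claim: str, evidence: list[str]) -> list[str]:
    # Refine step: drop passages not relevant to the claim itself,
    # reducing the noise that hurts decomposition-based verification.
    entities = extract_entities(claim)
    return [p for p in evidence if any(e in p for e in entities)]

def verify(claim: str, evidence: list[str]) -> Verdict:
    # Stand-in: an LLM verdict with a confidence score.
    if evidence:
        return Verdict("SUPPORTED", 0.9)
    return Verdict("NOT ENOUGH INFO", 0.3)

def fact_check(claim: str, threshold: float = 0.5) -> Verdict:
    evidence = refine(claim, retrieve_evidence(extract_entities(claim)))
    verdict = verify(claim, evidence)
    if verdict.confidence < threshold:
        # Calibrate step: re-evaluate a low-confidence prediction,
        # here with the unrefined evidence pool as a second pass.
        verdict = verify(claim, retrieve_evidence(extract_entities(claim)))
    return verdict
```

The key design point the abstract argues for is the middle stage: filtering retrieved evidence against the whole claim before verification, so irrelevant entities do not pollute the verdict.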
Related papers
- Multi-Sourced, Multi-Agent Evidence Retrieval for Fact-Checking [47.47518672198846]
Misinformation spreading over the Internet poses a significant threat to both societies and individuals. Previous methods rely on semantic and social-contextual patterns learned from training data. We propose WKGFC, which exploits an authorized open knowledge graph as a core resource of evidence.
arXiv Detail & Related papers (2026-02-27T19:29:01Z)
- The Alignment Bottleneck in Decomposition-Based Claim Verification [17.197804072440665]
We introduce a new dataset of real-world complex claims featuring temporally bounded evidence and human-annotated sub-claim evidence spans. We evaluate decomposition under two evidence alignment setups: Sub-claim Aligned Evidence (SAE) and Repeated Claim-level Evidence (SRE). Our results reveal that decomposition brings significant performance improvement only when evidence is granular and strictly aligned.
arXiv Detail & Related papers (2026-02-11T00:02:16Z)
- A Benchmark for Open-Domain Numerical Fact-Checking Enhanced by Claim Decomposition [7.910984819642885]
QuanTemp++ is a dataset consisting of natural numerical claims, an open-domain corpus, and the corresponding relevant evidence for each claim. We characterize the retrieval performance of key claim decomposition paradigms.
arXiv Detail & Related papers (2025-10-24T22:37:13Z)
- SUCEA: Reasoning-Intensive Retrieval for Adversarial Fact-checking through Claim Decomposition and Editing [30.84752573088322]
Adversarial claims are intentionally designed by humans to challenge fact-checking systems. We propose a training-free method that rephrases the original claim, making it easier to locate supporting evidence. Our framework significantly improves both retrieval and entailment label accuracy, outperforming four strong claim-decomposition-based baselines.
arXiv Detail & Related papers (2025-06-05T02:58:15Z)
- CRAVE: A Conflicting Reasoning Approach for Explainable Claim Verification Using LLMs [15.170312674645535]
CRAVE is a Conflicting Reasoning Approach for explainable claim VErification. It verifies complex claims based on the conflicting rationales produced by large language models. CRAVE achieves much better performance than state-of-the-art methods.
arXiv Detail & Related papers (2025-04-21T07:20:31Z)
- Contrastive Learning to Improve Retrieval for Real-world Fact Checking [84.57583869042791]
We present Contrastive Fact-Checking Reranker (CFR), an improved retriever for fact-checking complex claims.
We leverage the AVeriTeC dataset, which annotates subquestions for claims with human written answers from evidence documents.
We find a 6% improvement in veracity classification accuracy on the dataset.
arXiv Detail & Related papers (2024-10-07T00:09:50Z)
- From Relevance to Utility: Evidence Retrieval with Feedback for Fact Verification [118.03466985807331]
We argue that, rather than relevance, for FV we need to focus on the utility that a claim verifier derives from the retrieved evidence. We introduce the feedback-based evidence retriever (FER), which optimizes the evidence retrieval process by incorporating feedback from the claim verifier.
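The FER idea above, re-ranking evidence by verifier utility rather than relevance alone, can be sketched minimally. Both scoring functions below are illustrative stand-ins I am assuming for demonstration, not the paper's actual relevance model or verifier feedback signal.

```python
# Hypothetical sketch of feedback-based evidence retrieval: rank candidates
# by relevance first, then re-rank the shortlist by the utility signal the
# claim verifier feeds back. Scoring functions are toy stand-ins.

def relevance(claim: str, passage: str) -> float:
    # Stand-in relevance score: token overlap with the claim.
    c, p = set(claim.lower().split()), set(passage.lower().split())
    return len(c & p) / max(len(c), 1)

def verifier_utility(claim: str, passage: str) -> float:
    # Stand-in for verifier feedback: here we pretend passages containing
    # numbers shift the verifier's confidence most, purely for illustration.
    return 1.0 if any(tok.isdigit() for tok in passage.split()) else 0.0

def retrieve_with_feedback(claim: str, corpus: list[str], k: int = 2) -> list[str]:
    # First pass: shortlist by relevance. Second pass: re-rank the
    # shortlist by the utility the verifier reports for each passage.
    shortlist = sorted(corpus, key=lambda p: relevance(claim, p),
                       reverse=True)[: 2 * k]
    return sorted(shortlist, key=lambda p: verifier_utility(claim, p),
                  reverse=True)[:k]
```

The design choice this illustrates: a passage can be topically relevant yet useless to the verifier, so the final ranking is driven by the downstream signal, not the retriever's own similarity score.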
arXiv Detail & Related papers (2023-10-18T02:59:38Z)
- Read it Twice: Towards Faithfully Interpretable Fact Verification by Revisiting Evidence [59.81749318292707]
We propose a fact verification model named ReRead to retrieve evidence and verify claims.
The proposed system achieves significant improvements over the best-reported models under different settings.
arXiv Detail & Related papers (2023-05-02T03:23:14Z)
- WiCE: Real-World Entailment for Claims in Wikipedia [63.234352061821625]
We propose WiCE, a new fine-grained textual entailment dataset built on natural claim and evidence pairs extracted from Wikipedia.
In addition to standard claim-level entailment, WiCE provides entailment judgments over sub-sentence units of the claim.
We show that real claims in our dataset involve challenging verification and retrieval problems that existing models fail to address.
arXiv Detail & Related papers (2023-03-02T17:45:32Z)
- GERE: Generative Evidence Retrieval for Fact Verification [57.78768817972026]
We propose GERE, the first system that retrieves evidence in a generative fashion.
The experimental results on the FEVER dataset show that GERE achieves significant improvements over the state-of-the-art baselines.
arXiv Detail & Related papers (2022-04-12T03:49:35Z)
- AmbiFC: Fact-Checking Ambiguous Claims with Evidence [57.7091560922174]
We present AmbiFC, a fact-checking dataset with 10k claims derived from real-world information needs.
We analyze disagreements arising from ambiguity when comparing claims against evidence in AmbiFC.
We develop models that predict veracity while handling this ambiguity via soft labels.
arXiv Detail & Related papers (2021-04-01T17:40:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences arising from its use.