Related papers: The Alignment Bottleneck in Decomposition-Based Claim Verification

URL: http://arxiv.org/abs/2602.10380v1
Date: Wed, 11 Feb 2026 00:02:16 GMT
Title: The Alignment Bottleneck in Decomposition-Based Claim Verification
Authors: Mahmud Elahi Akhter, Federico Ruggeri, Iman Munire Bilal, Rob Procter, Maria Liakata
Abstract summary: We introduce a new dataset of real-world complex claims featuring temporally bounded evidence and human-annotated sub-claim evidence spans. We evaluate decomposition under two evidence alignment setups: Sub-claim Aligned Evidence (SAE) and Repeated Claim-level Evidence (SRE). Our results reveal that decomposition brings significant performance improvement only when evidence is granular and strictly aligned.
Score: 17.197804072440665
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Structured claim decomposition is often proposed as a solution for verifying complex, multi-faceted claims, yet empirical results have been inconsistent. We argue that these inconsistencies stem from two overlooked bottlenecks: evidence alignment and sub-claim error profiles. To better understand these factors, we introduce a new dataset of real-world complex claims, featuring temporally bounded evidence and human-annotated sub-claim evidence spans. We evaluate decomposition under two evidence alignment setups: Sub-claim Aligned Evidence (SAE) and Repeated Claim-level Evidence (SRE). Our results reveal that decomposition brings significant performance improvement only when evidence is granular and strictly aligned. By contrast, standard setups that rely on repeated claim-level evidence (SRE) fail to improve and often degrade performance as shown across different datasets and domains (PHEMEPlus, MMM-Fact, COVID-Fact). Furthermore, we demonstrate that in the presence of noisy sub-claim labels, the nature of the error ends up determining downstream robustness. We find that conservative "abstention" significantly reduces error propagation compared to aggressive but incorrect predictions. These findings suggest that future claim decomposition frameworks must prioritize precise evidence synthesis and calibrate the label bias of sub-claim verification models.
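
The two evidence alignment setups named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the `SubClaim` class, the `pair_sae` and `pair_sre` helpers, and the toy data are all hypothetical names invented here, and the sketch only assumes that SAE gives each sub-claim its own annotated span while SRE repeats the full claim-level evidence for every sub-claim.

```python
from dataclasses import dataclass


@dataclass
class SubClaim:
    text: str
    evidence_span: str  # hypothetical field: the span annotated for this sub-claim


def pair_sae(sub_claims):
    """Sub-claim Aligned Evidence: each sub-claim is paired with its own span."""
    return [(sc.text, sc.evidence_span) for sc in sub_claims]


def pair_sre(sub_claims, claim_evidence):
    """Repeated Claim-level Evidence: every sub-claim gets the same full evidence."""
    return [(sc.text, claim_evidence) for sc in sub_claims]


# Toy data, invented for illustration only (not taken from the dataset).
claim_evidence = "Span A about the policy. Span B about the date."
sub_claims = [
    SubClaim("The policy was announced.", "Span A about the policy."),
    SubClaim("It took effect in March.", "Span B about the date."),
]

sae_pairs = pair_sae(sub_claims)                  # granular, aligned inputs
sre_pairs = pair_sre(sub_claims, claim_evidence)  # same evidence repeated
```

Under SRE the verifier for the second sub-claim also receives the span about the policy, which is unrelated to it; the abstract reports that this repeated claim-level setup fails to improve and often degrades performance.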

Related papers

Retrieve-Refine-Calibrate: A Framework for Complex Claim Fact-Checking [32.6738019397553]
We propose a Retrieve-Refine-Calibrate (RRC) framework based on large language models (LLMs). Specifically, the framework first identifies the entities mentioned in the claim and retrieves evidence relevant to them. Then, it refines the retrieved evidence based on the claim to reduce irrelevant information. Finally, it calibrates the verification process by re-evaluating low-confidence predictions.
arXiv Detail & Related papers (2026-01-23T08:48:52Z)
Fact in Fragments: Deconstructing Complex Claims via LLM-based Atomic Fact Extraction and Verification [18.20994425756688]
Atomic Fact Extraction and Verification (AFEV) is a novel framework that iteratively decomposes complex claims into atomic facts. AFEV achieves state-of-the-art performance in both accuracy and interpretability.
arXiv Detail & Related papers (2025-06-09T05:49:43Z)
SUCEA: Reasoning-Intensive Retrieval for Adversarial Fact-checking through Claim Decomposition and Editing [30.84752573088322]
Adversarial claims are intentionally designed by humans to challenge fact-checking systems. We propose a training-free method designed to rephrase the original claim, making it easier to locate supporting evidence. Our framework significantly improves on both retrieval and entailment label accuracy, outperforming four strong claim-decomposition-based baselines.
arXiv Detail & Related papers (2025-06-05T02:58:15Z)
CRAVE: A Conflicting Reasoning Approach for Explainable Claim Verification Using LLMs [15.170312674645535]
CRAVE is a Conflicting Reasoning Approach for explainable claim VErification. It can verify complex claims based on the conflicting rationales reasoned by large language models. CRAVE achieves much better performance than state-of-the-art methods.
arXiv Detail & Related papers (2025-04-21T07:20:31Z)
Contrastive Learning to Improve Retrieval for Real-world Fact Checking [84.57583869042791]
We present Contrastive Fact-Checking Reranker (CFR), an improved retriever for fact-checking complex claims. We leverage the AVeriTeC dataset, which annotates subquestions for claims with human written answers from evidence documents. We find a 6% improvement in veracity classification accuracy on the dataset.
arXiv Detail & Related papers (2024-10-07T00:09:50Z)
From Relevance to Utility: Evidence Retrieval with Feedback for Fact Verification [118.03466985807331]
We argue that, rather than relevance, for FV we need to focus on the utility that a claim verifier derives from the retrieved evidence. We introduce the feedback-based evidence retriever (FER), which optimizes the evidence retrieval process by incorporating feedback from the claim verifier.
arXiv Detail & Related papers (2023-10-18T02:59:38Z)
WiCE: Real-World Entailment for Claims in Wikipedia [63.234352061821625]
We propose WiCE, a new fine-grained textual entailment dataset built on natural claim and evidence pairs extracted from Wikipedia. In addition to standard claim-level entailment, WiCE provides entailment judgments over sub-sentence units of the claim. We show that real claims in our dataset involve challenging verification and retrieval problems that existing models fail to address.
arXiv Detail & Related papers (2023-03-02T17:45:32Z)
GERE: Generative Evidence Retrieval for Fact Verification [57.78768817972026]
We propose GERE, the first system that retrieves evidence in a generative fashion. The experimental results on the FEVER dataset show that GERE achieves significant improvements over the state-of-the-art baselines.
arXiv Detail & Related papers (2022-04-12T03:49:35Z)
AmbiFC: Fact-Checking Ambiguous Claims with Evidence [57.7091560922174]
We present AmbiFC, a fact-checking dataset with 10k claims derived from real-world information needs. We analyze disagreements arising from ambiguity when comparing claims against evidence in AmbiFC. We develop models for predicting veracity that handle this ambiguity via soft labels.
arXiv Detail & Related papers (2021-04-01T17:40:08Z)
Hierarchical Evidence Set Modeling for Automated Fact Extraction and Verification [5.836068916903788]
Hierarchical Evidence Set Modeling (HESM) is a framework to extract evidence sets and verify whether a claim is supported, refuted, or lacks enough information. Our experimental results show that HESM outperforms 7 state-of-the-art methods for fact extraction and claim verification.
arXiv Detail & Related papers (2020-10-10T22:27:17Z)
DeSePtion: Dual Sequence Prediction and Adversarial Examples for Improved Fact-Checking [46.13738685855884]
We show that current systems for fact-checking are vulnerable to three categories of realistic challenges for fact-checking. We present a system designed to be resilient to these "attacks" using multiple pointer networks for document selection. We find that in handling these attacks we obtain state-of-the-art results on FEVER, largely due to improved evidence retrieval.
arXiv Detail & Related papers (2020-04-27T15:18:49Z)

This list is automatically generated from the titles and abstracts of the papers on this site.