Distill and Align Decomposition for Enhanced Claim Verification
- URL: http://arxiv.org/abs/2602.21857v1
- Date: Wed, 25 Feb 2026 12:32:04 GMT
- Title: Distill and Align Decomposition for Enhanced Claim Verification
- Authors: Jabez Magomere, Elena Kochkina, Samuel Mensah, Simerjot Kaur, Fernando Acero, Arturo Oncevay, Charese H. Smiley, Xiaomo Liu, Manuela Veloso,
- Abstract summary: Complex claim verification requires decomposing sentences into verifiable subclaims. We propose a reinforcement learning approach that optimises decomposition quality and verifier alignment. Our framework enables smaller language models to achieve state-of-the-art claim verification.
- Score: 51.93960785128124
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Complex claim verification requires decomposing sentences into verifiable subclaims, yet existing methods struggle to align decomposition quality with verification performance. We propose a reinforcement learning (RL) approach that jointly optimizes decomposition quality and verifier alignment using Group Relative Policy Optimization (GRPO). Our method integrates: (i) structured sequential reasoning; (ii) supervised finetuning on teacher-distilled exemplars; and (iii) a multi-objective reward balancing format compliance, verifier alignment, and decomposition quality. Across six evaluation settings, our trained 8B decomposer improves downstream verification performance to 71.75% macro-F1, outperforming prompt-based approaches (+1.99, +6.24) and existing RL methods (+5.84). Human evaluation confirms the high quality of the generated subclaims. Our framework enables smaller language models to achieve state-of-the-art claim verification by jointly optimizing for verification accuracy and decomposition quality.
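The multi-objective reward described in (iii) can be sketched as a weighted combination of the three terms the abstract names. The weights, the boolean format check, and the [0, 1] scaling below are illustrative assumptions, not the paper's actual formulation:

```python
def combined_reward(format_ok: bool,
                    verifier_alignment: float,
                    decomp_quality: float,
                    w_format: float = 0.2,
                    w_align: float = 0.5,
                    w_quality: float = 0.3) -> float:
    """Hypothetical multi-objective reward balancing format compliance,
    verifier alignment, and decomposition quality.

    verifier_alignment and decomp_quality are assumed to be scores in [0, 1];
    the weights are placeholders, not values from the paper.
    """
    r_format = 1.0 if format_ok else 0.0
    return (w_format * r_format
            + w_align * verifier_alignment
            + w_quality * decomp_quality)
```

Under GRPO, a scalar reward of this shape would be computed per sampled decomposition and normalized within each sampling group to form the advantage signal.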
Related papers
- PRIME: A Process-Outcome Alignment Benchmark for Verifiable Reasoning in Mathematics and Engineering [71.15346406323827]
We introduce PRIME, a benchmark for evaluating verifiers on Process-Outcome Alignment verification. We find that current verifiers frequently fail to detect derivation flaws. We propose a process-aware RLVR training paradigm utilizing verifiers selected via PRIME.
arXiv Detail & Related papers (2026-02-12T04:45:01Z) - Search-R2: Enhancing Search-Integrated Reasoning via Actor-Refiner Collaboration [49.9937230730202]
We propose Search-R2, a novel Actor-Refiner collaboration framework that enhances reasoning through targeted intervention. Our approach decomposes the generation process into an Actor, which produces initial reasoning trajectories. We show that Search-R2 consistently outperforms strong RAG and RL-based baselines across model scales.
arXiv Detail & Related papers (2026-02-03T15:32:09Z) - Evidence-Augmented Policy Optimization with Reward Co-Evolution for Long-Context Reasoning [52.144281362465996]
We propose EAPO (Evidence-Augmented Policy Optimization) to apply Reinforcement Learning to long-context scenarios. We first establish the Evidence-Augmented Reasoning paradigm, validating via Tree-Structured Evidence Sampling. We then introduce a specialized RL algorithm where a reward model computes a Group-Relative Evidence Reward. To sustain accurate supervision throughout training, we further incorporate an Adaptive Reward-Policy Co-Evolution mechanism.
arXiv Detail & Related papers (2026-01-15T11:40:57Z) - Unified Generation and Self-Verification for Vision-Language Models via Advantage Decoupled Preference Optimization [48.078132893679744]
We propose a unified reinforcement learning framework that jointly learns answer generation and self-verification within a single policy. ADPO achieves up to +34.1% higher verification AUC and -53.5% lower inference time, with significant gains of +2.8%/+1.4% accuracy on MathVista/MMMU, +1.9 cIoU on ReasonSeg, and +1.7%/+1.0% step success rate on AndroidControl/GUI Odyssey.
arXiv Detail & Related papers (2026-01-04T11:09:33Z) - Formal Models and Convergence Analysis for Context-Aware Security Verification [0.0]
We present a formal framework for context-aware security verification that establishes provable guarantees for ML-enhanced adaptive systems. We introduce context-completeness, a new security property, and prove: (1) sample complexity bounds showing when adaptive verification succeeds, (2) information-theoretic limits relating context richness to detection capability, and (3) convergence guarantees for ML-based payload generators.
arXiv Detail & Related papers (2025-10-14T12:21:36Z) - Perception-Consistency Multimodal Large Language Models Reasoning via Caption-Regularized Policy Optimization [72.30168853571216]
Multimodal large language models excel at tasks that integrate visual perception with symbolic reasoning. CapPO integrates two key mechanisms: (1) a caption-based consistency regularization, which minimizes the divergence between responses conditioned on raw images and those conditioned on captions, and (2) a KL-weighted advantage estimation scheme, which adaptively scales reinforcement signals to strengthen perceptually consistent trajectories.
arXiv Detail & Related papers (2025-09-26T04:32:26Z) - DecMetrics: Structured Claim Decomposition Scoring for Factually Consistent LLM Outputs [0.609170287691728]
We introduce DecMetrics, which comprises three new metrics: COMPLETENESS, CORRECTNESS, and SEMANTIC ENTROPY, designed to automatically assess the quality of claims produced by decomposition models. Our approach aims to set a benchmark for claim decomposition, enhancing both the reliability and effectiveness of fact-checking systems.
arXiv Detail & Related papers (2025-08-31T10:22:54Z) - AssertCoder: LLM-Based Assertion Generation via Multimodal Specification Extraction [32.14733357890831]
We propose AssertCoder, a novel unified framework that automatically generates high-quality SVAs. AssertCoder employs modality-sensitive preprocessing to parse heterogeneous specification formats. The framework incorporates a mutation-based evaluation approach to assess assertion quality.
arXiv Detail & Related papers (2025-07-14T14:43:14Z) - Reasoning-CV: Fine-tuning Powerful Reasoning LLMs for Knowledge-Assisted Claim Verification [17.35114345065597]
The Chain-of-Thought (CoT)-Verify paradigm generates CoT-verification paths for the original complex claim without requiring decomposition into sub-claims and separate verification stages. Reasoning-CV demonstrates superior knowledge-assisted claim verification performance compared to existing Decompose-Then-Verify methods.
arXiv Detail & Related papers (2025-05-18T10:28:54Z) - Optimizing Decomposition for Optimal Claim Verification [15.68967195914405]
We find that existing decomposition policies, typically hand-crafted demonstrations, do not align well with downstream verifiers in terms of atomicity. We propose dynamic decomposition, a reinforcement learning framework that leverages verifier feedback to learn a policy for dynamically decomposing claims to verifier-preferred atomicity. Experimental results show that dynamic decomposition outperforms existing decomposition policies, improving verification confidence by 0.07 and accuracy by 0.12 on average across varying verifiers, datasets, and atomicities of input claims.
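The verifier-feedback signal described above can be sketched as scoring a candidate decomposition by the verifier's average confidence on its subclaims. The function shape, the averaging, and the empty-case handling below are assumptions for illustration, not the paper's exact reward:

```python
from typing import Callable, List


def decomposition_reward(subclaims: List[str],
                         verifier_confidence: Callable[[str], float]) -> float:
    """Hypothetical reward for a candidate decomposition: the mean
    verifier confidence over its subclaims.

    verifier_confidence is assumed to map a subclaim to a score in [0, 1];
    an empty decomposition receives zero reward.
    """
    if not subclaims:
        return 0.0
    return sum(verifier_confidence(c) for c in subclaims) / len(subclaims)
```

A reward of this form lets the RL policy discover the granularity the downstream verifier prefers: over-split or under-split decompositions that depress verifier confidence are penalized automatically.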
arXiv Detail & Related papers (2025-03-19T15:56:21Z) - In-context Demonstration Matters: On Prompt Optimization for Pseudo-Supervision Refinement [71.60563181678323]
Large language models (LLMs) have achieved great success across diverse tasks, and fine-tuning is sometimes needed to further enhance generation quality. To handle these challenges, a direct solution is to generate high-confidence data from unsupervised downstream tasks. We propose a novel approach, the pseudo-supervised demonstrations aligned prompt optimization (PAPO) algorithm, which jointly refines both the prompt and the overall pseudo-supervision.
arXiv Detail & Related papers (2024-10-04T03:39:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.