Reasoning-CV: Fine-tuning Powerful Reasoning LLMs for Knowledge-Assisted Claim Verification
- URL: http://arxiv.org/abs/2505.12348v1
- Date: Sun, 18 May 2025 10:28:54 GMT
- Title: Reasoning-CV: Fine-tuning Powerful Reasoning LLMs for Knowledge-Assisted Claim Verification
- Authors: Zhi Zheng, Wee Sun Lee
- Abstract summary: The Chain-of-Thought (CoT)-Verify paradigm generates CoT verification paths for the original complex claim without requiring decomposition into sub-claims and separate verification stages.
Reasoning-CV demonstrates superior knowledge-assisted claim verification performance compared to existing Decompose-Then-Verify methods.
- Score: 17.35114345065597
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Claim verification is essential in combating misinformation, and large language models (LLMs) have recently emerged in this area as powerful tools for assessing the veracity of claims using external knowledge. Existing LLM-based methods for claim verification typically adopt a Decompose-Then-Verify paradigm, which involves decomposing complex claims into several independent sub-claims and verifying each sub-claim separately. However, this paradigm often introduces errors during the claim decomposition process. To mitigate these errors, we propose the Chain-of-Thought (CoT)-Verify paradigm, which leverages LLM reasoning methods to generate CoT verification paths for the original complex claim without requiring decomposition into sub-claims and separate verification stages. The CoT-Verify paradigm allows us to propose a natural fine-tuning method called Reasoning-CV to enhance the verification capabilities of LLMs. Reasoning-CV includes a supervised fine-tuning (SFT) stage and a self-improvement direct preference optimization (DPO) stage. Using only an 8B pre-trained LLM, Reasoning-CV demonstrates superior knowledge-assisted claim verification performance compared to existing Decompose-Then-Verify methods, as well as powerful black-box LLMs such as GPT-4o+CoT and o1-preview. Our code is available.
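The abstract describes two mechanisms: a single-prompt CoT-Verify pass that reasons over the whole claim, and a self-improvement stage that builds DPO preference pairs from the model's own verification paths. Below is a minimal sketch of both under stated assumptions: `generate` is a hypothetical LLM-completion wrapper, and the prompt wording, sampling count, and pair-construction rule are illustrative guesses, not the paper's actual implementation.

```python
import random

def generate(prompt: str, temperature: float = 0.8) -> str:
    """Hypothetical wrapper around an LLM completion API (assumption)."""
    raise NotImplementedError

COT_VERIFY_PROMPT = (
    "Claim: {claim}\n"
    "Evidence: {evidence}\n"
    "Reason step by step about whether the evidence supports the claim, "
    "then end with 'Verdict: SUPPORTED' or 'Verdict: REFUTED'.\n"
)

def cot_verify(claim: str, evidence: str) -> str:
    """One CoT-Verify pass: reason over the full claim, no decomposition."""
    path = generate(COT_VERIFY_PROMPT.format(claim=claim, evidence=evidence))
    return "SUPPORTED" if "Verdict: SUPPORTED" in path else "REFUTED"

def build_dpo_pairs(claim: str, evidence: str, gold_label: str, n_samples: int = 8):
    """Self-improvement data: sample several verification paths and pair a
    correct path (chosen) with an incorrect one (rejected) for DPO."""
    prompt = COT_VERIFY_PROMPT.format(claim=claim, evidence=evidence)
    paths = [generate(prompt) for _ in range(n_samples)]
    correct = [p for p in paths if f"Verdict: {gold_label}" in p]
    wrong = [p for p in paths if f"Verdict: {gold_label}" not in p]
    if correct and wrong:
        return [{"prompt": prompt,
                 "chosen": random.choice(correct),
                 "rejected": random.choice(wrong)}]
    return []
```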
Related papers
- CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward [50.97588334916863]
We develop CompassVerifier, an accurate and robust lightweight verifier model for evaluation and outcome reward.
It demonstrates multi-domain competency spanning math, knowledge, and diverse reasoning tasks, with the capability to process various answer types.
We introduce the VerifierBench benchmark, comprising model outputs collected from multiple data sources and augmented through manual analysis of meta-error patterns to enhance CompassVerifier.
arXiv Detail & Related papers (2025-08-05T17:55:24Z)
- MArgE: Meshing Argumentative Evidence from Multiple Large Language Models for Justifiable Claim Verification [12.449402503089164]
We introduce MArgE, a novel framework to provide formal structure to the evidence from each large language model.
We show experimentally that MArgE can significantly outperform single LLMs.
arXiv Detail & Related papers (2025-08-04T16:40:02Z)
- PAG: Multi-Turn Reinforced LLM Self-Correction with Policy as Generative Verifier [18.771754895027616]
Policy as Generative Verifier (PAG) is a framework that empowers Large Language Models to self-correct by alternating between policy and verifier roles.
It alleviates model collapse and jointly enhances both reasoning and verification abilities.
arXiv Detail & Related papers (2025-06-12T06:59:35Z)
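A minimal sketch of the alternating policy/verifier loop the PAG entry describes, again assuming a hypothetical `generate` completion wrapper; the prompts, retry budget, and revision format are illustrative assumptions, not the paper's reinforcement-learning training setup.

```python
def generate(prompt: str) -> str:
    """Hypothetical LLM completion wrapper (assumption)."""
    raise NotImplementedError

def solve_with_self_correction(question: str, max_turns: int = 3) -> str:
    """Alternate policy and verifier roles: draft an answer, verify it with
    the same model, and revise until the verifier accepts."""
    answer = generate(f"Solve the problem:\n{question}")
    for _ in range(max_turns):
        verdict = generate(
            f"Problem:\n{question}\nProposed solution:\n{answer}\n"
            "Check the solution step by step. Reply 'CORRECT' or point out the flaw."
        )
        if verdict.strip().startswith("CORRECT"):
            break
        answer = generate(
            f"Problem:\n{question}\nFlawed solution:\n{answer}\n"
            f"Verifier feedback:\n{verdict}\nWrite a corrected solution."
        )
    return answer
```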
- CLATTER: Comprehensive Entailment Reasoning for Hallucination Detection [60.98964268961243]
We propose that guiding models through a systematic and comprehensive reasoning process allows them to make finer-grained and more accurate entailment decisions.
We define a 3-step reasoning process, consisting of (i) claim decomposition, (ii) sub-claim attribution and entailment classification, and (iii) aggregated classification, showing that such guided reasoning indeed yields improved hallucination detection.
arXiv Detail & Related papers (2025-06-05T17:02:52Z)
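A minimal sketch of CLATTER's three-step process as listed above, assuming a hypothetical `generate` wrapper; the decomposition prompt, the per-sub-claim labels, and the strict "all entailed" aggregation rule are assumptions for illustration, not the paper's exact procedure.

```python
def generate(prompt: str) -> str:
    """Hypothetical LLM completion wrapper (assumption)."""
    raise NotImplementedError

def detect_hallucination(source: str, claim: str) -> bool:
    """Return True if the claim is NOT fully entailed by the source."""
    # (i) claim decomposition
    subclaims = generate(
        f"Decompose into atomic sub-claims, one per line:\n{claim}"
    ).splitlines()
    # (ii) per-sub-claim attribution and entailment classification
    labels = []
    for sc in subclaims:
        labels.append(generate(
            f"Source:\n{source}\nSub-claim: {sc}\n"
            "Quote the supporting span if any, then answer "
            "ENTAILED, CONTRADICTED, or NEUTRAL."
        ))
    # (iii) aggregated classification: hallucinated unless every sub-claim is entailed
    return not all("ENTAILED" in label for label in labels)
```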
- CRAVE: A Conflicting Reasoning Approach for Explainable Claim Verification Using LLMs [15.170312674645535]
CRAVE is a Conflicting Reasoning Approach for explainable claim VErification.
It can verify complex claims based on the conflicting rationales reasoned by large language models.
CRAVE achieves much better performance than state-of-the-art methods.
arXiv Detail & Related papers (2025-04-21T07:20:31Z)
- Step-by-Step Fact Verification System for Medical Claims with Explainable Reasoning [5.065947993017158]
In this work, we apply an iterative FV system on three medical fact-checking datasets and evaluate it with multiple settings.
We demonstrate improvements in the final performance over traditional approaches and the high potential of step-by-step FV systems for domain-specific claims.
arXiv Detail & Related papers (2025-02-20T17:40:21Z)
- Towards Automated Fact-Checking of Real-World Claims: Exploring Task Formulation and Assessment with LLMs [32.45604456988931]
This study establishes baseline comparisons for Automated Fact-Checking (AFC) using Large Language Models (LLMs).
We evaluate Llama-3 models of varying sizes on 17,856 claims collected from PolitiFact (2007-2024) using evidence retrieved via restricted web searches.
Our results show that larger LLMs consistently outperform smaller LLMs in classification accuracy and justification quality without fine-tuning.
arXiv Detail & Related papers (2025-02-13T02:51:17Z)
- Improving LLM Reasoning through Scaling Inference Computation with Collaborative Verification [52.095460362197336]
Large language models (LLMs) struggle with consistent and accurate reasoning.
LLMs are trained primarily on correct solutions, reducing their ability to detect and learn from errors.
We propose a novel collaborative method integrating Chain-of-Thought (CoT) and Program-of-Thought (PoT) solutions for verification.
arXiv Detail & Related papers (2024-10-05T05:21:48Z)
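A minimal sketch of the CoT/PoT cross-check idea from the entry above, assuming a hypothetical `generate` wrapper; executing model-written code with `exec` and the simple answer-agreement rule are illustrative assumptions, not the paper's verifier.

```python
def generate(prompt: str) -> str:
    """Hypothetical LLM completion wrapper (assumption)."""
    raise NotImplementedError

def solve_and_verify(question: str) -> tuple[str, bool]:
    """Produce a CoT answer and a PoT (program) answer, then check agreement."""
    cot = generate(
        f"{question}\nThink step by step, end with 'Answer: <value>'."
    )
    cot_answer = cot.rsplit("Answer:", 1)[-1].strip()

    program = generate(
        f"{question}\nWrite a Python program that stores the result "
        "in a variable named `result`."
    )
    scope: dict = {}
    exec(program, scope)  # caution: runs model-written code; sandbox in practice
    pot_answer = str(scope.get("result"))

    return cot_answer, cot_answer == pot_answer
```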
- Retrieval Augmented Fact Verification by Synthesizing Contrastive Arguments [23.639378586798884]
We propose retrieval augmented fact verification through the synthesis of contrasting arguments.
Our method effectively retrieves relevant documents as evidence and evaluates arguments from varying perspectives.
We demonstrate the effectiveness of our method through extensive experiments, where RAFTS can outperform GPT-based methods with a significantly smaller 7B LLM.
arXiv Detail & Related papers (2024-06-14T08:13:34Z)
- CaLM: Contrasting Large and Small Language Models to Verify Grounded Generation [76.31621715032558]
Grounded generation aims to equip language models (LMs) with the ability to produce more credible and accountable responses.
We introduce CaLM, a novel verification framework.
Our framework empowers smaller LMs, which rely less on parametric memory, to validate the output of larger LMs.
arXiv Detail & Related papers (2024-06-08T06:04:55Z)
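A minimal sketch of the contrast idea in the CaLM entry above: a small LM, which relies less on parametric memory, re-answers from the cited evidence alone, and agreement with the large LM's grounded answer counts as validation. The two-model split, the citation format, and the substring agreement check are assumptions for illustration; `generate_large` and `generate_small` are hypothetical wrappers.

```python
def generate_large(prompt: str) -> str:
    """Hypothetical wrapper around a large LM (assumption)."""
    raise NotImplementedError

def generate_small(prompt: str) -> str:
    """Hypothetical wrapper around a small LM with weaker parametric memory."""
    raise NotImplementedError

def verify_grounded_answer(question: str, documents: list[str]) -> bool:
    """The large LM answers with citations; the small LM, reading only the
    cited passages, must reach the same answer for it to be accepted."""
    grounded = generate_large(
        f"Question: {question}\nDocuments:\n" + "\n".join(documents) +
        "\nAnswer and cite the document you used, as 'Answer: ... [doc_i]'."
    )
    answer = grounded.split("Answer:", 1)[-1].split("[")[0].strip()
    cited = [d for i, d in enumerate(documents) if f"[doc_{i}]" in grounded]
    check = generate_small(
        f"Question: {question}\nPassages:\n" + "\n".join(cited) +
        "\nAnswer using only these passages."
    )
    return answer.lower() in check.lower()  # crude agreement check (assumption)
```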
- Causal Prompting: Debiasing Large Language Model Prompting based on Front-Door Adjustment [32.12998469814097]
A novel causal prompting method based on front-door adjustment is proposed to effectively mitigate biases in Large Language Models (LLMs).
Experimental results show that the proposed causal prompting approach achieves excellent performance across seven natural language processing datasets.
arXiv Detail & Related papers (2024-03-05T07:47:34Z) - Mitigating Large Language Model Hallucinations via Autonomous Knowledge
Graph-based Retrofitting [51.7049140329611]
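For reference, the front-door adjustment the entry above builds on estimates the causal effect of a prompt X on an answer Y through a mediator M (e.g., generated reasoning chains): P(y | do(x)) = sum_m P(m | x) * sum_x' P(y | m, x') * P(x'). A small numeric sketch of the computation, with made-up distributions purely for illustration:

```python
# Binary toy variables x, m, y in {0, 1}; all probabilities are made up.
P_x = [0.5, 0.5]                          # P(x')
P_m_given_x = [[0.8, 0.2], [0.3, 0.7]]    # P(m | x), indexed [x][m]
P_y1_given_mx = [[0.1, 0.4], [0.6, 0.9]]  # P(y=1 | m, x'), indexed [m][x']

def p_y1_do_x(x: int) -> float:
    """Front-door adjustment:
    P(y=1 | do(x)) = sum_m P(m|x) * sum_x' P(y=1|m,x') * P(x')."""
    return sum(
        P_m_given_x[x][m] *
        sum(P_y1_given_mx[m][xp] * P_x[xp] for xp in (0, 1))
        for m in (0, 1)
    )

print(p_y1_do_x(0), p_y1_do_x(1))  # interventional effect of each prompt variant
```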
- Mitigating Large Language Model Hallucinations via Autonomous Knowledge Graph-based Retrofitting [51.7049140329611]
This paper proposes Knowledge Graph-based Retrofitting (KGR) to mitigate factual hallucination during the reasoning process.
Experiments show that KGR can significantly improve the performance of LLMs on factual QA benchmarks.
arXiv Detail & Related papers (2023-11-22T11:08:38Z)
- From Chaos to Clarity: Claim Normalization to Empower Fact-Checking [57.024192702939736]
Claim Normalization (aka ClaimNorm) aims to decompose complex and noisy social media posts into more straightforward and understandable forms.
We propose CACN, a pioneering approach that leverages chain-of-thought and claim check-worthiness estimation.
Our experiments demonstrate that CACN outperforms several baselines across various evaluation measures.
arXiv Detail & Related papers (2023-10-22T16:07:06Z)
- RCOT: Detecting and Rectifying Factual Inconsistency in Reasoning by Reversing Chain-of-Thought [56.558892336235914]
Reversing Chain-of-Thought (RCoT) is a novel method to improve large language models' reasoning abilities.
RCoT automatically detects and rectifies factual inconsistency in generated solutions.
We show that manually written fine-grained feedback can dramatically improve LLMs' reasoning abilities.
arXiv Detail & Related papers (2023-05-19T08:02:52Z)