Debating Truth: Debate-driven Claim Verification with Multiple Large Language Model Agents
- URL: http://arxiv.org/abs/2507.19090v1
- Date: Fri, 25 Jul 2025 09:19:25 GMT
- Title: Debating Truth: Debate-driven Claim Verification with Multiple Large Language Model Agents
- Authors: Haorui He, Yupeng Li, Dacheng Wen, Reynold Cheng, Francis C. M. Lau
- Abstract summary: We propose DebateCV, the first claim verification framework that adopts a debate-driven methodology using multiple LLM agents. In our framework, two Debaters take opposing stances on a claim and engage in multi-round argumentation, while a Moderator evaluates the arguments and renders a verdict with justifications. Experimental results show that our method outperforms existing claim verification methods under varying levels of evidence quality.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Claim verification is critical for enhancing digital literacy. However, state-of-the-art single-LLM methods struggle with complex claims that involve multi-faceted evidence. Inspired by real-world fact-checking practices, we propose DebateCV, the first claim verification framework that adopts a debate-driven methodology using multiple LLM agents. In our framework, two Debaters take opposing stances on a claim and engage in multi-round argumentation, while a Moderator evaluates the arguments and renders a verdict with justifications. To further improve the performance of the Moderator, we introduce a novel post-training strategy that leverages synthetic debate data generated by the zero-shot DebateCV, effectively addressing the scarcity of real-world debate-driven claim verification data. Experimental results show that our method outperforms existing claim verification methods under varying levels of evidence quality. Our code and dataset are publicly available at https://anonymous.4open.science/r/DebateCV-6781.
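The Debater/Moderator loop the abstract describes can be sketched roughly as follows. This is a minimal illustration of the control flow only, not the authors' implementation; `call_llm` is a hypothetical stand-in for a real model API, and all prompts are invented for demonstration.

```python
# Sketch of a debate-driven verification loop: two Debaters argue opposing
# stances over multiple rounds, then a Moderator renders a verdict.

def call_llm(prompt: str) -> str:
    """Stand-in for a real LLM API call; returns a canned response here."""
    return f"[model response to: {prompt[:40]}...]"

def debate_verify(claim: str, rounds: int = 2) -> dict:
    transcript = []
    for r in range(1, rounds + 1):
        # Each Debater sees the claim plus the debate history so far.
        pro = call_llm(f"Round {r}: argue the claim is TRUE. "
                       f"Claim: {claim}. History: {transcript}")
        con = call_llm(f"Round {r}: argue the claim is FALSE. "
                       f"Claim: {claim}. History: {transcript}")
        transcript.append({"round": r, "pro": pro, "con": con})
    # The Moderator weighs both sides and justifies a verdict.
    verdict = call_llm(f"As Moderator, evaluate the debate on: {claim}. "
                       f"Transcript: {transcript}. Give a verdict and why.")
    return {"claim": claim, "transcript": transcript, "verdict": verdict}
```

In a real system the post-training step the abstract mentions would fine-tune the Moderator model on transcripts produced by this zero-shot loop.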
Related papers
- Debate-to-Detect: Reformulating Misinformation Detection as a Real-World Debate with Large Language Models [0.8302146576157498]
We introduce Debate-to-Detect (D2D), a novel Multi-Agent Debate (MAD) framework that reformulates misinformation detection as a structured adversarial debate. Inspired by fact-checking, D2D assigns domain-specific profiles to each agent and orchestrates a five-stage debate process: Opening Statement, Rebuttal, Free Debate, Closing Statement, and Judgment. Experiments with GPT-4o on two fake-news datasets demonstrate significant improvements over baseline methods.
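A staged pipeline like the one D2D describes can be expressed as a simple ordered sequence of handlers. The stage handlers below are hypothetical placeholders, not the authors' code; only the five stage names come from the summary.

```python
# Minimal sketch of a five-stage debate pipeline (Opening Statement,
# Rebuttal, Free Debate, Closing Statement, Judgment).

STAGES = ["opening_statement", "rebuttal", "free_debate",
          "closing_statement", "judgment"]

def run_pipeline(article: str, handlers: dict) -> list:
    state = {"article": article, "log": []}
    for stage in STAGES:
        # Each handler sees the running state, so later stages
        # can respond to earlier ones.
        state["log"].append((stage, handlers[stage](state)))
    return state["log"]

# Demo handlers that just echo their stage name.
demo_handlers = {s: (lambda state, s=s: f"{s} output") for s in STAGES}
log = run_pipeline("example article text", demo_handlers)
```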
arXiv Detail & Related papers (2025-05-24T08:44:33Z)
- CRAVE: A Conflicting Reasoning Approach for Explainable Claim Verification Using LLMs [15.170312674645535]
CRAVE is a Conflicting Reasoning Approach for explainable claim VErification. It can verify complex claims based on conflicting rationales reasoned by large language models, and it substantially outperforms state-of-the-art methods.
arXiv Detail & Related papers (2025-04-21T07:20:31Z)
- Reasoning Court: Combining Reasoning, Action, and Judgment for Multi-Hop Reasoning [17.829990749622496]
Reasoning Court (RC) is a novel framework that extends iterative reasoning-and-retrieval methods, such as ReAct, with a dedicated LLM judge. RC consistently outperforms state-of-the-art few-shot prompting methods without task-specific fine-tuning.
arXiv Detail & Related papers (2025-04-14T00:56:08Z)
- Debate Only When Necessary: Adaptive Multiagent Collaboration for Efficient LLM Reasoning [8.800516398660069]
Multiagent collaboration has emerged as a promising framework for enhancing the reasoning capabilities of large language models (LLMs). We propose Debate Only When Necessary (DOWN), an adaptive multiagent debate framework that selectively activates debate based on the confidence score of the agent's initial response. DOWN improves efficiency by up to six times while preserving or even outperforming the performance of existing methods.
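The confidence-gated escalation DOWN describes reduces, in sketch form, to a single branch: answer cheaply first, and invoke the full debate only when the initial confidence is low. All function names and the threshold below are illustrative assumptions, not the paper's API.

```python
# Sketch of confidence-gated debate: escalate to multi-agent debate only
# when a solo agent's confidence falls below a threshold.

def solo_agent(question: str) -> tuple:
    """Stand-in agent returning (answer, confidence in [0, 1])."""
    return f"draft answer to: {question}", 0.9

def multi_agent_debate(question: str, draft: str) -> str:
    """Stand-in for the costly multi-agent debate path."""
    return f"debated answer to: {question}"

def answer(question: str, threshold: float = 0.8) -> str:
    draft, confidence = solo_agent(question)
    if confidence >= threshold:
        return draft                            # cheap path: skip the debate
    return multi_agent_debate(question, draft)  # escalate low-confidence cases
```

The efficiency gain comes from how rarely the second branch fires: if most initial answers clear the threshold, most queries pay only a single model call.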
arXiv Detail & Related papers (2025-04-07T13:17:52Z)
- Breaking Event Rumor Detection via Stance-Separated Multi-Agent Debate [21.342632695285364]
Leveraging large language models (LLMs) for rumor detection holds significant promise. We propose Stance-Separated Multi-Agent Debate (S2MAD) to address this issue. Our proposed model outperforms state-of-the-art methods.
arXiv Detail & Related papers (2024-12-06T08:52:30Z)
- Contrastive Learning to Improve Retrieval for Real-world Fact Checking [84.57583869042791]
We present Contrastive Fact-Checking Reranker (CFR), an improved retriever for fact-checking complex claims.
We leverage the AVeriTeC dataset, which annotates subquestions for claims with human-written answers from evidence documents.
We find a 6% improvement in veracity classification accuracy on the dataset.
arXiv Detail & Related papers (2024-10-07T00:09:50Z)
- Missci: Reconstructing Fallacies in Misrepresented Science [84.32990746227385]
Health-related misinformation on social networks can lead to poor decision-making and real-world dangers.
Missci is a novel argumentation-theoretical model of fallacious reasoning.
We present Missci as a dataset to test the critical reasoning abilities of large language models.
arXiv Detail & Related papers (2024-06-05T12:11:10Z)
- Argue with Me Tersely: Towards Sentence-Level Counter-Argument Generation [62.069374456021016]
We present the ArgTersely benchmark for sentence-level counter-argument generation.
We also propose Arg-LlaMA for generating high-quality counter-arguments.
arXiv Detail & Related papers (2023-12-21T06:51:34Z)
- From Chaos to Clarity: Claim Normalization to Empower Fact-Checking [57.024192702939736]
Claim Normalization (aka ClaimNorm) aims to decompose complex and noisy social media posts into more straightforward and understandable forms.
We propose CACN, a pioneering approach that leverages chain-of-thought and claim check-worthiness estimation.
Our experiments demonstrate that CACN outperforms several baselines across various evaluation measures.
arXiv Detail & Related papers (2023-10-22T16:07:06Z)
- Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate [85.3444184685235]
We propose a Multi-Agent Debate (MAD) framework in which multiple agents express their arguments in a "tit for tat" manner and a judge manages the debate process to obtain a final solution.
Our framework encourages divergent thinking in LLMs which would be helpful for tasks that require deep levels of contemplation.
arXiv Detail & Related papers (2023-05-30T15:25:45Z)
- WiCE: Real-World Entailment for Claims in Wikipedia [63.234352061821625]
We propose WiCE, a new fine-grained textual entailment dataset built on natural claim and evidence pairs extracted from Wikipedia.
In addition to standard claim-level entailment, WiCE provides entailment judgments over sub-sentence units of the claim.
We show that real claims in our dataset involve challenging verification and retrieval problems that existing models fail to address.
arXiv Detail & Related papers (2023-03-02T17:45:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.