Related papers: Augmenting the Veracity and Explanations of Complex Fact Checking via Iterative Self-Revision with LLMs

Related papers

Hallucination to Truth: A Review of Fact-Checking and Factuality Evaluation in Large Language Models [1.0138329337410974]
Large Language Models (LLMs) are trained on vast and diverse internet corpora that often include inaccurate or misleading content.<n>This review systematically analyzes how LLM-generated content is evaluated for factual accuracy.
arXiv Detail & Related papers (2025-08-05T19:20:05Z)
FACT-AUDIT: An Adaptive Multi-Agent Framework for Dynamic Fact-Checking Evaluation of Large Language Models [79.41859481668618]
Large Language Models (LLMs) have significantly advanced the fact-checking studies.<n>Existing automated fact-checking evaluation methods rely on static datasets and classification metrics.<n>We introduce FACT-AUDIT, an agent-driven framework that adaptively and dynamically assesses LLMs' fact-checking capabilities.
arXiv Detail & Related papers (2025-02-25T07:44:22Z)
ZeFaV: Boosting Large Language Models for Zero-shot Fact Verification [2.6874004806796523]
ZeFaV is a zero-shot based fact-checking verification framework to enhance the performance on fact verification task of large language models. We conducted empirical experiments to evaluate our approach on two multi-hop fact-checking datasets including HoVer and FEVEROUS.
arXiv Detail & Related papers (2024-11-18T02:35:15Z)
FactLens: Benchmarking Fine-Grained Fact Verification [6.814173254027381]
We advocate for a shift toward fine-grained verification, where complex claims are broken down into smaller sub-claims for individual verification. We introduce FactLens, a benchmark for evaluating fine-grained fact verification, with metrics and automated evaluators of sub-claim quality. Our results show alignment between automated FactLens evaluators and human judgments, and we discuss the impact of sub-claim characteristics on the overall verification performance.
arXiv Detail & Related papers (2024-11-08T21:26:57Z)
FactBench: A Dynamic Benchmark for In-the-Wild Language Model Factuality Evaluation [4.773086022844023]
We present VERIFY, a pipeline to evaluate language models' factuality in real-world user interactions. We show that VERIFY correlates better with human evaluations than existing methods. We benchmark widely-used LMs from GPT, Gemini, and Llama families on FACTBENCH.
arXiv Detail & Related papers (2024-10-29T17:19:56Z)
Robust Claim Verification Through Fact Detection [17.29665711917281]
Our novel approach, FactDetect, leverages Large Language Models (LLMs) to generate concise factual statements from evidence. The generated facts are then combined with the claim and evidence. Our method demonstrates competitive results in the supervised claim verification model by 15% on the F1 score.
arXiv Detail & Related papers (2024-07-25T20:03:43Z)
How to Train Your Fact Verifier: Knowledge Transfer with Multimodal Open Models [95.44559524735308]
Large language or multimodal model based verification has been proposed to scale up online policing mechanisms for mitigating spread of false and harmful content. We test the limits of improving foundation model performance without continual updating through an initial study of knowledge transfer. Our results on two recent multi-modal fact-checking benchmarks, Mocheg and Fakeddit, indicate that knowledge transfer strategies can improve Fakeddit performance over the state-of-the-art by up to 1.7% and Mocheg performance by up to 2.9%.
arXiv Detail & Related papers (2024-06-29T08:39:07Z)
Self-Distilled Disentangled Learning for Counterfactual Prediction [49.84163147971955]
We propose the Self-Distilled Disentanglement framework, known as $SD2$. Grounded in information theory, it ensures theoretically sound independent disentangled representations without intricate mutual information estimator designs. Our experiments, conducted on both synthetic and real-world datasets, confirm the effectiveness of our approach.
arXiv Detail & Related papers (2024-06-09T16:58:19Z)
RU22Fact: Optimizing Evidence for Multilingual Explainable Fact-Checking on Russia-Ukraine Conflict [34.2739191920746]
High-quality evidence plays a vital role in enhancing fact-checking systems. We propose a method based on a Large Language Model to automatically retrieve and summarize evidence from the Web. We construct RU22Fact, a novel explainable fact-checking dataset on the Russia-Ukraine conflict in 2022 of 16K samples.
arXiv Detail & Related papers (2024-03-25T11:56:29Z)
FENICE: Factuality Evaluation of summarization based on Natural language Inference and Claim Extraction [85.26780391682894]
We propose Factuality Evaluation of summarization based on Natural language Inference and Claim Extraction (FENICE) FENICE leverages an NLI-based alignment between information in the source document and a set of atomic facts, referred to as claims, extracted from the summary. Our metric sets a new state of the art on AGGREFACT, the de-facto benchmark for factuality evaluation.
arXiv Detail & Related papers (2024-03-04T17:57:18Z)
AFaCTA: Assisting the Annotation of Factual Claim Detection with Reliable LLM Annotators [38.523194864405326]
AFaCTA is a novel framework that assists in the annotation of factual claims. AFaCTA calibrates its annotation confidence with consistency along three predefined reasoning paths. Our analyses also result in PoliClaim, a comprehensive claim detection dataset spanning diverse political topics.
arXiv Detail & Related papers (2024-02-16T20:59:57Z)
Can LLMs Produce Faithful Explanations For Fact-checking? Towards Faithful Explainable Fact-Checking via Multi-Agent Debate [75.10515686215177]
Large Language Models (LLMs) excel in text generation, but their capability for producing faithful explanations in fact-checking remains underexamined. We propose the Multi-Agent Debate Refinement (MADR) framework, leveraging multiple LLMs as agents with diverse roles. MADR ensures that the final explanation undergoes rigorous validation, significantly reducing the likelihood of unfaithful elements and aligning closely with the provided evidence.
arXiv Detail & Related papers (2024-02-12T04:32:33Z)
How We Refute Claims: Automatic Fact-Checking through Flaw Identification and Explanation [4.376598435975689]
This paper explores the novel task of flaw-oriented fact-checking, including aspect generation and flaw identification. We also introduce RefuteClaim, a new framework designed specifically for this task. Given the absence of an existing dataset, we present FlawCheck, a dataset created by extracting and transforming insights from expert reviews into relevant aspects and identified flaws.
arXiv Detail & Related papers (2024-01-27T06:06:16Z)
Evidence-based Interpretable Open-domain Fact-checking with Large Language Models [26.89527395822654]
We introduce the Open-domain Explainable Fact-checking (OE-Fact) system for claim-checking in real-world scenarios. The OE-Fact system can leverage the powerful understanding and reasoning capabilities of large language models (LLMs) to validate claims. Experimental results show that our OE-Fact system outperforms general fact-checking baseline systems in both closed- and open-domain scenarios.
arXiv Detail & Related papers (2023-12-10T09:27:50Z)
From Chaos to Clarity: Claim Normalization to Empower Fact-Checking [57.024192702939736]
Claim Normalization (aka ClaimNorm) aims to decompose complex and noisy social media posts into more straightforward and understandable forms. We propose CACN, a pioneering approach that leverages chain-of-thought and claim check-worthiness estimation. Our experiments demonstrate that CACN outperforms several baselines across various evaluation measures.
arXiv Detail & Related papers (2023-10-22T16:07:06Z)
FactCHD: Benchmarking Fact-Conflicting Hallucination Detection [64.4610684475899]
FactCHD is a benchmark designed for the detection of fact-conflicting hallucinations from LLMs. FactCHD features a diverse dataset that spans various factuality patterns, including vanilla, multi-hop, comparison, and set operation. We introduce Truth-Triangulator that synthesizes reflective considerations by tool-enhanced ChatGPT and LoRA-tuning based on Llama2.
arXiv Detail & Related papers (2023-10-18T16:27:49Z)
EX-FEVER: A Dataset for Multi-hop Explainable Fact Verification [22.785622371421876]
We present a pioneering dataset for multi-hop explainable fact verification. With over 60,000 claims involving 2-hop and 3-hop reasoning, each is created by summarizing and modifying information from hyperlinked Wikipedia documents. We demonstrate a novel baseline system on our EX-FEVER dataset, showcasing document retrieval, explanation generation, and claim verification.
arXiv Detail & Related papers (2023-10-15T06:46:15Z)
Goodhart's Law Applies to NLP's Explanation Benchmarks [57.26445915212884]
We critically examine two sets of metrics: the ERASER metrics (comprehensiveness and sufficiency) and the EVAL-X metrics. We show that we can inflate a model's comprehensiveness and sufficiency scores dramatically without altering its predictions or explanations on in-distribution test inputs. Our results raise doubts about the ability of current metrics to guide explainability research, underscoring the need for a broader reassessment of what precisely these metrics are intended to capture.
arXiv Detail & Related papers (2023-08-28T03:03:03Z)
Advancing Counterfactual Inference through Nonlinear Quantile Regression [77.28323341329461]
We propose a framework for efficient and effective counterfactual inference implemented with neural networks. The proposed approach enhances the capacity to generalize estimated counterfactual outcomes to unseen data. Empirical results conducted on multiple datasets offer compelling support for our theoretical assertions.
arXiv Detail & Related papers (2023-06-09T08:30:51Z)
WiCE: Real-World Entailment for Claims in Wikipedia [63.234352061821625]
We propose WiCE, a new fine-grained textual entailment dataset built on natural claim and evidence pairs extracted from Wikipedia. In addition to standard claim-level entailment, WiCE provides entailment judgments over sub-sentence units of the claim. We show that real claims in our dataset involve challenging verification and retrieval problems that existing models fail to address.
arXiv Detail & Related papers (2023-03-02T17:45:32Z)
DeSePtion: Dual Sequence Prediction and Adversarial Examples for Improved Fact-Checking [46.13738685855884]
We show that current systems for fact-checking are vulnerable to three categories of realistic challenges for fact-checking. We present a system designed to be resilient to these "attacks" using multiple pointer networks for document selection. We find that in handling these attacks we obtain state-of-the-art results on FEVER, largely due to improved evidence retrieval.
arXiv Detail & Related papers (2020-04-27T15:18:49Z)
Generating Fact Checking Explanations [52.879658637466605]
A crucial piece of the puzzle that is still missing is to understand how to automate the most elaborate part of the process. This paper provides the first study of how these explanations can be generated automatically based on available claim context. Our results indicate that optimising both objectives at the same time, rather than training them separately, improves the performance of a fact checking system.
arXiv Detail & Related papers (2020-04-13T05:23:25Z)
Enhancing Factual Consistency of Abstractive Summarization [57.67609672082137]
We propose a fact-aware summarization model FASum to extract and integrate factual relations into the summary generation process. We then design a factual corrector model FC to automatically correct factual errors from summaries generated by existing systems.
arXiv Detail & Related papers (2020-03-19T07:36:10Z)

This list is automatically generated from the titles and abstracts of the papers in this site.