Towards Effective Extraction and Evaluation of Factual Claims
- URL: http://arxiv.org/abs/2502.10855v1
- Date: Sat, 15 Feb 2025 16:58:05 GMT
- Title: Towards Effective Extraction and Evaluation of Factual Claims
- Authors: Dasha Metropolitansky, Jonathan Larson
- Abstract summary: A common strategy for fact-checking long-form content generated by Large Language Models (LLMs) is extracting simple claims that can be verified independently.
We propose a framework for evaluating claim extraction in the context of fact-checking along with automated, scalable, and replicable methods for applying this framework.
We also introduce Claimify, an LLM-based claim extraction method, and demonstrate that it outperforms existing methods under our evaluation framework.
- Abstract: A common strategy for fact-checking long-form content generated by Large Language Models (LLMs) is extracting simple claims that can be verified independently. Since inaccurate or incomplete claims compromise fact-checking results, ensuring claim quality is critical. However, the lack of a standardized evaluation framework impedes assessment and comparison of claim extraction methods. To address this gap, we propose a framework for evaluating claim extraction in the context of fact-checking along with automated, scalable, and replicable methods for applying this framework, including novel approaches for measuring coverage and decontextualization. We also introduce Claimify, an LLM-based claim extraction method, and demonstrate that it outperforms existing methods under our evaluation framework. A key feature of Claimify is its ability to handle ambiguity and extract claims only when there is high confidence in the correct interpretation of the source text.
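As a rough illustration of the abstract's central idea, extracting claims only when the source text can be confidently interpreted, the sketch below shows one way such a step might look. The prompt wording and the `call_llm` helper are illustrative assumptions, not the actual Claimify prompts or pipeline.

```python
# Minimal sketch of confidence-gated claim extraction, loosely inspired by the
# abstract above. The prompt and `call_llm` are illustrative assumptions, not
# the actual Claimify implementation.
import json

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for any LLM client; returns the model's text output."""
    raise NotImplementedError("plug in an LLM client here")

def extract_claims(sentence: str, context: str) -> list[str]:
    prompt = (
        "You are extracting verifiable factual claims for fact-checking.\n"
        f"Context:\n{context}\n\nSentence:\n{sentence}\n\n"
        "List the simple, self-contained factual claims this sentence makes. "
        "If the sentence is ambiguous and its intended meaning cannot be "
        "confidently resolved from the context, return an empty list.\n"
        'Respond with JSON: {"claims": ["..."]}'
    )
    return json.loads(call_llm(prompt)).get("claims", [])
```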
Related papers
- Claim Extraction for Fact-Checking: Data, Models, and Automated Metrics
We release the FEVERFact dataset, with 17K atomic factual claims extracted from 4K contextualised Wikipedia sentences.
For each metric, we implement a scoring scale by reducing it to an already-explored NLP task.
We validate our metrics against human grading of generic claims and find that the model ranking on $F_{fact}$, our hardest metric, does not change.
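The summary above does not spell out the metric definitions, but the general pattern of reducing a claim-level metric to an existing NLP task can be sketched as follows. The entailment-based F-score here is an assumed formulation for exposition, not the paper's $F_{fact}$, and `entails` is a hypothetical helper.

```python
# Illustrative reduction of a claim-level metric to textual entailment.
# Assumed formulation for exposition, not the paper's F_fact; `entails` is a
# hypothetical helper backed by any NLI model.
def entails(premise: str, hypothesis: str) -> bool:
    raise NotImplementedError("back this with an NLI model")

def claim_f_score(gold: list[str], predicted: list[str]) -> float:
    if not gold or not predicted:
        return 0.0
    precision = sum(any(entails(g, p) for g in gold) for p in predicted) / len(predicted)
    recall = sum(any(entails(p, g) for p in predicted) for g in gold) / len(gold)
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```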
arXiv Detail & Related papers (2025-02-07T14:20:45Z)
- FactLens: Benchmarking Fine-Grained Fact Verification
We advocate for a shift toward fine-grained verification, where complex claims are broken down into smaller sub-claims for individual verification.
We introduce FactLens, a benchmark for evaluating fine-grained fact verification, with metrics and automated evaluators of sub-claim quality.
Our results show alignment between automated FactLens evaluators and human judgments, and we discuss the impact of sub-claim characteristics on the overall verification performance.
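A generic version of the fine-grained setup described above, decompose a complex claim and then verify each sub-claim individually, is sketched below; `decompose` and `verify` are hypothetical helpers, not the FactLens evaluators.

```python
# Generic fine-grained verification loop: decompose a complex claim, verify each
# sub-claim, and aggregate. `decompose` and `verify` are hypothetical helpers,
# not the FactLens benchmark's evaluators.
from typing import Callable

def verify_fine_grained(claim: str,
                        decompose: Callable[[str], list[str]],
                        verify: Callable[[str], str]) -> dict:
    sub_claims = decompose(claim)                     # e.g. via an LLM or rules
    verdicts = {sc: verify(sc) for sc in sub_claims}  # "supported" / "refuted" / "nei"
    overall = "supported" if all(v == "supported" for v in verdicts.values()) else "not supported"
    return {"sub_claims": verdicts, "overall": overall}
```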
arXiv Detail & Related papers (2024-11-08T21:26:57Z)
- FIRE: Fact-checking with Iterative Retrieval and Verification
FIRE is a novel framework that integrates evidence retrieval and claim verification in an iterative manner.
It achieves slightly better performance while reducing large language model (LLM) costs by an average of 7.6 times and search costs by 16.5 times.
These results indicate that FIRE holds promise for application in large-scale fact-checking operations.
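A bare-bones version of an iterative retrieve-then-verify loop like the one described, stopping early once the verifier is confident and otherwise issuing another query, might look like the sketch below. The helper functions and the stopping rule are assumptions, not FIRE's implementation.

```python
# Bare-bones iterative retrieval-and-verification loop in the spirit of the
# description above. `generate_query`, `search`, and `judge` are hypothetical
# helpers; the stopping rule is an assumption, not FIRE's actual logic.
def iterative_fact_check(claim: str, generate_query, search, judge,
                         max_rounds: int = 5) -> str:
    evidence: list[str] = []
    for _ in range(max_rounds):
        verdict, confident = judge(claim, evidence)   # e.g. an LLM verifier
        if confident:
            return verdict                            # stop early to save LLM/search cost
        query = generate_query(claim, evidence)       # ask for what is still missing
        evidence.extend(search(query))                # retrieve more evidence
    verdict, _ = judge(claim, evidence)
    return verdict
```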
arXiv Detail & Related papers (2024-10-17T06:44:18Z)
- Document-level Claim Extraction and Decontextualisation for Fact-Checking
We propose a method for document-level claim extraction for fact-checking.
We first recast claim extraction as extractive summarization in order to identify central sentences from documents.
We then rewrite them to include necessary context from the originating document through sentence decontextualisation.
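The two-stage pipeline described above, select central sentences and then rewrite them so they stand alone, can be sketched generically as below; the ranking and rewriting helpers are placeholders, not the paper's models.

```python
# Generic two-stage pipeline: pick central sentences (extractive summarisation),
# then rewrite each so it stands alone (decontextualisation). `rank_centrality`
# and `decontextualise` are hypothetical placeholders, not the paper's models.
def extract_document_claims(sentences: list[str], rank_centrality, decontextualise,
                            top_k: int = 3) -> list[str]:
    central = sorted(sentences, key=rank_centrality, reverse=True)[:top_k]
    return [decontextualise(sent, sentences) for sent in central]
```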
arXiv Detail & Related papers (2024-06-05T13:16:46Z)
- A Closer Look at Claim Decomposition
We investigate how various methods of claim decomposition affect the result of an evaluation approach such as the recently proposed FActScore.
We then propose an LLM-based approach to generating decompositions inspired by Bertrand Russell's theory of logical atomism and neo-Davidsonian semantics.
arXiv Detail & Related papers (2024-03-18T16:03:45Z)
- FENICE: Factuality Evaluation of summarization based on Natural language Inference and Claim Extraction
We propose Factuality Evaluation of summarization based on Natural language Inference and Claim Extraction (FENICE).
FENICE leverages an NLI-based alignment between information in the source document and a set of atomic facts, referred to as claims, extracted from the summary.
Our metric sets a new state of the art on AGGREFACT, the de facto benchmark for factuality evaluation.
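One simple way to realise an NLI-based alignment between source passages and extracted claims is to score each claim against its best-supporting passage and average the scores, as sketched below. This is an assumed simplification for illustration, not the FENICE metric itself.

```python
# Simplified NLI-based alignment: each extracted claim is scored against its
# best-supporting source passage, and scores are averaged. An assumed
# simplification for illustration, not the FENICE metric.
def factuality_score(source_passages: list[str], claims: list[str],
                     entail_prob) -> float:
    # `entail_prob(premise, hypothesis)` is a hypothetical helper returning the
    # probability that the premise entails the hypothesis (any NLI model works).
    if not claims:
        return 1.0
    per_claim = [max(entail_prob(p, c) for p in source_passages) for c in claims]
    return sum(per_claim) / len(per_claim)
```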
arXiv Detail & Related papers (2024-03-04T17:57:18Z)
- AFaCTA: Assisting the Annotation of Factual Claim Detection with Reliable LLM Annotators
AFaCTA is a novel framework that assists in the annotation of factual claims.
AFaCTA calibrates its annotation confidence with consistency along three predefined reasoning paths.
Our analyses also result in PoliClaim, a comprehensive claim detection dataset spanning diverse political topics.
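The calibration idea, treating agreement across multiple predefined reasoning paths as annotation confidence, can be sketched as below; the paths and the aggregation rule are illustrative assumptions, not AFaCTA's actual prompts.

```python
# Sketch of consistency-based confidence: run several predefined reasoning paths
# and treat their agreement as calibration. The paths and aggregation here are
# illustrative assumptions, not AFaCTA's actual prompts.
from collections import Counter

def annotate_with_confidence(sentence: str, reasoning_paths) -> tuple[str, float]:
    # Each path maps a sentence to a label, e.g. "factual claim" / "not a claim".
    labels = [path(sentence) for path in reasoning_paths]
    majority, count = Counter(labels).most_common(1)[0]
    confidence = count / len(labels)   # 1.0 means all paths agree
    return majority, confidence
```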
arXiv Detail & Related papers (2024-02-16T20:59:57Z)
- AMRFact: Enhancing Summarization Factuality Evaluation with AMR-Driven Negative Samples Generation
We propose AMRFact, a framework that generates perturbed summaries using Abstract Meaning Representations (AMRs).
Our approach parses factually consistent summaries into AMR graphs and injects controlled factual inconsistencies to create negative examples, allowing for coherent factually inconsistent summaries to be generated with high error-type coverage.
arXiv Detail & Related papers (2023-11-16T02:56:29Z)
- From Chaos to Clarity: Claim Normalization to Empower Fact-Checking
Claim Normalization (aka ClaimNorm) aims to decompose complex and noisy social media posts into more straightforward and understandable forms.
We propose CACN, a pioneering approach that leverages chain-of-thought and claim check-worthiness estimation.
Our experiments demonstrate that CACN outperforms several baselines across various evaluation measures.
arXiv Detail & Related papers (2023-10-22T16:07:06Z)
- Interpretable Automatic Fine-grained Inconsistency Detection in Text Summarization
We propose the task of fine-grained inconsistency detection, the goal of which is to predict the fine-grained types of factual errors in a summary.
Motivated by how humans inspect factual inconsistency in summaries, we propose an interpretable fine-grained inconsistency detection model, FineGrainFact.
arXiv Detail & Related papers (2023-05-23T22:11:47Z)
- Generating Fact Checking Explanations
A crucial piece of the puzzle that is still missing is to understand how to automate the most elaborate part of the process: generating the explanations that justify fact-checking verdicts.
This paper provides the first study of how these explanations can be generated automatically based on available claim context.
Our results indicate that optimising the veracity prediction and explanation generation objectives jointly, rather than training them separately, improves the performance of a fact-checking system.
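As a toy illustration of the joint-training point: a single weighted loss over the two objectives lets shared parameters receive gradients from both tasks in one backward pass, instead of training them in isolation. The weighting scheme is an assumption for illustration, not the paper's training setup.

```python
# Toy illustration of joint optimisation: combine the veracity-prediction and
# explanation-generation losses so shared parameters get gradients from both
# tasks at once. The weighting is an assumption, not the paper's setup.
import torch

def joint_loss(veracity_loss: torch.Tensor, explanation_loss: torch.Tensor,
               alpha: float = 0.5) -> torch.Tensor:
    return alpha * veracity_loss + (1 - alpha) * explanation_loss
```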
arXiv Detail & Related papers (2020-04-13T05:23:25Z)