Document-level Claim Extraction and Decontextualisation for Fact-Checking
- URL: http://arxiv.org/abs/2406.03239v2
- Date: Wed, 12 Jun 2024 11:11:48 GMT
- Title: Document-level Claim Extraction and Decontextualisation for Fact-Checking
- Authors: Zhenyun Deng, Michael Schlichtkrull, Andreas Vlachos
- Abstract summary: We propose a method for document-level claim extraction for fact-checking.
We first recast claim extraction as extractive summarization in order to identify central sentences from documents.
We then rewrite them to include necessary context from the originating document through sentence decontextualisation.
- Score: 11.994189446360433
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Selecting which claims to check is a time-consuming task for human fact-checkers, especially from documents consisting of multiple sentences and containing multiple claims. However, existing claim extraction approaches focus more on identifying and extracting claims from individual sentences, e.g., identifying whether a sentence contains a claim or the exact boundaries of the claim within a sentence. In this paper, we propose a method for document-level claim extraction for fact-checking, which aims to extract check-worthy claims from documents and decontextualise them so that they can be understood out of context. Specifically, we first recast claim extraction as extractive summarization in order to identify central sentences from documents, then rewrite them to include necessary context from the originating document through sentence decontextualisation. Evaluation with both automatic metrics and a fact-checking professional shows that our method is able to extract check-worthy claims from documents more accurately than previous work, while also improving evidence retrieval.
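A minimal sketch of the two-stage pipeline described in the abstract, for illustration only: centrality-based extractive summarization selects candidate claim sentences, and a placeholder `decontextualise` hook stands in for the rewriting step. The scikit-learn TF-IDF scoring, the naive sentence splitter, and the function names are assumptions, not the authors' implementation.

```python
# Sketch of: (1) extractive summarization to find central sentences,
# (2) decontextualisation of those sentences. Not the paper's models.
import re
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def split_sentences(document: str) -> list[str]:
    # Naive splitter; a real system would use a proper sentence tokenizer.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]


def central_sentences(document: str, k: int = 2) -> list[str]:
    """Rank sentences by similarity to the document centroid (extractive summarization)."""
    sentences = split_sentences(document)
    tfidf = TfidfVectorizer().fit_transform(sentences)
    centroid = np.asarray(tfidf.mean(axis=0))
    scores = cosine_similarity(tfidf, centroid).ravel()
    top = np.argsort(scores)[::-1][:k]
    return [sentences[i] for i in sorted(top)]


def decontextualise(sentence: str, document: str) -> str:
    # Placeholder: in the paper this step rewrites the sentence so it can be
    # understood out of context (e.g. resolving pronouns, adding entities).
    return sentence


def extract_claims(document: str, k: int = 2) -> list[str]:
    return [decontextualise(s, document) for s in central_sentences(document, k)]
```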
Related papers
- Unified Multi-Modal Interleaved Document Representation for Information Retrieval [57.65409208879344]
We produce more comprehensive and nuanced document representations by holistically embedding documents interleaved with different modalities.
Specifically, we achieve this by leveraging the capability of recent vision-language models that enable the processing and integration of text, images, and tables into a unified format and representation.
arXiv Detail & Related papers (2024-10-03T17:49:09Z)
- Contextual Document Embeddings [77.22328616983417]
We propose two complementary methods for contextualized document embeddings.
First, an alternative contrastive learning objective that explicitly incorporates the document neighbors into the intra-batch contextual loss.
Second, a new contextual architecture that explicitly encodes neighbor document information into the encoded representation.
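As a rough illustration of the first idea (not the paper's objective), the numpy sketch below adds each document's corpus neighbors as extra negatives inside an InfoNCE-style in-batch contrastive loss; the shapes, temperature, and neighbor-sampling strategy are assumptions.

```python
# Neighbor-aware contrastive loss sketch: in-batch negatives plus each
# document's own neighbors as additional negatives.
import numpy as np


def normalize(x: np.ndarray) -> np.ndarray:
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def neighbor_aware_infonce(anchors, positives, neighbors, temperature=0.05):
    """anchors, positives: (B, d); neighbors: (B, K, d) neighbor-document embeddings."""
    a = normalize(anchors)                                    # (B, d)
    p = normalize(positives)                                  # (B, d)
    n = normalize(neighbors)                                  # (B, K, d)

    in_batch = a @ p.T / temperature                          # (B, B): positives on the diagonal
    neigh = np.einsum("bd,bkd->bk", a, n) / temperature       # (B, K): extra negatives

    logits = np.concatenate([in_batch, neigh], axis=1)        # (B, B + K)
    log_denom = np.log(np.exp(logits).sum(axis=1))
    log_num = np.diag(in_batch)
    return float(np.mean(log_denom - log_num))                # cross-entropy w.r.t. the diagonal


# Toy usage with random embeddings:
rng = np.random.default_rng(0)
loss = neighbor_aware_infonce(rng.normal(size=(4, 16)),
                              rng.normal(size=(4, 16)),
                              rng.normal(size=(4, 3, 16)))
```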
arXiv Detail & Related papers (2024-10-03T14:33:34Z)
- Give Me More Details: Improving Fact-Checking with Latent Retrieval [58.706972228039604]
Evidence plays a crucial role in automated fact-checking.
Existing fact-checking systems either assume the evidence sentences are given or use the search snippets returned by the search engine.
We propose to incorporate full text from source documents as evidence and introduce two enriched datasets.
arXiv Detail & Related papers (2023-05-25T15:01:19Z)
- Complex Claim Verification with Evidence Retrieved in the Wild [73.19998942259073]
We present the first fully automated pipeline to check real-world claims by retrieving raw evidence from the web.
Our pipeline includes five components: claim decomposition, raw document retrieval, fine-grained evidence retrieval, claim-focused summarization, and veracity judgment.
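The sketch below only illustrates how the five components might be wired together; every stage is a simplified placeholder (keyword matching, lead sentences, a trivial verdict) rather than the pipeline's actual models, and all names are assumptions.

```python
# Schematic five-stage fact-checking pipeline: decomposition -> document
# retrieval -> evidence retrieval -> claim-focused summarization -> verdict.
from dataclasses import dataclass, field


@dataclass
class Verdict:
    claim: str
    subclaims: list = field(default_factory=list)
    evidence: list = field(default_factory=list)
    summary: str = ""
    label: str = "not enough info"


def decompose(claim: str) -> list[str]:
    return [claim]                                    # placeholder: no real decomposition


def retrieve_documents(subclaim: str, corpus: list[str]) -> list[str]:
    terms = set(subclaim.lower().split())
    return [d for d in corpus if terms & set(d.lower().split())]


def retrieve_evidence(subclaim: str, documents: list[str]) -> list[str]:
    return [s for d in documents for s in d.split(". ") if s][:5]


def summarize(claim: str, evidence: list[str]) -> str:
    return " ".join(evidence[:2])                     # placeholder: lead evidence sentences


def judge(claim: str, summary: str) -> str:
    return "supported" if summary else "not enough info"


def check_claim(claim: str, corpus: list[str]) -> Verdict:
    v = Verdict(claim, subclaims=decompose(claim))
    for sub in v.subclaims:
        docs = retrieve_documents(sub, corpus)
        v.evidence += retrieve_evidence(sub, docs)
    v.summary = summarize(claim, v.evidence)
    v.label = judge(claim, v.summary)
    return v
```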
arXiv Detail & Related papers (2023-05-19T17:49:19Z)
- GERE: Generative Evidence Retrieval for Fact Verification [57.78768817972026]
We propose GERE, the first system that retrieves evidence in a generative fashion.
The experimental results on the FEVER dataset show that GERE achieves significant improvements over the state-of-the-art baselines.
arXiv Detail & Related papers (2022-04-12T03:49:35Z)
- Assessing Effectiveness of Using Internal Signals for Check-Worthy Claim Identification in Unlabeled Data for Automated Fact-Checking [6.193231258199234]
This paper explores methodology to identify check-worthy claim sentences from fake news articles.
We leverage two internal supervisory signals - the headline and the abstractive summary - to rank the sentences.
We show that while the headline has more gisting similarity with how a fact-checking website writes a claim, the summary-based pipeline is the most promising for an end-to-end fact-checking system.
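A toy sketch of this ranking idea, not the paper's pipeline: article sentences are scored by token overlap with an internal signal (headline or summary) and the top-scoring ones are kept as claim candidates. Jaccard overlap here is an assumption standing in for a learned similarity.

```python
# Rank article sentences by overlap with an internal signal (headline or summary).
def token_overlap(a: str, b: str) -> float:
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / max(len(ta | tb), 1)        # Jaccard similarity


def rank_by_signal(sentences: list[str], signal: str, k: int = 3) -> list[str]:
    scored = sorted(sentences, key=lambda s: token_overlap(s, signal), reverse=True)
    return scored[:k]                                  # top-k check-worthy candidates
```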
arXiv Detail & Related papers (2021-11-02T16:17:20Z)
- Assisting the Human Fact-Checkers: Detecting All Previously Fact-Checked Claims in a Document [27.076320857009655]
Given an input document, the task is to detect all sentences that contain a claim verifiable against previously fact-checked claims.
The output is a re-ranked list of the document sentences, so that those that can be verified are ranked as high as possible.
Our analysis demonstrates the importance of modeling text similarity and stance, while also taking into account the veracity of the retrieved previously fact-checked claims.
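Illustrative only: one way the two signals could be combined when re-ranking, where `similarity` and `stance_probability` are hypothetical callables standing in for the paper's learned models and `alpha` is an assumed mixing weight.

```python
# Re-rank document sentences by their best match against a database of
# previously fact-checked claims, mixing similarity and stance scores.
def rerank(sentences, fact_checked_claims, similarity, stance_probability, alpha=0.5):
    """Return sentences sorted so that verifiable ones rank as high as possible."""
    def score(sentence):
        # Best combined score over the previously fact-checked claims.
        return max(alpha * similarity(sentence, c)
                   + (1 - alpha) * stance_probability(sentence, c)
                   for c in fact_checked_claims)
    return sorted(sentences, key=score, reverse=True)
```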
arXiv Detail & Related papers (2021-09-14T13:46:52Z)
- Graph-based Retrieval for Claim Verification over Cross-Document Evidence [0.6853165736531939]
We conjecture that a graph-based approach can be beneficial to identify fragmented evidence.
We tested this hypothesis by building, over the whole corpus, a large graph that interconnects text portions by means of mentioned entities.
Our experiments show that leveraging a graph structure is beneficial in identifying a reasonably small subset of passages related to a claim.
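A small sketch of such an entity graph (assumed, not the authors' code): passages and mentioned entities become nodes, passages link to the entities they mention, and candidate evidence for a claim is collected via shared entities. `extract_entities` is a crude stand-in for a real NER component, and networkx is an assumed choice of library.

```python
# Passage-entity graph for gathering fragmented, cross-document evidence.
import networkx as nx


def extract_entities(text: str) -> set[str]:
    return {tok for tok in text.split() if tok.istitle()}    # crude NER stand-in


def build_entity_graph(passages: dict[str, str]) -> nx.Graph:
    g = nx.Graph()
    for pid, text in passages.items():
        g.add_node(pid, kind="passage")
        for ent in extract_entities(text):
            g.add_node(ent, kind="entity")
            g.add_edge(pid, ent)                             # passage mentions entity
    return g


def candidate_passages(claim: str, g: nx.Graph) -> set[str]:
    hits = set()
    for ent in extract_entities(claim):
        if ent in g:
            hits |= {n for n in g.neighbors(ent) if g.nodes[n]["kind"] == "passage"}
    return hits
```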
arXiv Detail & Related papers (2021-09-13T14:54:26Z)
- Generating Fact Checking Summaries for Web Claims [8.980876474818153]
We present a neural attention-based approach that learns to establish the correctness of textual claims based on evidence in the form of text documents.
We show the efficacy of our approach on datasets concerning political, healthcare, and environmental issues.
arXiv Detail & Related papers (2020-10-16T18:10:47Z)
- Multilevel Text Alignment with Cross-Document Attention [59.76351805607481]
Existing alignment methods operate at a single, predefined level.
We propose a new learning approach that equips previously established hierarchical attention encoders for representing documents with a cross-document attention component.
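A bare-bones numpy sketch of what a cross-document attention component can look like (an illustration under assumed shapes, not the paper's encoder): sentence vectors from one document attend over the other's, and the attended context is concatenated back onto each sentence representation.

```python
# Scaled dot-product attention from document A's sentences over document B's.
import numpy as np


def cross_document_attention(doc_a: np.ndarray, doc_b: np.ndarray) -> np.ndarray:
    """doc_a: (m, d) sentence vectors; doc_b: (n, d). Returns (m, 2d) B-aware vectors for A."""
    scores = doc_a @ doc_b.T / np.sqrt(doc_a.shape[-1])       # (m, n) attention logits
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)            # softmax over B's sentences
    attended = weights @ doc_b                                 # (m, d) weighted B context
    return np.concatenate([doc_a, attended], axis=-1)          # enrich A with B context
```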
arXiv Detail & Related papers (2020-10-03T02:52:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.