Assessing Effectiveness of Using Internal Signals for Check-Worthy Claim
Identification in Unlabeled Data for Automated Fact-Checking
- URL: http://arxiv.org/abs/2111.01706v1
- Date: Tue, 2 Nov 2021 16:17:20 GMT
- Title: Assessing Effectiveness of Using Internal Signals for Check-Worthy Claim
Identification in Unlabeled Data for Automated Fact-Checking
- Authors: Archita Pathak and Rohini K. Srihari
- Abstract summary: This paper explores a methodology for identifying check-worthy claim sentences in fake news articles.
We leverage two internal supervisory signals - the headline and the abstractive summary - to rank the sentences.
We show that while the headline has more gisting similarity with how a fact-checking website writes a claim, the summary-based pipeline is the most promising for an end-to-end fact-checking system.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While recent work on automated fact-checking has focused mainly on verifying
and explaining claims, for which the list of claims is readily available,
identifying check-worthy claim sentences from a text remains challenging.
Current claim identification models rely on manual annotations for each
sentence in the text, which is an expensive task and challenging to conduct on
a frequent basis across multiple domains. This paper explores a methodology to
identify check-worthy claim sentences from fake news articles, irrespective of
domain, without explicit sentence-level annotations. We leverage two internal
supervisory signals - the headline and the abstractive summary - to rank the
sentences based on semantic similarity. We hypothesize that this ranking
directly correlates with the check-worthiness of the sentences. To assess the
effectiveness of this hypothesis, we build pipelines that leverage the ranking
of sentences based on either the headline or the abstractive summary. The
top-ranked sentences are used for the downstream fact-checking tasks of
evidence retrieval and the article's veracity prediction by the pipeline. Our
findings suggest that the top 3 ranked sentences contain enough information for
evidence-based fact-checking of a fake news article. We also show that while
the headline has more gisting similarity with how a fact-checking website
writes a claim, the summary-based pipeline is the most promising for an
end-to-end fact-checking system.
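To make the ranking step concrete, below is a minimal sketch of the pipeline's core idea, assuming an off-the-shelf sentence-embedding model (the sentence-transformers library and the all-MiniLM-L6-v2 checkpoint are illustrative choices, not necessarily the models used in the paper): sentences are scored by cosine similarity against an internal signal (the headline, or an abstractive summary), and the top 3 become candidate check-worthy claims.

```python
# Illustrative sketch only: ranks article sentences against an internal
# supervisory signal (headline or abstractive summary). The embedding
# model and similarity measure here are assumptions, not the paper's
# exact implementation.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

def rank_by_signal(signal: str, sentences: list[str], top_k: int = 3) -> list[str]:
    """Return the top_k sentences most semantically similar to the signal."""
    signal_emb = model.encode(signal, convert_to_tensor=True)
    sent_embs = model.encode(sentences, convert_to_tensor=True)
    scores = util.cos_sim(signal_emb, sent_embs)[0]  # shape: (len(sentences),)
    top = scores.argsort(descending=True)[:top_k].tolist()
    return [sentences[i] for i in top]

# Toy headline-based run; the summary-based variant would pass an
# abstractive summary (e.g. from a BART summarizer) as the signal instead.
headline = "New report claims city water supply is contaminated"
article_sentences = [
    "Local schools closed early on Friday.",
    "The report claims the city's water supply is contaminated with lead.",
    "Officials say independent tests are scheduled for next week.",
    "Residents have been advised to use bottled water in the meantime.",
]
print(rank_by_signal(headline, article_sentences))
```

In the paper's setup, the top-ranked sentences then feed the downstream evidence-retrieval and veracity-prediction stages; here they are simply printed.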
Related papers
- AFaCTA: Assisting the Annotation of Factual Claim Detection with Reliable LLM Annotators (arXiv, 2024-02-16)
  AFaCTA is a novel framework that assists in the annotation of factual claims.
  AFaCTA calibrates its annotation confidence with consistency along three predefined reasoning paths.
  Our analyses also result in PoliClaim, a comprehensive claim detection dataset spanning diverse political topics.
- Give Me More Details: Improving Fact-Checking with Latent Retrieval (arXiv, 2023-05-25)
  Evidence plays a crucial role in automated fact-checking.
  Existing fact-checking systems either assume the evidence sentences are given or use the search snippets returned by the search engine.
  We propose to incorporate full text from source documents as evidence and introduce two enriched datasets.
- Interpretable Automatic Fine-grained Inconsistency Detection in Text Summarization (arXiv, 2023-05-23)
  We propose the task of fine-grained inconsistency detection, the goal of which is to predict the fine-grained types of factual errors in a summary.
  Motivated by how humans inspect factual inconsistency in summaries, we propose an interpretable fine-grained inconsistency detection model, FineGrainFact.
- End-to-End Page-Level Assessment of Handwritten Text Recognition (arXiv, 2023-01-14)
  HTR systems increasingly face the end-to-end page-level transcription of a document.
  Standard metrics do not take into account the inconsistencies that might appear.
  We propose a two-fold evaluation, where the transcription accuracy and the reading-order (RO) goodness are considered separately.
- GERE: Generative Evidence Retrieval for Fact Verification (arXiv, 2022-04-12)
  We propose GERE, the first system that retrieves evidence in a generative fashion.
  The experimental results on the FEVER dataset show that GERE achieves significant improvements over the state-of-the-art baselines.
- DialFact: A Benchmark for Fact-Checking in Dialogue (arXiv, 2021-10-15)
  We construct DialFact, a benchmark dataset of 22,245 annotated conversational claims, paired with pieces of evidence from Wikipedia.
  We find that existing fact-checking models trained on non-dialogue data like FEVER fail to perform well on our task.
  We propose a simple yet data-efficient solution to effectively improve fact-checking performance in dialogue.
- Assisting the Human Fact-Checkers: Detecting All Previously Fact-Checked Claims in a Document (arXiv, 2021-09-14)
  Given an input document, the task is to detect all sentences that contain a claim that can be verified against previously fact-checked claims.
  The output is a re-ranked list of the document sentences, so that those that can be verified are ranked as high as possible.
  Our analysis demonstrates the importance of modeling text similarity and stance, while also taking into account the veracity of the retrieved previously fact-checked claims.
- Self-Supervised Claim Identification for Automated Fact Checking (arXiv, 2021-02-03)
  We propose a novel, attention-based self-supervised approach to identify "claim-worthy" sentences in a fake news article.
  We leverage the "aboutness" of the headline and content using an attention mechanism for this task.
- Generating Fact Checking Explanations (arXiv, 2020-04-13)
  A crucial piece of the puzzle that is still missing is understanding how to automate the most elaborate part of the process: generating explanations for verdicts.
  This paper provides the first study of how these explanations can be generated automatically based on available claim context.
  Our results indicate that optimising both objectives at the same time, rather than training them separately, improves the performance of a fact-checking system.
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and accepts no responsibility for any consequences of its use.