Claim Check-Worthiness Detection as Positive Unlabelled Learning
- URL: http://arxiv.org/abs/2003.02736v2
- Date: Wed, 16 Sep 2020 16:52:15 GMT
- Title: Claim Check-Worthiness Detection as Positive Unlabelled Learning
- Authors: Dustin Wright and Isabelle Augenstein
- Abstract summary: Claim check-worthiness detection is a critical component of fact checking systems.
We illuminate a central challenge underlying all of these claim check-worthiness detection tasks: annotators tend to mark only clear-cut positive instances.
Our best performing method is a unified approach which automatically corrects for this using a variant of positive unlabelled learning.
- Score: 53.24606510691877
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As the first step of automatic fact checking, claim check-worthiness
detection is a critical component of fact checking systems. There are multiple
lines of research which study this problem: check-worthiness ranking from
political speeches and debates, rumour detection on Twitter, and citation
needed detection from Wikipedia. To date, there has been no structured
comparison of these various tasks to understand their relatedness, and no
investigation into whether or not a unified approach to all of them is
achievable. In this work, we illuminate a central challenge underlying all of
these claim check-worthiness detection tasks: they hinge on detecting both how
factual a sentence is and how likely a sentence is to be believed without
verification. As a result, annotators mark only those instances they judge to
be clear-cut check-worthy, leaving many genuinely check-worthy sentences
unlabelled. Our best performing method is a unified approach which
automatically corrects for this using a variant of positive unlabelled
learning that finds instances which were incorrectly labelled as not
check-worthy. Applying it, we outperform the state of the art in two of the
three tasks studied for claim check-worthiness detection in English.
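As a rough illustration of the positive unlabelled framing, the sketch below applies the classic Elkan and Noto (2008) estimator and then relabels unlabelled examples that score as confidently positive. It is a minimal sketch, not the paper's own PU variant or its neural classifier; the function name, threshold, and feature setup are all illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def pu_relabel(X, s, threshold=0.5):
    """X: feature matrix; s: binary numpy array, 1 = labelled check-worthy,
    0 = unlabelled. Returns labels with some unlabelled rows converted to 1."""
    # Train a "non-traditional" classifier that predicts the labelling
    # indicator s rather than the true class y, holding out some data to
    # estimate the label frequency.
    X_tr, X_hold, s_tr, s_hold = train_test_split(
        X, s, test_size=0.2, stratify=s, random_state=0)
    g = LogisticRegression(max_iter=1000).fit(X_tr, s_tr)

    # Estimate c = P(s=1 | y=1) as the mean score over held-out labelled
    # positives (Elkan & Noto's e1 estimator).
    c = g.predict_proba(X_hold[s_hold == 1])[:, 1].mean()

    # Under the "selected completely at random" assumption,
    # P(y=1 | x) = P(s=1 | x) / c; convert confident unlabelled examples.
    p_y = np.clip(g.predict_proba(X)[:, 1] / c, 0.0, 1.0)
    y_hat = s.copy()
    y_hat[(s == 0) & (p_y >= threshold)] = 1
    return y_hat
```

A check-worthiness classifier would then be retrained on the corrected labels.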
Related papers
- The Decisive Power of Indecision: Low-Variance Risk-Limiting Audits and Election Contestation via Marginal Mark Recording [51.82772358241505]
Risk-limiting audits (RLAs) are techniques for verifying the outcomes of large elections.
We define new families of audits that improve efficiency and offer advances in statistical power.
New audits are enabled by revisiting the standard notion of a cast-vote record so that it can declare multiple possible mark interpretations.
arXiv Detail & Related papers (2024-02-09T16:23:54Z)
- Leveraging Social Discourse to Measure Check-worthiness of Claims for Fact-checking [36.21314290592325]
We present CheckIt, a manually annotated large Twitter dataset for fine-grained claim check-worthiness.
We benchmark a unified approach, CheckMate, on our dataset; it jointly determines whether a claim is check-worthy and the factors that led to that conclusion.
arXiv Detail & Related papers (2023-09-17T13:42:41Z)
- Give Me More Details: Improving Fact-Checking with Latent Retrieval [58.706972228039604]
Evidence plays a crucial role in automated fact-checking.
Existing fact-checking systems either assume the evidence sentences are given or use the search snippets returned by the search engine.
We propose to incorporate full text from source documents as evidence and introduce two enriched datasets.
arXiv Detail & Related papers (2023-05-25T15:01:19Z)
- Check-worthy Claim Detection across Topics for Automated Fact-checking [21.723689314962233]
We assess and quantify the challenge of detecting check-worthy claims for new, unseen topics.
We propose the AraCWA model to mitigate the performance deterioration when detecting check-worthy claims across topics.
arXiv Detail & Related papers (2022-12-16T14:54:56Z)
- Assessing Effectiveness of Using Internal Signals for Check-Worthy Claim Identification in Unlabeled Data for Automated Fact-Checking [6.193231258199234]
This paper explores a methodology for identifying check-worthy claim sentences in fake news articles.
We leverage two internal supervisory signals - the headline and the abstractive summary - to rank the sentences.
We show that while the headline is closer in gist to how a fact-checking website writes a claim, the summary-based pipeline is the most promising for an end-to-end fact-checking system (a minimal similarity-ranking sketch appears after this list).
arXiv Detail & Related papers (2021-11-02T16:17:20Z)
- DialFact: A Benchmark for Fact-Checking in Dialogue [56.63709206232572]
We construct DialFact, a benchmark dataset of 22,245 annotated conversational claims, paired with pieces of evidence from Wikipedia.
We find that existing fact-checking models trained on non-dialogue data like FEVER fail to perform well on our task.
We propose a simple yet data-efficient solution to effectively improve fact-checking performance in dialogue.
arXiv Detail & Related papers (2021-10-15T17:34:35Z)
- UPV at CheckThat! 2021: Mitigating Cultural Differences for Identifying Multilingual Check-worthy Claims [6.167830237917659]
In this paper, we propose language identification as an auxiliary task to mitigate unintended bias.
Our results show that joint training of language identification and check-worthy claim detection tasks can provide performance gains for some of the selected languages.
arXiv Detail & Related papers (2021-09-19T21:46:16Z)
- Assisting the Human Fact-Checkers: Detecting All Previously Fact-Checked Claims in a Document [27.076320857009655]
Given an input document, the task is to detect all sentences that contain a claim verifiable by some previously fact-checked claim.
The output is a re-ranked list of the document sentences, so that those that can be verified are ranked as high as possible.
Our analysis demonstrates the importance of modeling text similarity and stance, while also taking into account the veracity of the retrieved previously fact-checked claims.
arXiv Detail & Related papers (2021-09-14T13:46:52Z)
- Detection as Regression: Certified Object Detection by Median Smoothing [50.89591634725045]
This work is motivated by recent progress on certified classification by randomized smoothing.
We obtain the first model-agnostic, training-free, and certified defense for object detection against $\ell_2$-bounded attacks (a generic median-smoothing sketch appears after this list).
arXiv Detail & Related papers (2020-07-07T18:40:19Z)
- Generating Fact Checking Explanations [52.879658637466605]
A crucial piece of the puzzle that is still missing is understanding how to automate the most elaborate part of the process: generating justifications for verdicts on claims.
This paper provides the first study of how these explanations can be generated automatically based on available claim context.
Our results indicate that jointly optimising the veracity prediction and explanation generation objectives, rather than training them separately, improves the performance of a fact checking system.
arXiv Detail & Related papers (2020-04-13T05:23:25Z)
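As flagged in the internal-signals entry above, the following is a hypothetical sketch of ranking article sentences by similarity to an internal signal such as the headline or an abstractive summary. TF-IDF cosine similarity stands in for whatever sentence representation that paper actually uses; all names are illustrative.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rank_by_signal(sentences, signal_text):
    """Rank candidate claim sentences by descending similarity to an
    internal signal (e.g. the headline or an abstractive summary)."""
    vectorizer = TfidfVectorizer().fit(sentences + [signal_text])
    scores = cosine_similarity(
        vectorizer.transform(sentences),
        vectorizer.transform([signal_text])).ravel()
    order = scores.argsort()[::-1]  # highest similarity first
    return [(sentences[i], float(scores[i])) for i in order]
```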
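And for the median smoothing entry, a generic Monte Carlo sketch of the smoothing operator itself, assuming `f` is an arbitrary base regression model; this shows the general technique, not the paper's certified object detection pipeline.

```python
import numpy as np

def median_smooth(f, x, sigma=0.25, n_samples=1000, seed=0):
    """Monte Carlo estimate of g(x) = median over d ~ N(0, sigma^2 I)
    of f(x + d); the per-output median is what certification bounds use."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, sigma, size=(n_samples,) + np.shape(x))
    outputs = np.stack([np.asarray(f(x + d)) for d in noise])
    return np.median(outputs, axis=0)
```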
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.