Empowering the Fact-checkers! Automatic Identification of Claim Spans on Twitter
- URL: http://arxiv.org/abs/2210.04710v2
- Date: Tue, 11 Oct 2022 12:00:06 GMT
- Title: Empowering the Fact-checkers! Automatic Identification of Claim Spans on Twitter
- Authors: Megha Sundriyal, Atharva Kulkarni, Vaibhav Pulastya, Md Shad Akhtar, Tanmoy Chakraborty
- Abstract summary: Claim Span Identification (CSI) is a tool to automatically identify and extract the snippets of claim-worthy (mis)information present in a post.
We propose CURT, a large-scale Twitter corpus with token-level claim spans on more than 7.5k tweets.
We benchmark our dataset with DABERTa, an adapter-based variation of RoBERTa.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The widespread diffusion of medical and political claims in the wake of
COVID-19 has led to a voluminous rise in misinformation and fake news. The
current vogue is to employ manual fact-checkers to efficiently classify and
verify such data to combat this avalanche of claim-ridden misinformation.
However, the rate of information dissemination is such that it vastly outpaces
the fact-checkers' strength. Therefore, to aid manual fact-checkers in
eliminating the superfluous content, it becomes imperative to automatically
identify and extract the snippets of claim-worthy (mis)information present in a
post. In this work, we introduce the novel task of Claim Span Identification
(CSI). We propose CURT, a large-scale Twitter corpus with token-level claim
spans on more than 7.5k tweets. Furthermore, along with the standard token
classification baselines, we benchmark our dataset with DABERTa, an
adapter-based variation of RoBERTa. The experimental results attest that
DABERTa outperforms the baseline systems across several evaluation metrics,
improving by about 1.5 points. We also report detailed error analysis to
validate the model's performance along with the ablation studies. Lastly, we
release our comprehensive span annotation guidelines for public use.
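Claim Span Identification is framed above as token-level labeling: each token of a tweet is marked as inside or outside a claim span. As a minimal sketch (not the authors' DABERTa implementation), the snippet below decodes per-token BIO tags, a standard output format for token-classification models, into claim spans; the tag names and the example tweet are hypothetical.

```python
def decode_claim_spans(tokens, tags):
    """Convert per-token BIO tags (B-CLAIM / I-CLAIM / O) into
    (start, end) token-index spans, end exclusive."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag == "B-CLAIM":
            if start is not None:        # close any open span first
                spans.append((start, i))
            start = i
        elif tag == "I-CLAIM":
            if start is None:            # tolerate I- without a leading B-
                start = i
        else:                            # "O": outside any claim
            if start is not None:
                spans.append((start, i))
                start = None
    if start is not None:                # span running to end of tweet
        spans.append((start, len(tags)))
    return spans

# Hypothetical model output for an example tweet
tokens = ["Garlic", "cures", "COVID-19", ",", "experts", "say", "."]
tags   = ["B-CLAIM", "I-CLAIM", "I-CLAIM", "O", "O", "O", "O"]
spans = decode_claim_spans(tokens, tags)
```

Here `spans` is `[(0, 3)]`, flagging "Garlic cures COVID-19" as the claim-worthy snippet a fact-checker would review.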
Related papers
- Contrastive Learning to Improve Retrieval for Real-world Fact Checking [84.57583869042791]
We present Contrastive Fact-Checking Reranker (CFR), an improved retriever for fact-checking complex claims.
We leverage the AVeriTeC dataset, which annotates subquestions for claims with human written answers from evidence documents.
We find a 6% improvement in veracity classification accuracy on the dataset.
arXiv Detail & Related papers (2024-10-07T00:09:50Z)
- Fact Checking Beyond Training Set [64.88575826304024]
We show that the retriever-reader suffers from performance deterioration when it is trained on labeled data from one domain and used in another domain.
We propose an adversarial algorithm to make the retriever component robust against distribution shift.
We then construct eight fact checking scenarios from these datasets, and compare our model to a set of strong baseline models.
arXiv Detail & Related papers (2024-03-27T15:15:14Z)
- From Chaos to Clarity: Claim Normalization to Empower Fact-Checking [57.024192702939736]
Claim Normalization (aka ClaimNorm) aims to decompose complex and noisy social media posts into more straightforward and understandable forms.
We propose CACN, a pioneering approach that leverages chain-of-thought and claim check-worthiness estimation.
Our experiments demonstrate that CACN outperforms several baselines across various evaluation measures.
arXiv Detail & Related papers (2023-10-22T16:07:06Z)
- PANACEA: An Automated Misinformation Detection System on COVID-19 [49.83321665982157]
PANACEA is a web-based misinformation detection system on COVID-19 related claims.
It has two modules, fact-checking and rumour detection.
arXiv Detail & Related papers (2023-02-28T21:53:48Z)
- Machine Learning-based Automatic Annotation and Detection of COVID-19 Fake News [8.020736472947581]
COVID-19 impacted every part of the world, and misinformation about the outbreak traveled faster than the virus itself.
Existing work neglects the presence of bots that act as a catalyst in the spread.
We propose an automated approach for labeling data using verified fact-checked statements on a Twitter dataset.
arXiv Detail & Related papers (2022-09-07T13:55:59Z)
- Assessing Effectiveness of Using Internal Signals for Check-Worthy Claim Identification in Unlabeled Data for Automated Fact-Checking [6.193231258199234]
This paper explores methodology to identify check-worthy claim sentences from fake news articles.
We leverage two internal supervisory signals - headline and the abstractive summary - to rank the sentences.
We show that while the headline has more gisting similarity with how a fact-checking website writes a claim, the summary-based pipeline is the most promising for an end-to-end fact-checking system.
arXiv Detail & Related papers (2021-11-02T16:17:20Z)
- FacTeR-Check: Semi-automated Fact-checking through Semantic Similarity and Natural Language Inference [61.068947982746224]
FacTeR-Check enables retrieving fact-checked information, unchecked claims verification and tracking dangerous information over social media.
The architecture is validated using a new dataset called NLI19-SP that is publicly released with COVID-19 related hoaxes and tweets from Spanish social media.
Our results show state-of-the-art performance on the individual benchmarks, as well as producing useful analysis of the evolution over time of 61 different hoaxes.
arXiv Detail & Related papers (2021-10-27T15:44:54Z)
- Zero-shot Fact Verification by Claim Generation [85.27523983027471]
We develop QACG, a framework for training a robust fact verification model.
We use automatically generated claims that can be supported, refuted, or unverifiable from evidence from Wikipedia.
In a zero-shot scenario, QACG improves a RoBERTa model's F1 from 50% to 77%, equivalent in performance to 2K+ manually-curated examples.
arXiv Detail & Related papers (2021-05-31T03:13:52Z)
- Self-Supervised Claim Identification for Automated Fact Checking [2.578242050187029]
We propose a novel, attention-based self-supervised approach to identify "claim-worthy" sentences in a fake news article.
We leverage "aboutness" of headline and content using attention mechanism for this task.
arXiv Detail & Related papers (2021-02-03T23:37:09Z)
- ArCOV19-Rumors: Arabic COVID-19 Twitter Dataset for Misinformation Detection [6.688963029270579]
ArCOV19-Rumors is an Arabic COVID-19 Twitter dataset for misinformation detection composed of tweets containing claims from 27th January till the end of April 2020.
We collected 138 verified claims, mostly from popular fact-checking websites, and identified 9.4K relevant tweets to those claims.
Tweets were manually annotated for veracity to support research on misinformation detection, one of the major problems faced during a pandemic.
arXiv Detail & Related papers (2020-10-17T11:21:40Z)
- Too Many Claims to Fact-Check: Prioritizing Political Claims Based on Check-Worthiness [1.2891210250935146]
We propose a model prioritizing the claims based on their check-worthiness.
We use BERT model with additional features including domain-specific controversial topics, word embeddings, and others.
arXiv Detail & Related papers (2020-04-17T10:55:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.