Team Alex at CLEF CheckThat! 2020: Identifying Check-Worthy Tweets With
Transformer Models
- URL: http://arxiv.org/abs/2009.02931v1
- Date: Mon, 7 Sep 2020 08:03:21 GMT
- Title: Team Alex at CLEF CheckThat! 2020: Identifying Check-Worthy Tweets With
Transformer Models
- Authors: Alex Nikolov, Giovanni Da San Martino, Ivan Koychev, and Preslav Nakov
- Abstract summary: We propose a model for detecting check-worthy tweets about COVID-19, which combines deep contextualized text representations with modeling the social context of the tweet.
Our official submission to the English version of CLEF-2020 CheckThat! Task 1, system Team_Alex, was ranked second with a MAP score of 0.8034.
- Score: 28.25006244616817
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While misinformation and disinformation have been thriving in social media
for years, the emergence of the COVID-19 pandemic caused political and health
misinformation to merge, elevating the problem to a whole new level and giving
rise to the first global infodemic. The fight against this infodemic
has many aspects, with fact-checking and debunking false and misleading claims
being among the most important ones. Unfortunately, manual fact-checking is
time-consuming and automatic fact-checking is resource-intensive, which means
that we need to pre-filter the input social media posts and discard those
that do not appear to be check-worthy. With this in mind, here we propose a
model for detecting check-worthy tweets about COVID-19, which combines deep
contextualized text representations with modeling the social context of the
tweet. We further describe a number of additional experiments and comparisons,
which we believe should be useful for future research as they provide some
indication about what techniques are effective for the task. Our official
submission to the English version of CLEF-2020 CheckThat! Task 1, system
Team_Alex, was ranked second with a MAP score of 0.8034, almost tied with the
winning system, which it trails by just 0.003 MAP points absolute.
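The abstract does not spell out the architecture, so here is a minimal sketch of the general idea: concatenate a contextualized tweet embedding with simple social-context features and train a ranker on top. The checkpoint, the feature set, and the classifier below are all assumptions for illustration, not the official Team_Alex system.

```python
# Minimal sketch (not the official system): encode a tweet with a pre-trained
# transformer and concatenate social-context features before a simple
# classifier. Feature choices and the checkpoint are assumptions.
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
encoder = AutoModel.from_pretrained("roberta-base")

def embed_tweet(text: str) -> np.ndarray:
    """Mean-pool the last hidden states into a single tweet vector."""
    inputs = tokenizer(text, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state  # (1, seq_len, 768)
    return hidden.mean(dim=1).squeeze(0).numpy()

def social_context(tweet: dict) -> np.ndarray:
    """Hypothetical social-context features, log-scaled counts."""
    return np.log1p(np.array([
        tweet["retweet_count"],
        tweet["favorite_count"],
        tweet["user_followers"],
        float(tweet["user_verified"]),
    ]))

def featurize(tweet: dict) -> np.ndarray:
    return np.concatenate([embed_tweet(tweet["text"]), social_context(tweet)])

def train(tweets: list[dict], labels: list[int]) -> LogisticRegression:
    """Fit on labeled tweets (1 = check-worthy)."""
    X = np.stack([featurize(t) for t in tweets])
    return LogisticRegression(max_iter=1000).fit(X, labels)
```

Ranking the test tweets by the classifier's predicted probability is what the shared task's MAP metric then evaluates.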
Related papers
- ThangDLU at #SMM4H 2024: Encoder-decoder models for classifying text data on social disorders in children and adolescents [49.00494558898933]
This paper describes our participation in Task 3 and Task 5 of the #SMM4H (Social Media Mining for Health) 2024 Workshop.
Task 3 is a multi-class classification task centered on tweets discussing the impact of outdoor environments on symptoms of social anxiety.
Task 5 involves a binary classification task focusing on tweets reporting medical disorders in children.
We applied transfer learning from pre-trained encoder-decoder models such as BART-base and T5-small to identify the labels of a set of given tweets.
arXiv Detail & Related papers (2024-04-30T17:06:20Z)
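The entry above frames tweet classification as text-to-text generation with encoder-decoder models; a minimal fine-tuning sketch with T5-small might look like the following. The prompt prefix and the idea of emitting the label as a string are assumptions, not the team's exact setup.

```python
# Sketch only: fine-tune T5-small to emit a label string for each tweet,
# in a text-to-text classification setup. Prompt and labels are assumed.
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

def training_step(tweet: str, label: str) -> float:
    """One gradient step; the target is the label rendered as text."""
    inputs = tokenizer("classify tweet: " + tweet, truncation=True,
                       return_tensors="pt")
    targets = tokenizer(label, return_tensors="pt").input_ids
    loss = model(**inputs, labels=targets).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()

def predict(tweet: str) -> str:
    """Generate the label string for an unseen tweet."""
    inputs = tokenizer("classify tweet: " + tweet, truncation=True,
                       return_tensors="pt")
    output_ids = model.generate(inputs.input_ids, max_new_tokens=4)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```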
- The Earth is Flat? Unveiling Factual Errors in Large Language Models [89.94270049334479]
Large Language Models (LLMs) like ChatGPT are used in various applications thanks to their extensive knowledge from pre-training and fine-tuning.
Despite this, they are prone to generating factual and commonsense errors, raising concerns in critical areas like healthcare, journalism, and education.
We introduce a novel, automatic testing framework, FactChecker, aimed at uncovering factual inaccuracies in LLMs.
arXiv Detail & Related papers (2024-01-01T14:02:27Z)
- ManiTweet: A New Benchmark for Identifying Manipulation of News on Social Media [74.93847489218008]
We present a novel task, identifying manipulation of news on social media, which aims to detect manipulation in social media posts and identify manipulated or inserted information.
To study this task, we have proposed a data collection schema and curated a dataset called ManiTweet, consisting of 3.6K pairs of tweets and corresponding articles.
Our analysis demonstrates that this task is highly challenging, with large language models (LLMs) yielding unsatisfactory performance.
arXiv Detail & Related papers (2023-05-23T16:40:07Z)
- CrowdChecked: Detecting Previously Fact-Checked Claims in Social Media [19.688259030184508]
We propose an end-to-end framework to learn from noisy data based on modified self-adaptive training.
Our experiments on the CLEF'21 CheckThat! test set show improvements over the state of the art by two points absolute.
arXiv Detail & Related papers (2022-10-10T06:05:52Z)
- Rumor Detection with Self-supervised Learning on Texts and Social Graph [101.94546286960642]
We propose contrastive self-supervised learning on heterogeneous information sources, so as to reveal their relations and characterize rumors better.
We term this framework Self-supervised Rumor Detection (SRD).
Extensive experiments on three real-world datasets validate the effectiveness of SRD for automatic rumor detection on social media.
arXiv Detail & Related papers (2022-04-19T12:10:03Z)
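The SRD summary does not spell out its contrastive objective; a generic InfoNCE loss over two views of the same post (say, its text embedding and its propagation-graph embedding) is one standard instantiation, sketched below under that assumption.

```python
# Generic InfoNCE contrastive loss (a common choice; the paper's exact
# objective may differ). z_text and z_graph embed two views of the same
# posts; other posts in the batch act as negatives.
import torch
import torch.nn.functional as F

def info_nce(z_text: torch.Tensor, z_graph: torch.Tensor,
             temperature: float = 0.1) -> torch.Tensor:
    z_text = F.normalize(z_text, dim=1)    # (batch, dim)
    z_graph = F.normalize(z_graph, dim=1)  # (batch, dim)
    logits = z_text @ z_graph.t() / temperature  # pairwise similarities
    labels = torch.arange(z_text.size(0))        # positives on the diagonal
    return F.cross_entropy(logits, labels)
```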
- Twitter-COMMs: Detecting Climate, COVID, and Military Multimodal Misinformation [83.2079454464572]
This paper describes our approach to the Image-Text Inconsistency Detection challenge of the DARPA Semantic Forensics (SemaFor) Program.
We collect Twitter-COMMs, a large-scale multimodal dataset with 884k tweets relevant to the topics of Climate Change, COVID-19, and Military Vehicles.
We train our approach, based on the state-of-the-art CLIP model, leveraging automatically generated random and hard negatives.
arXiv Detail & Related papers (2021-12-16T03:37:20Z)
- Findings of the NLP4IF-2021 Shared Tasks on Fighting the COVID-19 Infodemic and Censorship Detection [23.280506220186425]
We present the results of the NLP4IF-2021 shared tasks.
Ten teams submitted systems for task 1, and one team participated in task 2.
The best systems used pre-trained Transformers and ensembles.
arXiv Detail & Related papers (2021-09-23T06:38:03Z)
- Misleading the Covid-19 vaccination discourse on Twitter: An exploratory study of infodemic around the pandemic [0.45593531937154413]
We collect a moderate-sized representative corpus of approximately 200,000 tweets pertaining to Covid-19 vaccination over a period of seven months (September 2020 - March 2021).
Following a Transfer Learning approach, we utilize the pre-trained Transformer-based XLNet model to classify tweets as Misleading or Non-Misleading.
We build on this to study and contrast the characteristics of misleading tweets in the corpus against non-misleading ones.
Several ML models are employed for prediction, with up to 90% accuracy, and the importance of each feature is explained using SHAP Explainable AI (XAI).
arXiv Detail & Related papers (2021-08-16T17:02:18Z)
- g2tmn at Constraint@AAAI2021: Exploiting CT-BERT and Ensembling Learning for COVID-19 Fake News Detection [0.0]
We present our results at the Constraint@AAAI2021 Shared Task: COVID-19 Fake News Detection in English.
Our approach uses a transformer-based ensemble of COVID-Twitter-BERT (CT-BERT) models.
As a result, our best model achieved a weighted F1-score of 98.69 on the test set.
arXiv Detail & Related papers (2020-12-22T12:43:12Z)
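The summary above names an ensemble of CT-BERT models but not the combination rule; soft voting (averaging class probabilities) is a common choice and is sketched below. The checkpoint id is CT-BERT's public release; the averaging rule is an assumption, not necessarily the paper's.

```python
# Sketch: soft-voting ensemble over several fine-tuned CT-BERT classifiers.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Public CT-BERT v2 checkpoint; in practice each ensemble member would be
# fine-tuned separately (e.g., different seeds or folds) before prediction.
CHECKPOINT = "digitalepidemiologylab/covid-twitter-bert-v2"
tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)

def load_member():
    """A fresh binary-classification head on top of CT-BERT."""
    return AutoModelForSequenceClassification.from_pretrained(
        CHECKPOINT, num_labels=2)

def ensemble_predict(models: list, text: str) -> int:
    """Soft voting: average softmax probabilities across members."""
    inputs = tokenizer(text, truncation=True, return_tensors="pt")
    probs = []
    for model in models:
        model.eval()
        with torch.no_grad():
            probs.append(torch.softmax(model(**inputs).logits, dim=-1))
    return int(torch.stack(probs).mean(dim=0).argmax(dim=-1))
```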
- Not-NUTs at W-NUT 2020 Task 2: A BERT-based System in Identifying Informative COVID-19 English Tweets [0.0]
We propose a model that, given an English tweet, automatically identifies whether that tweet bears informative content regarding COVID-19 or not.
We achieved competitive results, falling only roughly 1% short of the top-performing teams in terms of F1 score on the informative class.
arXiv Detail & Related papers (2020-09-14T15:49:16Z)
- Misinformation Has High Perplexity [55.47422012881148]
We propose to leverage the perplexity to debunk false claims in an unsupervised manner.
First, we extract reliable evidence from scientific and news sources according to sentence similarity to the claims.
Second, we prime a language model with the extracted evidence and finally evaluate the correctness of given claims based on the perplexity scores at debunking time.
arXiv Detail & Related papers (2020-06-08T15:13:44Z)
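The three-step procedure in the last entry maps naturally onto a causal language model: prime it with the retrieved evidence, then measure perplexity over the claim tokens only. In the sketch below, GPT-2 stands in for the paper's model, and any decision threshold would be an assumption tuned on held-out data.

```python
# Sketch of evidence-conditioned perplexity scoring: prime a causal LM with
# evidence text, then compute perplexity of the claim tokens alone. GPT-2
# is a stand-in, not necessarily the paper's exact setup.
import math
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def claim_perplexity(evidence: str, claim: str) -> float:
    """Perplexity of `claim` given `evidence` as the prompt."""
    evidence_ids = tokenizer(evidence, return_tensors="pt").input_ids
    claim_ids = tokenizer(" " + claim, return_tensors="pt").input_ids
    input_ids = torch.cat([evidence_ids, claim_ids], dim=1)
    # Mask evidence positions so the loss covers only the claim tokens.
    labels = input_ids.clone()
    labels[:, : evidence_ids.size(1)] = -100  # -100 is ignored by the loss
    with torch.no_grad():
        loss = model(input_ids, labels=labels).loss  # mean NLL over claim
    return math.exp(loss.item())
```

Under reliable evidence, a claim that conflicts with it should score a higher perplexity than one the evidence supports, which is the intuition the paper's title states.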
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.