EdinburghNLP at WNUT-2020 Task 2: Leveraging Transformers with
Generalized Augmentation for Identifying Informativeness in COVID-19 Tweets
- URL: http://arxiv.org/abs/2009.06375v3
- Date: Sun, 18 Apr 2021 12:28:14 GMT
- Authors: Nickil Maveli
- Abstract summary: We present our submission for WNUT Task 2: Identification of informative COVID-19 English Tweets.
Our most successful model is an ensemble of transformers, including RoBERTa, XLNet, and BERTweet trained in a Semi-Supervised Learning (SSL) setting.
The proposed system achieves an F1 score of 0.9011 on the test set (ranking 7th on the leaderboard) and shows significant gains in performance compared to a baseline system using FastText embeddings.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Twitter and social media in general have become indispensable
communication channels in times of emergency. The ubiquity of smartphones
enables people to report an emergency as they observe it, in real time. As a
result, more agencies, such as disaster relief organizations and news outlets,
are interested in programmatically monitoring Twitter. Recognizing the
informativeness of a Tweet can therefore help filter noise from the large
volumes of Tweets. In this paper, we present our submission for WNUT-2020 Task 2:
Identification of informative COVID-19 English Tweets. Our most successful
model is an ensemble of transformers, including RoBERTa, XLNet, and BERTweet
trained in a Semi-Supervised Learning (SSL) setting. The proposed system
achieves an F1 score of 0.9011 on the test set (ranking 7th on the leaderboard)
and shows significant gains in performance compared to a baseline system using
FastText embeddings.
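The abstract describes combining RoBERTa, XLNet, and BERTweet into an ensemble. As a minimal illustrative sketch (not the authors' code), one common way to ensemble classifiers is to average their per-class probabilities and take the argmax; the model roles and probability values below are hypothetical.

```python
# Hedged sketch: ensembling classifiers by averaging per-class
# probabilities, as an ensemble of RoBERTa/XLNet/BERTweet-style
# models might be combined. All numbers here are made up.

def ensemble_predict(prob_lists):
    """Average per-class probabilities from several models and
    return the index of the highest-scoring class."""
    n_models = len(prob_lists)
    n_classes = len(prob_lists[0])
    avg = [sum(p[c] for p in prob_lists) / n_models
           for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: avg[c])

# Three hypothetical models scoring one tweet as
# [P(UNINFORMATIVE), P(INFORMATIVE)]:
scores = [
    [0.30, 0.70],  # e.g. a RoBERTa-style model
    [0.45, 0.55],  # e.g. an XLNet-style model
    [0.20, 0.80],  # e.g. a BERTweet-style model
]
print(ensemble_predict(scores))  # prints 1 (INFORMATIVE)
```

Probability averaging (soft voting) tends to be more robust than majority voting when the individual models are well calibrated, since a confident model can outweigh two uncertain ones.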
Related papers
- ThangDLU at #SMM4H 2024: Encoder-decoder models for classifying text data on social disorders in children and adolescents [49.00494558898933]
This paper describes our participation in Task 3 and Task 5 of the #SMM4H (Social Media Mining for Health) 2024 Workshop.
Task 3 is a multi-class classification task centered on tweets discussing the impact of outdoor environments on symptoms of social anxiety.
Task 5 involves a binary classification task focusing on tweets reporting medical disorders in children.
We applied transfer learning from pre-trained encoder-decoder models such as BART-base and T5-small to identify the labels of a set of given tweets.
arXiv Detail & Related papers (2024-04-30T17:06:20Z) - ManiTweet: A New Benchmark for Identifying Manipulation of News on Social Media [74.93847489218008]
We present a novel task, identifying manipulation of news on social media, which aims to detect manipulation in social media posts and identify manipulated or inserted information.
To study this task, we have proposed a data collection schema and curated a dataset called ManiTweet, consisting of 3.6K pairs of tweets and corresponding articles.
Our analysis demonstrates that this task is highly challenging, with large language models (LLMs) yielding unsatisfactory performance.
arXiv Detail & Related papers (2023-05-23T16:40:07Z) - Overview of Abusive and Threatening Language Detection in Urdu at FIRE
2021 [50.591267188664666]
We present two shared tasks of abusive and threatening language detection for the Urdu language.
We present two manually annotated datasets containing tweets labelled as (i) Abusive and Non-Abusive, and (ii) Threatening and Non-Threatening.
For both subtasks, an mBERT-based transformer model showed the best performance.
arXiv Detail & Related papers (2022-07-14T07:38:13Z) - Twitter-COMMs: Detecting Climate, COVID, and Military Multimodal
Misinformation [83.2079454464572]
This paper describes our approach to the Image-Text Inconsistency Detection challenge of the DARPA Semantic Forensics (SemaFor) Program.
We collect Twitter-COMMs, a large-scale multimodal dataset with 884k tweets relevant to the topics of Climate Change, COVID-19, and Military Vehicles.
Our approach, based on the state-of-the-art CLIP model, is trained with automatically generated random and hard negatives.
arXiv Detail & Related papers (2021-12-16T03:37:20Z) - Identification of Twitter Bots based on an Explainable ML Framework: the
US 2020 Elections Case Study [72.61531092316092]
This paper focuses on the design of a novel system for identifying Twitter bots based on labeled Twitter data.
A supervised machine learning (ML) framework is adopted, using an Extreme Gradient Boosting (XGBoost) algorithm.
Our study also deploys Shapley Additive Explanations (SHAP) for explaining the ML model predictions.
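The entry above pairs XGBoost with SHAP for explanation. As an illustrative sketch of what SHAP computes (not that paper's pipeline), the snippet below evaluates exact Shapley values for a toy three-feature model by enumerating all feature subsets; real SHAP libraries approximate this efficiently for tree ensembles. The toy "bot score" model and baseline are hypothetical.

```python
from itertools import combinations
from math import factorial

# Hedged sketch: exact Shapley values by brute force. For a feature i,
# we weight its marginal contribution over every subset of the other
# features; the weights sum to 1.
def shapley_values(model, x, baseline):
    n = len(x)
    values = []
    for i in range(n):
        phi = 0.0
        others = [j for j in range(n) if j != i]
        for k in range(n):
            for subset in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                with_i = [x[j] if j in subset or j == i else baseline[j]
                          for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j]
                             for j in range(n)]
                phi += weight * (model(with_i) - model(without_i))
        values.append(phi)
    return values

# Toy linear "bot score": Shapley values of a linear model recover each
# feature's weighted deviation from the baseline.
model = lambda f: 2 * f[0] + 1 * f[1] + 0.5 * f[2]
phis = shapley_values(model, x=[1, 1, 1], baseline=[0, 0, 0])
print([round(v, 6) for v in phis])  # prints [2.0, 1.0, 0.5]
```

For a linear model the i-th Shapley value is exactly the weight times the feature's deviation from the baseline, which is why the output matches the model's coefficients here.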
arXiv Detail & Related papers (2021-12-08T14:12:24Z) - InfoMiner at WNUT-2020 Task 2: Transformer-based Covid-19 Informative
Tweet Extraction [9.710464466895521]
WNUT-2020 Task 2 was organised to distinguish informative tweets from noisy ones.
In this paper, we present our approach to tackle the task objective using transformers.
arXiv Detail & Related papers (2020-10-11T19:31:18Z) - NEU at WNUT-2020 Task 2: Data Augmentation To Tell BERT That Death Is
Not Necessarily Informative [0.0]
We present a BERT classifier system for W-NUT2020 Shared Task 2: Identification of Informative COVID-19 English Tweets.
We show that BERT exploits some easy signals to identify informative tweets, and adding simple patterns to uninformative tweets drastically degrades BERT performance.
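The augmentation idea above, adding "easy" surface patterns to uninformative tweets so the classifier cannot rely on them, can be sketched in a few lines. The patterns and counts below are hypothetical stand-ins, not the ones used in that paper.

```python
import random

# Hedged sketch of pattern-injection augmentation: append COVID-report-style
# phrases (the kind a model might treat as an "informative" signal) to
# UNINFORMATIVE tweets. Patterns here are invented for illustration.
PATTERNS = [
    "{n} new cases reported today.",
    "Death toll rises to {n}.",
]

def augment_uninformative(tweets, seed=0):
    """Return copies of the tweets with a random report-style pattern
    appended, using a fixed seed for reproducibility."""
    rng = random.Random(seed)
    out = []
    for t in tweets:
        pattern = rng.choice(PATTERNS).format(n=rng.randint(1, 999))
        out.append(f"{t} {pattern}")
    return out

aug = augment_uninformative(["Stay safe everyone!"])
print(aug[0])
```

Training on such augmented negatives forces the model to look past surface keywords like case counts, which is exactly the shortcut the paper shows BERT exploiting.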
arXiv Detail & Related papers (2020-09-18T02:16:49Z) - Not-NUTs at W-NUT 2020 Task 2: A BERT-based System in Identifying
Informative COVID-19 English Tweets [0.0]
We propose a model that, given an English tweet, automatically identifies whether that tweet bears informative content regarding COVID-19 or not.
We achieved competitive results, within roughly 1% F1 score on the informative class of the top-performing teams.
arXiv Detail & Related papers (2020-09-14T15:49:16Z) - UIT-HSE at WNUT-2020 Task 2: Exploiting CT-BERT for Identifying COVID-19
Information on the Twitter Social Network [2.7528170226206443]
In this paper, we present our results at the W-NUT 2020 Shared Task 2: Identification of Informative COVID-19 English Tweets.
We propose our simple but effective approach using the transformer-based models based on COVID-Twitter-BERT (CT-BERT) with different fine-tuning techniques.
As a result, we achieve an F1 score of 90.94%, placing third on the leaderboard of this task, which attracted 56 participating teams in total.
arXiv Detail & Related papers (2020-09-07T08:20:31Z) - TATL at W-NUT 2020 Task 2: A Transformer-based Baseline System for
Identification of Informative COVID-19 English Tweets [1.4315501760755605]
We present our participation in the W-NUT 2020 Shared Task 2: Identification of Informative COVID-19 English Tweets.
Inspired by the recent advances in pretrained Transformer language models, we propose a simple yet effective baseline for the task.
Despite its simplicity, our proposed approach shows very competitive results in the leaderboard.
arXiv Detail & Related papers (2020-08-28T21:27:42Z) - Kungfupanda at SemEval-2020 Task 12: BERT-Based Multi-Task Learning for
Offensive Language Detection [55.445023584632175]
We build an offensive language detection system, which combines multi-task learning with BERT-based models.
Our model achieves 91.51% F1 score in English Sub-task A, which is comparable to the first place.
arXiv Detail & Related papers (2020-04-28T11:27:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.