NEU at WNUT-2020 Task 2: Data Augmentation To Tell BERT That Death Is
Not Necessarily Informative
- URL: http://arxiv.org/abs/2009.08590v1
- Date: Fri, 18 Sep 2020 02:16:49 GMT
- Title: NEU at WNUT-2020 Task 2: Data Augmentation To Tell BERT That Death Is
Not Necessarily Informative
- Authors: Kumud Chauhan
- Abstract summary: We present a BERT classifier system for W-NUT2020 Shared Task 2: Identification of Informative COVID-19 English Tweets.
We show that BERT exploits some easy signals to identify informative tweets, and that adding simple patterns to uninformative tweets drastically degrades BERT's performance.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Millions of people around the world are sharing COVID-19-related information
on social media platforms. Since not all of the information shared on social media
is useful, a machine learning system that identifies informative posts can
help users find relevant information. In this paper, we present a BERT
classifier system for W-NUT2020 Shared Task 2: Identification of Informative
COVID-19 English Tweets. Further, we show that BERT exploits some easy signals
to identify informative tweets, and that adding simple patterns to uninformative
tweets drastically degrades its performance. In particular, simply adding the phrase
"10 deaths" to tweets in the dev set reduces the BERT F1-score from 92.63 to 7.28. We also
propose a simple data augmentation technique that helps improve the
robustness and generalization ability of the BERT classifier.
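As a rough illustration of the idea, the sketch below appends a death-count phrase to tweets as an adversarial probe and reuses the same phrases to augment uninformative training examples. The phrase list, label strings, and augmentation rate are illustrative assumptions, not details taken from the paper.

```python
import random

# Hypothetical "easy signal" phrases; the paper reports that appending a
# simple death count such as "10 deaths" to dev-set tweets collapses the
# BERT F1-score from 92.63 to 7.28. The phrases and label strings below
# are assumptions for illustration only.
DEATH_PATTERNS = ["10 deaths", "5 deaths reported", "death toll rises"]

def perturb(tweet: str, pattern: str = "10 deaths") -> str:
    """Adversarial probe: append a death-count phrase to a tweet."""
    return f"{tweet} {pattern}"

def augment_uninformative(tweets, labels, rate=0.5, seed=0):
    """Augmentation sketch: duplicate a fraction of the UNINFORMATIVE tweets
    with an injected death-count phrase, keeping the label unchanged, so the
    classifier cannot rely on the phrase alone to predict INFORMATIVE."""
    rng = random.Random(seed)
    aug_tweets, aug_labels = list(tweets), list(labels)
    for tweet, label in zip(tweets, labels):
        if label == "UNINFORMATIVE" and rng.random() < rate:
            aug_tweets.append(perturb(tweet, rng.choice(DEATH_PATTERNS)))
            aug_labels.append("UNINFORMATIVE")
    return aug_tweets, aug_labels

if __name__ == "__main__":
    tweets = ["Stay home and wash your hands.",
              "City reports 120 new COVID-19 cases today."]
    labels = ["UNINFORMATIVE", "INFORMATIVE"]
    print(augment_uninformative(tweets, labels, rate=1.0))
```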
Related papers
- ThangDLU at #SMM4H 2024: Encoder-decoder models for classifying text data on social disorders in children and adolescents [49.00494558898933]
This paper describes our participation in Task 3 and Task 5 of the #SMM4H (Social Media Mining for Health) 2024 Workshop.
Task 3 is a multi-class classification task centered on tweets discussing the impact of outdoor environments on symptoms of social anxiety.
Task 5 involves a binary classification task focusing on tweets reporting medical disorders in children.
We applied transfer learning from pre-trained encoder-decoder models such as BART-base and T5-small to identify the labels of a set of given tweets.
arXiv Detail & Related papers (2024-04-30T17:06:20Z)
- ManiTweet: A New Benchmark for Identifying Manipulation of News on Social Media [74.93847489218008]
We present a novel task, identifying manipulation of news on social media, which aims to detect manipulation in social media posts and identify manipulated or inserted information.
To study this task, we have proposed a data collection schema and curated a dataset called ManiTweet, consisting of 3.6K pairs of tweets and corresponding articles.
Our analysis demonstrates that this task is highly challenging, with large language models (LLMs) yielding unsatisfactory performance.
arXiv Detail & Related papers (2023-05-23T16:40:07Z)
- Learning Instructions with Unlabeled Data for Zero-Shot Cross-Task Generalization [68.91386402390403]
We propose Unlabeled Data Augmented Instruction Tuning (UDIT) to take better advantage of the instructions during instruction learning.
We conduct extensive experiments to show UDIT's effectiveness in various scenarios of tasks and datasets.
arXiv Detail & Related papers (2022-10-17T15:25:24Z)
- Overview of Abusive and Threatening Language Detection in Urdu at FIRE 2021 [50.591267188664666]
We present two shared tasks of abusive and threatening language detection for the Urdu language.
We present two manually annotated datasets containing tweets labelled as (i) Abusive and Non-Abusive, and (ii) Threatening and Non-Threatening.
For both subtasks, an mBERT-based transformer model showed the best performance.
arXiv Detail & Related papers (2022-07-14T07:38:13Z)
- Dartmouth CS at WNUT-2020 Task 2: Informative COVID-19 Tweet Classification Using BERT [2.1574781022415364]
We describe the systems developed for the WNUT-2020 shared task 2, identification of informative COVID-19 English Tweets.
BERT is a highly performant model for Natural Language Processing tasks.
We increased BERT's performance in this classification task by fine-tuning BERT and concatenating its embeddings with Tweet-specific features.
arXiv Detail & Related papers (2020-12-07T07:55:31Z)
- Not-NUTs at W-NUT 2020 Task 2: A BERT-based System in Identifying Informative COVID-19 English Tweets [0.0]
We propose a model that, given an English tweet, automatically identifies whether that tweet bears informative content regarding COVID-19 or not.
We achieved competitive results that are only about 1% shy of those of the top-performing teams in terms of F1 score on the informative class.
arXiv Detail & Related papers (2020-09-14T15:49:16Z)
- UIT-HSE at WNUT-2020 Task 2: Exploiting CT-BERT for Identifying COVID-19 Information on the Twitter Social Network [2.7528170226206443]
In this paper, we present our results at the W-NUT 2020 Shared Task 2: Identification of Informative COVID-19 English Tweets.
We propose a simple but effective approach using transformer models based on COVID-Twitter-BERT (CT-BERT) with different fine-tuning techniques (a minimal fine-tuning sketch in this spirit is given after this list).
As a result, we achieve an F1-score of 90.94%, placing third on the leaderboard of this task, which attracted 56 participating teams in total.
arXiv Detail & Related papers (2020-09-07T08:20:31Z)
- EdinburghNLP at WNUT-2020 Task 2: Leveraging Transformers with Generalized Augmentation for Identifying Informativeness in COVID-19 Tweets [0.0]
We present our submission for WNUT Task 2: Identification of informative COVID-19 English Tweets.
Our most successful model is an ensemble of transformers, including RoBERTa, XLNet, and BERTweet trained in a Semi-Supervised Learning (SSL) setting.
The proposed system achieves an F1 score of 0.9011 on the test set (ranking 7th on the leaderboard) and shows significant gains in performance compared to a system using FastText embeddings.
arXiv Detail & Related papers (2020-09-06T15:57:28Z)
- TATL at W-NUT 2020 Task 2: A Transformer-based Baseline System for Identification of Informative COVID-19 English Tweets [1.4315501760755605]
We present our participation in the W-NUT 2020 Shared Task 2: Identification of Informative COVID-19 English Tweets.
Inspired by the recent advances in pretrained Transformer language models, we propose a simple yet effective baseline for the task.
Despite its simplicity, our proposed approach shows very competitive results on the leaderboard.
arXiv Detail & Related papers (2020-08-28T21:27:42Z)
- Kungfupanda at SemEval-2020 Task 12: BERT-Based Multi-Task Learning for Offensive Language Detection [55.445023584632175]
We build an offensive language detection system, which combines multi-task learning with BERT-based models.
Our model achieves a 91.51% F1 score on English Sub-task A, which is comparable to the first-place result.
arXiv Detail & Related papers (2020-04-28T11:27:24Z)
- Improving BERT Fine-Tuning via Self-Ensemble and Self-Distillation [84.64004917951547]
Fine-tuning pre-trained language models like BERT has become an effective approach in NLP.
In this paper, we improve the fine-tuning of BERT with two effective mechanisms: self-ensemble and self-distillation.
arXiv Detail & Related papers (2020-02-24T16:17:12Z)
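As noted in the UIT-HSE entry above, several of these systems fine-tune a CT-BERT-style checkpoint for informative/uninformative tweet classification. The sketch below is a minimal version of that setup, assuming the public Hugging Face checkpoint digitalepidemiologylab/covid-twitter-bert-v2, toy data, and illustrative hyperparameters; it is not the exact configuration used by any of the papers listed.

```python
# Minimal CT-BERT fine-tuning sketch (assumptions: public checkpoint,
# toy data, illustrative hyperparameters; not any team's exact setup).
import torch
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

MODEL = "digitalepidemiologylab/covid-twitter-bert-v2"  # public CT-BERT v2 checkpoint

class TweetDataset(torch.utils.data.Dataset):
    """Wraps tokenized tweets with integer labels (0 = UNINFORMATIVE, 1 = INFORMATIVE)."""
    def __init__(self, texts, labels, tokenizer):
        self.enc = tokenizer(texts, truncation=True, padding="max_length", max_length=128)
        self.labels = labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, i):
        item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[i])
        return item

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=2)

# Toy examples standing in for the WNUT-2020 Task 2 training split.
train_texts = ["City reports 120 new COVID-19 cases today.",
               "Stay home and wash your hands."]
train_labels = [1, 0]
train_ds = TweetDataset(train_texts, train_labels, tokenizer)

args = TrainingArguments(output_dir="ctbert-wnut2",  # illustrative settings
                         num_train_epochs=3,
                         per_device_train_batch_size=16,
                         learning_rate=2e-5)
Trainer(model=model, args=args, train_dataset=train_ds).train()
```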