TATL at W-NUT 2020 Task 2: A Transformer-based Baseline System for
Identification of Informative COVID-19 English Tweets
- URL: http://arxiv.org/abs/2008.12854v1
- Date: Fri, 28 Aug 2020 21:27:42 GMT
- Title: TATL at W-NUT 2020 Task 2: A Transformer-based Baseline System for
Identification of Informative COVID-19 English Tweets
- Authors: Anh Tuan Nguyen
- Abstract summary: We present our participation in the W-NUT 2020 Shared Task 2: Identification of Informative COVID-19 English Tweets.
Inspired by the recent advances in pretrained Transformer language models, we propose a simple yet effective baseline for the task.
Despite its simplicity, our proposed approach shows very competitive results in the leaderboard.
- Score: 1.4315501760755605
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As the COVID-19 outbreak continues to spread throughout the world, more and
more information about the pandemic has been shared publicly on social media.
For example, a huge number of COVID-19 English Tweets are posted on Twitter
every day. However, the majority of those Tweets are uninformative, and hence it
is important to be able to automatically select only the informative ones for
downstream applications. In this short paper, we present our participation in
the W-NUT 2020 Shared Task 2: Identification of Informative COVID-19 English
Tweets. Inspired by the recent advances in pretrained Transformer language
models, we propose a simple yet effective baseline for the task. Despite its
simplicity, our proposed approach shows very competitive results on the
leaderboard, ranking 8th out of the 56 participating teams.
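
To make the kind of baseline described above concrete, below is a minimal sketch of fine-tuning a pretrained Transformer for binary tweet classification using the Hugging Face transformers and datasets libraries. The checkpoint name, the train.tsv file, and the 0/1 label convention (0 = UNINFORMATIVE, 1 = INFORMATIVE) are illustrative assumptions, not the paper's exact setup.

# Minimal sketch of a Transformer baseline for informative-tweet
# classification, assuming the Hugging Face "transformers" and "datasets"
# libraries. The checkpoint, file name, and hyperparameters below are
# illustrative assumptions, not the authors' exact configuration.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL_NAME = "roberta-base"  # any pretrained Transformer encoder works here

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME,
                                                           num_labels=2)

# Hypothetical TSV with a "text" column and a 0/1 "label" column
# (0 = UNINFORMATIVE, 1 = INFORMATIVE).
data = load_dataset("csv", data_files={"train": "train.tsv"}, delimiter="\t")

def tokenize(batch):
    # Tweets are short, so a modest max_length keeps training cheap.
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=128)

data = data.map(tokenize, batched=True)

args = TrainingArguments(output_dir="out",
                         per_device_train_batch_size=32,
                         num_train_epochs=3,
                         learning_rate=2e-5)
Trainer(model=model, args=args, train_dataset=data["train"]).train()

Swapping MODEL_NAME for a domain-specific checkpoint such as COVID-Twitter-BERT is exactly the kind of variation several of the related systems below explore.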
Related papers
- ThangDLU at #SMM4H 2024: Encoder-decoder models for classifying text data on social disorders in children and adolescents [49.00494558898933]
This paper describes our participation in Task 3 and Task 5 of the #SMM4H (Social Media Mining for Health) 2024 Workshop.
Task 3 is a multi-class classification task centered on tweets discussing the impact of outdoor environments on symptoms of social anxiety.
Task 5 involves a binary classification task focusing on tweets reporting medical disorders in children.
We applied transfer learning from pre-trained encoder-decoder models such as BART-base and T5-small to identify the labels of a set of given tweets.
arXiv Detail & Related papers (2024-04-30T17:06:20Z) - Overview of Abusive and Threatening Language Detection in Urdu at FIRE
2021 [50.591267188664666]
We present two shared tasks of abusive and threatening language detection for the Urdu language.
We present two manually annotated datasets containing tweets labelled as (i) Abusive and Non-Abusive, and (ii) Threatening and Non-Threatening.
For both subtasks, the m-BERT-based transformer model showed the best performance.
arXiv Detail & Related papers (2022-07-14T07:38:13Z) - UniCon+: ICTCAS-UCAS Submission to the AVA-ActiveSpeaker Task at
ActivityNet Challenge 2022 [69.67841335302576]
This report presents a brief description of our winning solution to the AVA Active Speaker Detection (ASD) task at ActivityNet Challenge 2022.
Our underlying model UniCon+ continues to build on our previous work, the Unified Context Network (UniCon) and Extended UniCon.
We augment the architecture with a simple GRU-based module that allows information of recurring identities to flow across scenes.
arXiv Detail & Related papers (2022-06-22T06:11:07Z) - BERTuit: Understanding Spanish language in Twitter through a native
transformer [70.77033762320572]
We present BERTuit, the largest transformer proposed so far for the Spanish language, pre-trained on a massive dataset of 230M Spanish tweets.
Our motivation is to provide a powerful resource to better understand Spanish Twitter and to be used on applications focused on this social network.
arXiv Detail & Related papers (2022-04-07T14:28:51Z) - Cross-lingual COVID-19 Fake News Detection [54.125563009333995]
We make the first attempt to detect COVID-19 misinformation in a low-resource language (Chinese) using only fact-checked news in a high-resource language (English).
We propose a deep learning framework named CrossFake to jointly encode the cross-lingual news body texts and capture the news content.
Empirical results on our dataset demonstrate the effectiveness of CrossFake under the cross-lingual setting.
arXiv Detail & Related papers (2021-10-13T04:44:02Z) - NIT COVID-19 at WNUT-2020 Task 2: Deep Learning Model RoBERTa for
Identify Informative COVID-19 English Tweets [0.0]
This paper presents the model submitted by the NIT_COVID-19 team for identifying informative COVID-19 English tweets at WNUT-2020 Task 2.
The proposed model achieves an F1-score of 89.14% on the shared task.
arXiv Detail & Related papers (2020-11-11T05:20:39Z) - NEU at WNUT-2020 Task 2: Data Augmentation To Tell BERT That Death Is
Not Necessarily Informative [0.0]
We present a BERT classifier system for W-NUT2020 Shared Task 2: Identification of Informative COVID-19 English Tweets.
We show that BERT exploits some easy signals to identify informative tweets, and that adding simple patterns to uninformative tweets drastically degrades BERT's performance (a toy sketch of this probe appears after the list below).
arXiv Detail & Related papers (2020-09-18T02:16:49Z) - Not-NUTs at W-NUT 2020 Task 2: A BERT-based System in Identifying
Informative COVID-19 English Tweets [0.0]
We propose a model that, given an English tweet, automatically identifies whether that tweet bears informative content regarding COVID-19 or not.
We achieve competitive results that fall only about 1% short of the top-performing teams in terms of F1 score on the informative class.
arXiv Detail & Related papers (2020-09-14T15:49:16Z) - UIT-HSE at WNUT-2020 Task 2: Exploiting CT-BERT for Identifying COVID-19
Information on the Twitter Social Network [2.7528170226206443]
In this paper, we present our results at the W-NUT 2020 Shared Task 2: Identification of Informative COVID-19 English Tweets.
We propose a simple but effective approach using transformer-based models built on COVID-Twitter-BERT (CT-BERT) with different fine-tuning techniques.
As a result, we achieve an F1-score of 90.94%, taking third place on the leaderboard of this task, which attracted 56 participating teams in total.
arXiv Detail & Related papers (2020-09-07T08:20:31Z) - EdinburghNLP at WNUT-2020 Task 2: Leveraging Transformers with
Generalized Augmentation for Identifying Informativeness in COVID-19 Tweets [0.0]
We present our submission for WNUT Task 2: Identification of informative COVID-19 English Tweets.
Our most successful model is an ensemble of transformers, including RoBERTa, XLNet, and BERTweet trained in a Semi-Supervised Learning (SSL) setting.
The proposed system achieves an F1 score of 0.9011 on the test set (ranking 7th on the leaderboard) and shows significant gains in performance compared to a system using FastText embeddings.
arXiv Detail & Related papers (2020-09-06T15:57:28Z) - TICO-19: the Translation Initiative for Covid-19 [112.5601530395345]
The Translation Initiative for COvid-19 (TICO-19) has made test and development data available to AI and MT researchers in 35 different languages.
The same data is translated into all of the languages represented, meaning that testing or development can be done for any pairing of languages in the set.
arXiv Detail & Related papers (2020-07-03T16:26:17Z)
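
As promised in the NEU entry above, here is a toy illustration of the pattern-injection probe it describes: append a surface pattern that merely looks informative to uninformative tweets and measure how many predictions flip. The pattern string and the predict callback are hypothetical stand-ins for the paper's actual setup.

# Toy sketch of the pattern-injection robustness probe from the NEU
# entry; the pattern string and predict() interface are hypothetical.
from typing import Callable, List

def inject_pattern(tweets: List[str],
                   pattern: str = "3 new deaths reported") -> List[str]:
    # Append a case/death-count phrase to every uninformative tweet;
    # a robust classifier's predictions should barely change.
    return [f"{t} {pattern}" for t in tweets]

def flip_rate(predict: Callable[[List[str]], List[int]],
              uninformative: List[str]) -> float:
    # Fraction of uninformative tweets (label 0) that the classifier
    # flips to informative (label 1) once the pattern is injected.
    before = predict(uninformative)
    after = predict(inject_pattern(uninformative))
    flips = sum(1 for b, a in zip(before, after) if b == 0 and a == 1)
    return flips / max(len(uninformative), 1)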