UIT-HSE at WNUT-2020 Task 2: Exploiting CT-BERT for Identifying COVID-19
Information on the Twitter Social Network
- URL: http://arxiv.org/abs/2009.02935v3
- Date: Fri, 13 Nov 2020 08:48:28 GMT
- Title: UIT-HSE at WNUT-2020 Task 2: Exploiting CT-BERT for Identifying COVID-19
Information on the Twitter Social Network
- Authors: Khiem Vinh Tran, Hao Phu Phan, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen
- Abstract summary: In this paper, we present our results at the W-NUT 2020 Shared Task 2: Identification of Informative COVID-19 English Tweets.
We propose a simple but effective approach using transformer-based models built on COVID-Twitter-BERT (CT-BERT) with different fine-tuning techniques.
As a result, we achieve an F1-score of 90.94% and third place on the leaderboard of this task, which attracted 56 participating teams in total.
- Score: 2.7528170226206443
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Recently, COVID-19 has affected many real-life aspects of the world and led to dreadful consequences. More and more tweets about COVID-19 have been shared publicly on Twitter. However, the majority of those tweets are uninformative, which makes it challenging to build automatic systems that detect the informative ones for useful AI applications. In this paper, we present our results at the W-NUT 2020 Shared Task 2: Identification of Informative COVID-19 English Tweets. In particular, we propose a simple but effective approach using transformer-based models built on COVID-Twitter-BERT (CT-BERT) with different fine-tuning techniques. As a result, we achieve an F1-score of 90.94\% and third place on the leaderboard of this task, which attracted 56 participating teams in total.
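The approach above boils down to fine-tuning CT-BERT as a binary classifier over tweets (INFORMATIVE vs. UNINFORMATIVE). Below is a minimal sketch of such a setup with the Hugging Face transformers library; the checkpoint name, hyperparameters, and toy examples are illustrative assumptions, not the authors' exact configuration.
```python
# Hedged sketch: fine-tuning CT-BERT for binary "informative vs. uninformative"
# tweet classification. The model ID, hyperparameters, and toy data are
# assumptions for illustration, not the authors' exact setup.
import torch
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

MODEL_NAME = "digitalepidemiologylab/covid-twitter-bert-v2"  # assumed CT-BERT checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

# Toy examples standing in for the WNUT-2020 Task 2 training data.
texts = ["Official update: 120 new COVID-19 cases confirmed in the city today.",
         "I am so bored staying at home all day..."]
labels = [1, 0]  # 1 = INFORMATIVE, 0 = UNINFORMATIVE

enc = tokenizer(texts, truncation=True, padding=True, max_length=128, return_tensors="pt")

class TweetDataset(torch.utils.data.Dataset):
    """Wraps tokenized tweets and labels for the Trainer API."""
    def __init__(self, encodings, labels):
        self.encodings, self.labels = encodings, labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, idx):
        item = {k: v[idx] for k, v in self.encodings.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="ct-bert-informative",
                           num_train_epochs=3,
                           per_device_train_batch_size=16,
                           learning_rate=2e-5),
    train_dataset=TweetDataset(enc, labels),
)
trainer.train()
```
In practice the official WNUT-2020 Task 2 training and validation splits would replace the toy examples, and the fine-tuning schedule would be selected on the validation set.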
Related papers
- ThangDLU at #SMM4H 2024: Encoder-decoder models for classifying text data on social disorders in children and adolescents [49.00494558898933]
This paper describes our participation in Task 3 and Task 5 of the #SMM4H (Social Media Mining for Health) 2024 Workshop.
Task 3 is a multi-class classification task centered on tweets discussing the impact of outdoor environments on symptoms of social anxiety.
Task 5 involves a binary classification task focusing on tweets reporting medical disorders in children.
We applied transfer learning from pre-trained encoder-decoder models such as BART-base and T5-small to identify the labels of a set of given tweets.
arXiv Detail & Related papers (2024-04-30T17:06:20Z)
- Overview of Abusive and Threatening Language Detection in Urdu at FIRE 2021 [50.591267188664666]
We present two shared tasks of abusive and threatening language detection for the Urdu language.
We present two manually annotated datasets containing tweets labelled as (i) Abusive and Non-Abusive, and (ii) Threatening and Non-Threatening.
For both subtasks, the m-BERT based transformer model showed the best performance.
arXiv Detail & Related papers (2022-07-14T07:38:13Z)
- UniCon+: ICTCAS-UCAS Submission to the AVA-ActiveSpeaker Task at ActivityNet Challenge 2022 [69.67841335302576]
This report presents a brief description of our winning solution to the AVA Active Speaker Detection (ASD) task at ActivityNet Challenge 2022.
Our underlying model UniCon+ continues to build on our previous work, the Unified Context Network (UniCon) and Extended UniCon.
We augment the architecture with a simple GRU-based module that allows information of recurring identities to flow across scenes.
arXiv Detail & Related papers (2022-06-22T06:11:07Z)
- Twitter-COMMs: Detecting Climate, COVID, and Military Multimodal Misinformation [83.2079454464572]
This paper describes our approach to the Image-Text Inconsistency Detection challenge of the DARPA Semantic Forensics (SemaFor) Program.
We collect Twitter-COMMs, a large-scale multimodal dataset with 884k tweets relevant to the topics of Climate Change, COVID-19, and Military Vehicles.
We train our approach, based on the state-of-the-art CLIP model, leveraging automatically generated random and hard negatives.
arXiv Detail & Related papers (2021-12-16T03:37:20Z)
- Identification of Twitter Bots based on an Explainable ML Framework: the US 2020 Elections Case Study [72.61531092316092]
This paper focuses on the design of a novel system for identifying Twitter bots based on labeled Twitter data.
A supervised machine learning (ML) framework is adopted using the Extreme Gradient Boosting (XGBoost) algorithm.
Our study also deploys Shapley Additive Explanations (SHAP) for explaining the ML model predictions.
arXiv Detail & Related papers (2021-12-08T14:12:24Z)
- Detection of COVID-19 informative tweets using RoBERTa [5.564705758320338]
We present our work to detect informative COVID-19 English tweets using the RoBERTa model as part of the W-NUT 2020 workshop.
We show the efficacy of our model on a public dataset with an F1-score of 0.89 on the validation dataset and 0.87 on the leaderboard.
arXiv Detail & Related papers (2020-10-21T18:43:13Z)
- NutCracker at WNUT-2020 Task 2: Robustly Identifying Informative COVID-19 Tweets using Ensembling and Adversarial Training [6.85316573653194]
We experiment with COVID-Twitter-BERT and RoBERTa models to identify informative COVID-19 tweets.
The ensemble of COVID-Twitter-BERT and RoBERTa obtains an F1-score of 0.9096 on the test data of WNUT-2020 Task 2 and ranks 1st on the leaderboard; a minimal sketch of such a soft-voting ensemble appears after this list.
arXiv Detail & Related papers (2020-10-09T02:46:51Z)
- Not-NUTs at W-NUT 2020 Task 2: A BERT-based System in Identifying Informative COVID-19 English Tweets [0.0]
We propose a model that, given an English tweet, automatically identifies whether that tweet bears informative content regarding COVID-19 or not.
We achieved competitive results that fall only roughly 1% short of the top-performing teams in terms of F1 score on the informative class.
arXiv Detail & Related papers (2020-09-14T15:49:16Z)
- EdinburghNLP at WNUT-2020 Task 2: Leveraging Transformers with Generalized Augmentation for Identifying Informativeness in COVID-19 Tweets [0.0]
We present our submission for WNUT Task 2: Identification of informative COVID-19 English Tweets.
Our most successful model is an ensemble of transformers, including RoBERTa, XLNet, and BERTweet trained in a Semi-Supervised Learning (SSL) setting.
The proposed system achieves an F1 score of 0.9011 on the test set (ranking 7th on the leaderboard) and shows significant gains in performance compared to a system using FastText embeddings.
arXiv Detail & Related papers (2020-09-06T15:57:28Z)
- TATL at W-NUT 2020 Task 2: A Transformer-based Baseline System for Identification of Informative COVID-19 English Tweets [1.4315501760755605]
We present our participation in the W-NUT 2020 Shared Task 2: Identification of Informative COVID-19 English Tweets.
Inspired by the recent advances in pretrained Transformer language models, we propose a simple yet effective baseline for the task.
Despite its simplicity, our proposed approach shows very competitive results in the leaderboard.
arXiv Detail & Related papers (2020-08-28T21:27:42Z)
- Privacy-Aware Recommender Systems Challenge on Twitter's Home Timeline [47.434392695347924]
The RecSys 2020 Challenge was organized by ACM RecSys in partnership with Twitter using this dataset.
This paper touches on the key challenges faced by researchers and professionals striving to predict user engagements.
arXiv Detail & Related papers (2020-04-28T23:54:33Z)
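Several of the systems above, most notably the top-ranked NutCracker entry, combine CT-BERT with RoBERTa through ensembling. The sketch below shows one common way to do this, soft voting over the class probabilities of two already fine-tuned classifiers; the checkpoint paths and the equal weighting are assumptions for illustration, not any team's exact implementation.
```python
# Hedged sketch: soft-voting ensemble of two fine-tuned tweet classifiers
# (e.g. CT-BERT and RoBERTa). Checkpoint paths and weighting are assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Assumed local paths to two models fine-tuned on the same binary task.
CHECKPOINTS = ["./ct-bert-informative", "./roberta-informative"]

def predict_proba(checkpoint, texts):
    """Return class probabilities for a batch of tweets from one model."""
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(checkpoint)
    model.eval()
    enc = tokenizer(texts, truncation=True, padding=True, max_length=128, return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits
    return torch.softmax(logits, dim=-1)

def ensemble_predict(texts):
    """Average the per-model probabilities with equal weights and take the argmax."""
    probs = torch.stack([predict_proba(ckpt, texts) for ckpt in CHECKPOINTS])
    return probs.mean(dim=0).argmax(dim=-1)  # 1 = INFORMATIVE, 0 = UNINFORMATIVE

if __name__ == "__main__":
    tweets = ["Health ministry reports 300 new confirmed cases in the last 24 hours."]
    print(ensemble_predict(tweets))
```
Weighted averaging or majority voting over more than two models are straightforward variations of the same idea.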