Detection of COVID-19 informative tweets using RoBERTa
- URL: http://arxiv.org/abs/2010.11238v1
- Date: Wed, 21 Oct 2020 18:43:13 GMT
- Title: Detection of COVID-19 informative tweets using RoBERTa
- Authors: Sirigireddy Dhanalaxmi, Rohit Agarwal, Aman Sinha
- Abstract summary: We present our work to detect informative Covid-19 English tweets using a RoBERTa model as part of the W-NUT workshop 2020.
We show the efficacy of our model on a public dataset with an F1-score of 0.89 on the validation dataset and 0.87 on the leaderboard.
- Score: 5.564705758320338
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Social media platforms such as Twitter are a hotspot of user-generated
information. During the ongoing Covid-19 pandemic, social media has carried an
abundance of data that can be classified as informative or uninformative
content. In this paper, we present our work to detect informative Covid-19
English tweets using a RoBERTa model as part of the W-NUT workshop 2020. We
show the efficacy of our model on a public dataset with an F1-score of 0.89 on
the validation dataset and 0.87 on the leaderboard.
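For readers unfamiliar with the reported metric, the following is a minimal sketch of how an F1-score for one class can be computed from gold labels and predictions. It is not the authors' evaluation code. The function name `f1_score`, the label strings, and the choice of the informative class as the positive class are assumptions made for illustration; the abstract does not state which averaging was used.

```python
def f1_score(y_true, y_pred, positive="INFORMATIVE"):
    """F1 for a single positive class (hypothetical helper, not from the paper)."""
    pairs = list(zip(y_true, y_pred))
    # Count true positives, false positives and false negatives.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No correct positive predictions: F1 is defined as 0 here.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Toy labels, invented for illustration only.
gold = ["INFORMATIVE", "INFORMATIVE", "INFORMATIVE", "UNINFORMATIVE", "UNINFORMATIVE"]
pred = ["INFORMATIVE", "INFORMATIVE", "UNINFORMATIVE", "INFORMATIVE", "UNINFORMATIVE"]
score = f1_score(gold, pred)
```

In this toy example there are two true positives, one false positive and one false negative, so precision and recall are both 2/3 and the F1-score is 2/3.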
Related papers
- ThangDLU at #SMM4H 2024: Encoder-decoder models for classifying text data on social disorders in children and adolescents [49.00494558898933]
This paper describes our participation in Task 3 and Task 5 of the #SMM4H (Social Media Mining for Health) 2024 Workshop.
Task 3 is a multi-class classification task centered on tweets discussing the impact of outdoor environments on symptoms of social anxiety.
Task 5 involves a binary classification task focusing on tweets reporting medical disorders in children.
We applied transfer learning from pre-trained encoder-decoder models such as BART-base and T5-small to identify the labels of a set of given tweets.
arXiv Detail & Related papers (2024-04-30T17:06:20Z) - BotArtist: Generic approach for bot detection in Twitter via semi-automatic machine learning pipeline [47.61306219245444]
Twitter has become a target for bots and fake accounts, resulting in the spread of false information and manipulation.
This paper introduces a semi-automatic machine learning pipeline (SAMLP) designed to address the challenges associated with machine learning model development.
We develop a comprehensive bot detection model named BotArtist, based on user profile features.
arXiv Detail & Related papers (2023-05-31T09:12:35Z) - ManiTweet: A New Benchmark for Identifying Manipulation of News on Social Media [74.93847489218008]
We present a novel task, identifying manipulation of news on social media, which aims to detect manipulation in social media posts and identify manipulated or inserted information.
To study this task, we have proposed a data collection schema and curated a dataset called ManiTweet, consisting of 3.6K pairs of tweets and corresponding articles.
Our analysis demonstrates that this task is highly challenging, with large language models (LLMs) yielding unsatisfactory performance.
arXiv Detail & Related papers (2023-05-23T16:40:07Z) - Twitter-COMMs: Detecting Climate, COVID, and Military Multimodal Misinformation [83.2079454464572]
This paper describes our approach to the Image-Text Inconsistency Detection challenge of the DARPA Semantic Forensics (SemaFor) Program.
We collect Twitter-COMMs, a large-scale multimodal dataset with 884k tweets relevant to the topics of Climate Change, COVID-19, and Military Vehicles.
We train our approach, based on the state-of-the-art CLIP model, leveraging automatically generated random and hard negatives.
arXiv Detail & Related papers (2021-12-16T03:37:20Z) - CML-COVID: A Large-Scale COVID-19 Twitter Dataset with Latent Topics, Sentiment and Location Information [0.0]
CML-COVID is a COVID-19 Twitter data set of 19,298,967 tweets from 5,977,653 unique individuals.
These tweets were collected between March 2020 and July 2020 using the query terms coronavirus, covid and mask related to COVID-19.
arXiv Detail & Related papers (2021-01-28T18:59:10Z) - Model Generalization on COVID-19 Fake News Detection [41.03093888315081]
We aim to achieve a robust model for the COVID-19 fake-news detection task proposed at CONSTRAINT 2021 (FakeNews-19).
We evaluate our models on two COVID-19 fake-news test sets.
arXiv Detail & Related papers (2021-01-11T12:23:41Z) - A Heuristic-driven Ensemble Framework for COVID-19 Fake News Detection [5.979726271522835]
We describe our Fake News Detection system that automatically identifies whether a tweet related to COVID-19 is "real" or "fake".
We have used an ensemble model consisting of pre-trained models, which has helped us achieve a joint 8th position on the leaderboard.
We have been able to drastically improve our system by incorporating a novel algorithm based on username handles and link domains in tweets, reaching an F1-score of 0.9883.
arXiv Detail & Related papers (2021-01-10T13:21:08Z) - Fighting an Infodemic: COVID-19 Fake News Dataset [40.418407303807456]
Fake news and rumors are rampant on social media.
To tackle this, we curate and release a manually annotated dataset of 10,700 social media posts and articles of real and fake news on COVID-19.
arXiv Detail & Related papers (2020-11-06T13:09:37Z) - NutCracker at WNUT-2020 Task 2: Robustly Identifying Informative COVID-19 Tweets using Ensembling and Adversarial Training [6.85316573653194]
We experiment with COVID-Twitter-BERT and RoBERTa models to identify informative COVID-19 tweets.
The ensemble of COVID-Twitter-BERT and RoBERTa obtains an F1-score of 0.9096 on the test data of WNUT-2020 Task 2 and ranks 1st on the leaderboard.
arXiv Detail & Related papers (2020-10-09T02:46:51Z) - UIT-HSE at WNUT-2020 Task 2: Exploiting CT-BERT for Identifying COVID-19 Information on the Twitter Social Network [2.7528170226206443]
In this paper, we present our results at the W-NUT 2020 Shared Task 2: Identification of Informative COVID-19 English Tweets.
We propose a simple but effective approach using transformer-based models built on COVID-Twitter-BERT (CT-BERT) with different fine-tuning techniques.
As a result, we achieve an F1-score of 90.94%, placing third on the leaderboard of this task, which attracted 56 submitting teams in total.
arXiv Detail & Related papers (2020-09-07T08:20:31Z) - COVID-19 on Social Media: Analyzing Misinformation in Twitter Conversations [22.43295864610142]
We collected streaming data related to COVID-19 using the Twitter API, starting March 1, 2020.
We identified unreliable and misleading content based on fact-checking sources.
We examined the narratives promoted in misinformation tweets, along with the distribution of engagements with these tweets.
arXiv Detail & Related papers (2020-03-26T09:48:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.