Related papers: Exploratory Analysis of Covid-19 Tweets using Topic Modeling, UMAP, and DiGraphs

URL: http://arxiv.org/abs/2005.03082v1
Date: Wed, 6 May 2020 19:16:38 GMT
Title: Exploratory Analysis of Covid-19 Tweets using Topic Modeling, UMAP, and DiGraphs
Authors: Catherine Ordun, Sanjay Purushotham, Edward Raff
Abstract summary: This paper illustrates five different techniques to assess the distinctiveness of topics, key terms and features, speed of information dissemination, and network behaviors for Covid19 tweets. One topic specific to U.S. cases would start to uptick immediately after live White House Coronavirus Task Force briefings. One of the simplest highlights of this analysis is that early-stage descriptive methods like regular expressions can successfully identify high-level themes.
Score: 36.33347149799959
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper illustrates five different techniques to assess the distinctiveness of topics, key terms and features, speed of information dissemination, and network behaviors for Covid19 tweets. First, we use pattern matching and second, topic modeling through Latent Dirichlet Allocation (LDA) to generate twenty different topics that discuss case spread, healthcare workers, and personal protective equipment (PPE). One topic specific to U.S. cases would start to uptick immediately after live White House Coronavirus Task Force briefings, implying that many Twitter users are paying attention to government announcements. We contribute machine learning methods not previously reported in the Covid19 Twitter literature. This includes our third method, Uniform Manifold Approximation and Projection (UMAP), which identifies unique clustering behavior of distinct topics to improve our understanding of important themes in the corpus and help assess the quality of generated topics. Fourth, we calculated retweeting times to understand how fast information about Covid19 propagates on Twitter. Our analysis indicates that the median retweeting time of Covid19 for a sample corpus in March 2020 was 2.87 hours, approximately 50 minutes faster than repostings from Chinese social media about H7N9 in March 2013. Lastly, we sought to understand retweet cascades by visualizing the connections of users over time from fast to slow retweeting. As the time to retweet increases, the density of connections also increases; in our sample, we found distinct users dominating the attention of Covid19 retweeters. One of the simplest highlights of this analysis is that early-stage descriptive methods like regular expressions can successfully identify high-level themes, which were consistently verified as important through every subsequent analysis.
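
The abstract's fourth technique, median retweeting time, can be illustrated with a minimal sketch. This is not the authors' code: the function name `median_retweet_hours` and the sample timestamps are hypothetical, and the sketch assumes each retweet is paired with the timestamp of the tweet it reposts.

```python
from datetime import datetime
from statistics import median


def median_retweet_hours(pairs):
    """Return the median delay, in hours, between each original tweet and its retweet.

    `pairs` is an iterable of (original_time, retweet_time) datetime tuples.
    """
    delays = [(rt - orig).total_seconds() / 3600.0 for orig, rt in pairs]
    return median(delays)


# Hypothetical timestamps for illustration only; not data from the paper.
sample_pairs = [
    (datetime(2020, 3, 24, 12, 0), datetime(2020, 3, 24, 13, 0)),   # 1 hour later
    (datetime(2020, 3, 24, 12, 0), datetime(2020, 3, 24, 15, 0)),   # 3 hours later
    (datetime(2020, 3, 24, 18, 30), datetime(2020, 3, 25, 2, 30)),  # 8 hours later
]

print(median_retweet_hours(sample_pairs))
```

The median is used instead of the mean because retweet delays are typically skewed by a small number of very late retweets.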

Related papers

ManiTweet: A New Benchmark for Identifying Manipulation of News on Social Media [74.93847489218008]
We present a novel task, identifying manipulation of news on social media, which aims to detect manipulation in social media posts and identify manipulated or inserted information. To study this task, we have proposed a data collection schema and curated a dataset called ManiTweet, consisting of 3.6K pairs of tweets and corresponding articles. Our analysis demonstrates that this task is highly challenging, with large language models (LLMs) yielding unsatisfactory performance.
arXiv Detail & Related papers (2023-05-23T16:40:07Z)
Twitter-COMMs: Detecting Climate, COVID, and Military Multimodal Misinformation [83.2079454464572]
This paper describes our approach to the Image-Text Inconsistency Detection challenge of the DARPA Semantic Forensics (SemaFor) Program. We collect Twitter-COMMs, a large-scale multimodal dataset with 884k tweets relevant to the topics of Climate Change, COVID-19, and Military Vehicles. We train our approach, based on the state-of-the-art CLIP model, leveraging automatically generated random and hard negatives.
arXiv Detail & Related papers (2021-12-16T03:37:20Z)
Identification of Twitter Bots based on an Explainable ML Framework: the US 2020 Elections Case Study [72.61531092316092]
This paper focuses on the design of a novel system for identifying Twitter bots based on labeled Twitter data. A supervised machine learning (ML) framework is adopted, using an Extreme Gradient Boosting (XGBoost) algorithm. Our study also deploys Shapley Additive Explanations (SHAP) for explaining the ML model predictions.
arXiv Detail & Related papers (2021-12-08T14:12:24Z)
Extracting Major Topics of COVID-19 Related Tweets [2.867517731896504]
We use the topic modeling method to extract global topics during the nationwide quarantine periods (March 23 to June 23, 2020) on Covid-19 tweets. We additionally analyze temporal trends of the topics for the whole world and four countries.
arXiv Detail & Related papers (2021-10-05T08:40:51Z)
Misleading the Covid-19 vaccination discourse on Twitter: An exploratory study of infodemic around the pandemic [0.45593531937154413]
We collect a moderate-sized representative corpus of tweets (200,000 approx.) pertaining to Covid-19 vaccination over a period of seven months (September 2020 - March 2021). Following a Transfer Learning approach, we utilize the pre-trained Transformer-based XLNet model to classify tweets as Misleading or Non-Misleading. We build on this to study and contrast the characteristics of tweets in the corpus that are misleading in nature against non-misleading ones. Several ML models are employed for prediction, with up to 90% accuracy, and the importance of each feature is explained using SHAP Explainable AI (X
arXiv Detail & Related papers (2021-08-16T17:02:18Z)
Sentiment analysis in tweets: an assessment study from classical to modern text representation models [59.107260266206445]
Short texts published on Twitter have earned significant attention as a rich source of information. Their inherent characteristics, such as an informal and noisy linguistic style, remain challenging for many natural language processing (NLP) tasks. This study carries out an assessment of existing language models in distinguishing the sentiment expressed in tweets by using a rich collection of 22 datasets.
arXiv Detail & Related papers (2021-05-29T21:05:28Z)
Understanding Information Spreading Mechanisms During COVID-19 Pandemic by Analyzing the Impact of Tweet Text and User Features for Retweet Prediction [6.658785818853953]
COVID-19 has affected the world economy and the daily life routine of almost everyone. Social media platforms enable users to share information with other users who can reshare this information. We propose two CNN and RNN based models and evaluate the performance of these models on a publicly available TweetsCOV19 dataset.
arXiv Detail & Related papers (2021-05-26T15:55:58Z)
Understanding the Spatio-temporal Topic Dynamics of Covid-19 using Nonnegative Tensor Factorization: A Case Study [1.6328866317851185]
This paper proposes a representation of social media data and Non-negative Tensor Factorization (NTF) to identify the topics discussed in social media data. A case study on the Australia Twittersphere is presented to identify and visualize the topic dynamics on and off the Covid-19.
arXiv Detail & Related papers (2020-09-19T15:16:28Z)
Covid-Transformer: Detecting COVID-19 Trending Topics on Twitter Using Universal Sentence Encoder [7.305019142196582]
The coronavirus disease (also known as COVID-19) has led to a pandemic, impacting more than 200 countries across the globe. With its global impact, COVID-19 has become a major concern of people almost everywhere. We try to analyze the tweets and detect the trending topics and major concerns of people on Twitter.
arXiv Detail & Related papers (2020-09-08T19:00:38Z)
EmotiCon: Context-Aware Multimodal Emotion Recognition using Frege's Principle [71.47160118286226]
We present EmotiCon, a learning-based algorithm for context-aware perceived human emotion recognition from videos and images. Motivated by Frege's Context Principle from psychology, our approach combines three interpretations of context for emotion recognition. We report an Average Precision (AP) score of 35.48 across 26 classes, which is an improvement of 7-8 over prior methods.
arXiv Detail & Related papers (2020-03-14T19:55:21Z)

This list is automatically generated from the titles and abstracts of the papers in this site.