CML-COVID: A Large-Scale COVID-19 Twitter Dataset with Latent Topics,
Sentiment and Location Information
- URL: http://arxiv.org/abs/2101.12202v1
- Date: Thu, 28 Jan 2021 18:59:10 GMT
- Title: CML-COVID: A Large-Scale COVID-19 Twitter Dataset with Latent Topics,
Sentiment and Location Information
- Authors: Hassan Dashtian, Dhiraj Murthy
- Abstract summary: CML-COVID is a COVID-19 Twitter data set of 19,298,967 tweets from 5,977,653 unique individuals.
These tweets were collected between March 2020 and July 2020 using the COVID-19-related query terms coronavirus, covid, and mask.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: As a platform, Twitter has been a significant public space for discussion
related to the COVID-19 pandemic. Public social media platforms such as Twitter
represent important sites of engagement regarding the pandemic and these data
can be used by research teams for social, health, and other research.
Understanding public opinion about COVID-19 and how information diffuses in
social media is important for governments and research institutions. Twitter is
a ubiquitous public platform and, as such, has tremendous utility for
understanding public perceptions, behavior, and attitudes related to COVID-19.
In this research, we present CML-COVID, a COVID-19 Twitter data set of
19,298,967 tweets from 5,977,653 unique individuals, and summarize some
of the attributes of these data. These tweets were collected between March 2020
and July 2020 using the COVID-19-related query terms coronavirus, covid, and
mask. We use topic modeling, sentiment analysis, and descriptive statistics
to describe the tweets related to COVID-19 we collected and the geographical
location of tweets, where available. We provide information on how to access
our tweet dataset (archived using twarc).
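The keyword-based collection described in the abstract can be illustrated with a minimal sketch. It assumes a simple case-insensitive substring match on the three query terms; the actual matching rules applied by the Twitter API and by the authors' twarc setup are not given here, and the helper names `matches_query` and `filter_tweets` are hypothetical.

```python
import re

# Query terms named in the abstract.
QUERY_TERMS = ("coronavirus", "covid", "mask")

# Assumption: case-insensitive substring matching. The real Twitter API
# matching semantics may differ (tokenization, hashtags, URLs, etc.).
_PATTERN = re.compile("|".join(map(re.escape, QUERY_TERMS)), re.IGNORECASE)


def matches_query(text: str) -> bool:
    """Return True if the text contains any of the query terms."""
    return _PATTERN.search(text) is not None


def filter_tweets(texts):
    """Keep only the texts that match at least one query term."""
    return [t for t in texts if matches_query(t)]
```

A substring match is deliberately broad: for example, it also keeps text containing "masks" or "COVID-19", which suits a rough first pass but would need refinement for precise replication.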
Related papers
- Just Another Day on Twitter: A Complete 24 Hours of Twitter Data [39.98744837726886]
We have collected all 375 million tweets published within a 24-hour time period starting on September 21, 2022.
This is the first complete 24-hour Twitter dataset that is available for the research community.
arXiv Detail & Related papers (2023-01-26T21:28:40Z)
- METS-CoV: A Dataset of Medical Entity and Targeted Sentiment on COVID-19 Related Tweets [13.35986397208115]
This paper releases METS-CoV, a dataset containing medical entities and targeted sentiments from COVID-19-related tweets.
To the best of our knowledge, METS-CoV is the first dataset to collect medical entities and corresponding sentiments of COVID-19-related tweets.
arXiv Detail & Related papers (2022-09-28T01:55:14Z)
- "COVID-19 was a FIFA conspiracy #curropt": An Investigation into the Viral Spread of COVID-19 Misinformation [60.268682953952506]
We estimate the extent to which misinformation has influenced the course of the COVID-19 pandemic using natural language processing models.
We provide a strategy to combat social media posts that are likely to cause widespread harm.
arXiv Detail & Related papers (2022-06-12T19:41:01Z)
- Twitter Dataset on the Russo-Ukrainian War [68.713984286035]
We have initiated an ongoing dataset acquisition from the Twitter API.
The dataset has reached 57.3 million tweets, originating from 7.7 million users.
We apply an initial volume and sentiment analysis; the dataset can also support further exploratory work on topic analysis, hate speech, and propaganda recognition, or help reveal potentially malicious entities such as botnets.
arXiv Detail & Related papers (2022-04-07T12:33:06Z)
- Extracting Feelings of People Regarding COVID-19 by Social Network Mining [0.0]
A dataset of COVID-related tweets in English is collected.
More than two million tweets from March 23 to June 23 of 2020 are analyzed.
arXiv Detail & Related papers (2021-10-12T16:45:33Z)
- ArCOV19-Rumors: Arabic COVID-19 Twitter Dataset for Misinformation Detection [6.688963029270579]
ArCOV19-Rumors is an Arabic COVID-19 Twitter dataset for misinformation detection composed of tweets containing claims from 27 January to the end of April 2020.
We collected 138 verified claims, mostly from popular fact-checking websites, and identified 9.4K tweets relevant to those claims.
Tweets were manually annotated by veracity to support research on misinformation detection, which is one of the major problems faced during a pandemic.
arXiv Detail & Related papers (2020-10-17T11:21:40Z)
- Understanding the Hoarding Behaviors during the COVID-19 Pandemic using Large Scale Social Media Data [77.34726150561087]
We analyze the hoarding and anti-hoarding patterns of over 42,000 unique Twitter users in the United States from March 1 to April 30, 2020.
We find the percentage of females in both hoarding and anti-hoarding groups is higher than that of the general Twitter user population.
The LIWC anxiety mean for the hoarding-related tweets is significantly higher than the baseline Twitter anxiety mean.
arXiv Detail & Related papers (2020-10-15T16:02:25Z)
- Privacy-Aware Recommender Systems Challenge on Twitter's Home Timeline [47.434392695347924]
The RecSys 2020 Challenge was organized by ACM RecSys in partnership with Twitter using this dataset.
This paper touches on the key challenges faced by researchers and professionals striving to predict user engagements.
arXiv Detail & Related papers (2020-04-28T23:54:33Z)
- The Ivory Tower Lost: How College Students Respond Differently than the General Public to the COVID-19 Pandemic [66.80677233314002]
The pandemic of the novel Coronavirus Disease 2019 (COVID-19) has presented governments with ultimate challenges.
In the United States, the country with the highest confirmed COVID-19 infection cases, a nationwide social distancing protocol has been implemented by the President.
This paper aims to discover the social implications of this unprecedented disruption in our interactive society by mining people's opinions on social media.
arXiv Detail & Related papers (2020-04-21T13:02:38Z)
- ArCOV-19: The First Arabic COVID-19 Twitter Dataset with Propagation Networks [6.688963029270579]
ArCOV-19 is the first publicly available Arabic Twitter dataset covering the COVID-19 pandemic.
It includes about 2.7M tweets alongside the propagation networks of the most popular subset of them.
arXiv Detail & Related papers (2020-04-13T10:49:53Z)
- COVID-19 on Social Media: Analyzing Misinformation in Twitter Conversations [22.43295864610142]
We collected streaming data related to COVID-19 using the Twitter API, starting March 1, 2020.
We identified unreliable and misleading content based on fact-checking sources.
We examined the narratives promoted in misinformation tweets, along with the distribution of engagements with these tweets.
arXiv Detail & Related papers (2020-03-26T09:48:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.