Just Another Day on Twitter: A Complete 24 Hours of Twitter Data
- URL: http://arxiv.org/abs/2301.11429v2
- Date: Tue, 11 Apr 2023 07:33:19 GMT
- Title: Just Another Day on Twitter: A Complete 24 Hours of Twitter Data
- Authors: Juergen Pfeffer, Daniel Matter, Kokil Jaidka, Onur Varol, Afra
Mashhadi, Jana Lasser, Dennis Assenmacher, Siqi Wu, Diyi Yang, Cornelia
Brantner, Daniel M. Romero, Jahna Otterbacher, Carsten Schwemmer, Kenneth
Joseph, David Garcia, Fred Morstatter
- Abstract summary: We have collected all 375 million tweets published within a 24-hour time period starting on September 21, 2022.
This is the first complete 24-hour Twitter dataset that is available for the research community.
- Score: 39.98744837726886
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: At the end of October 2022, Elon Musk concluded his acquisition of Twitter.
In the weeks and months before that, several questions were publicly discussed
that were not only of interest to the platform's future buyers, but also of
high relevance to the Computational Social Science research community. For
example, how many active users does the platform have? What percentage of
accounts on the site are bots? And, what are the dominating topics and
sub-topical spheres on the platform? In a globally coordinated effort of 80
scholars to shed light on these questions, and to offer a dataset that will
equip other researchers to do the same, we have collected all 375 million
tweets published within a 24-hour time period starting on September 21, 2022.
To the best of our knowledge, this is the first complete 24-hour Twitter
dataset that is available for the research community. With it, the present work
aims to accomplish two goals. First, we seek to answer the aforementioned
questions and provide descriptive metrics about Twitter that can serve as
references for other researchers. Second, we create a baseline dataset for
future research that can be used to study the potential impact of the
platform's ownership change.
Related papers
- RIP Twitter API: A eulogy to its vast research contributions [1.1687067206676627]
This study collects and tabulates the number of studies, number of citations, dates, major disciplines, and major topic areas that used Twitter data between 2006 and 2023.
Since 2006, a total of 27,453 studies have been published in 7,432 publication venues, with 1,303,142 citations, across 14 disciplines.
Major topics include: information dissemination, assessing the credibility of tweets, strategies for conducting data research, detecting and analyzing major events, and studying human behavior.
arXiv Detail & Related papers (2024-04-10T20:39:24Z)
- Design and analysis of tweet-based election models for the 2021 Mexican legislative election [55.41644538483948]
We use a dataset of 15 million election-related tweets in the six months preceding election day.
We find that models using data with geographical attributes determine the results of the election with better precision and accuracy than conventional polling methods.
arXiv Detail & Related papers (2023-01-02T12:40:05Z)
- Twitter Dataset on the Russo-Ukrainian War [68.713984286035]
We have initiated an ongoing dataset acquisition from the Twitter API.
The dataset has reached 57.3 million tweets, originating from 7.7 million users.
We apply an initial volume and sentiment analysis; the dataset can also support further exploratory investigation of topics, hate speech, and propaganda, or help reveal potentially malicious entities such as botnets.
arXiv Detail & Related papers (2022-04-07T12:33:06Z)
- Manipulating Twitter Through Deletions [64.33261764633504]
Research into influence campaigns on Twitter has mostly relied on identifying malicious activities from tweets obtained via public APIs.
Here, we provide the first exhaustive, large-scale analysis of anomalous deletion patterns involving more than a billion deletions by over 11 million accounts.
We find that a small fraction of accounts delete a large number of tweets daily.
First, limits on tweet volume are circumvented, allowing certain accounts to flood the network with over 26 thousand daily tweets.
Second, coordinated networks of accounts engage in repetitive likes and unlikes of content that is eventually deleted, which can manipulate ranking algorithms.
arXiv Detail & Related papers (2022-03-25T20:07:08Z)
- Identification of Twitter Bots based on an Explainable ML Framework: the US 2020 Elections Case Study [72.61531092316092]
This paper focuses on the design of a novel system for identifying Twitter bots based on labeled Twitter data.
A supervised machine learning (ML) framework is adopted, using an Extreme Gradient Boosting (XGBoost) algorithm.
Our study also deploys Shapley Additive Explanations (SHAP) for explaining the ML model predictions.
arXiv Detail & Related papers (2021-12-08T14:12:24Z)
- Twitter Big Data as a Resource for Exoskeleton Research: A Large-Scale Dataset of about 140,000 Tweets and 100 Research Questions [0.0]
The exoskeleton market is projected to grow to several times its current value within the next two years.
It is crucial to study the degree and trends of user interest, views, opinions, perspectives, attitudes, acceptance, feedback, engagement, buying behavior, and satisfaction.
This work presents an open-access dataset of about 140,000 tweets about exoskeletons that were posted in a 5-year period from May 21, 2017, to May 21, 2022.
arXiv Detail & Related papers (2021-11-04T19:36:01Z)
- CML-COVID: A Large-Scale COVID-19 Twitter Dataset with Latent Topics, Sentiment and Location Information [0.0]
CML-COVID is a COVID-19 Twitter data set of 19,298,967 tweets from 5,977,653 unique individuals.
These tweets were collected between March 2020 and July 2020 using the query terms coronavirus, covid and mask related to COVID-19.
arXiv Detail & Related papers (2021-01-28T18:59:10Z)
- Towards A Sentiment Analyzer for Low-Resource Languages [0.0]
This research aims to analyse user sentiment towards a particular trending topic that was being actively and widely discussed at the time.
We use the hashtag #kpujangancurang, which was a trending topic during the Indonesian presidential election in 2019.
This research uses the RapidMiner tool to collect the Twitter data and compares Naive Bayes, K-Nearest Neighbor, Decision Tree, and Multi-Layer Perceptron classification methods for classifying the sentiment of the Twitter data.
arXiv Detail & Related papers (2020-11-12T13:50:00Z)
- Privacy-Aware Recommender Systems Challenge on Twitter's Home Timeline [47.434392695347924]
The RecSys 2020 Challenge was organized by ACM RecSys in partnership with Twitter using this dataset.
This paper touches on the key challenges faced by researchers and professionals striving to predict user engagements.
arXiv Detail & Related papers (2020-04-28T23:54:33Z)