"I'm in the Bluesky Tonight": Insights from a Year Worth of Social Data
- URL: http://arxiv.org/abs/2404.18984v1
- Date: Mon, 29 Apr 2024 16:43:39 GMT
- Title: "I'm in the Bluesky Tonight": Insights from a Year Worth of Social Data
- Authors: Andrea Failla, Giulio Rossetti
- Abstract summary: We present a large, high-coverage dataset of social interactions and user-generated content from Bluesky Social.
The dataset contains the complete post history of over 4M users (81% of all registered accounts), totalling 235M posts.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pollution of online social spaces caused by rampant dis/misinformation is a growing societal concern. However, recent decisions to reduce access to social media APIs are causing a shortage of publicly available, recent social media data, thus hindering the advancement of computational social science as a whole. We present a large, high-coverage dataset of social interactions and user-generated content from Bluesky Social to address this pressing issue. The dataset contains the complete post history of over 4M users (81% of all registered accounts), totalling 235M posts. We also make available social data covering follow, comment, repost, and quote interactions. Since Bluesky allows users to create and bookmark feed generators (i.e., content recommendation algorithms), we also release the full output of several popular algorithms available on the platform, along with their timestamped "like" interactions and time of bookmarking. This dataset allows unprecedented analysis of online behavior and human-machine engagement patterns. Notably, it provides ground-truth data for studying the effects of content exposure and self-selection and for performing content virality and diffusion analysis.
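The headline figures in the abstract imply two rough derived quantities: the approximate number of registered Bluesky accounts at collection time and the average number of posts per collected user. The sketch below computes them as a back-of-the-envelope check only. It assumes "over 4M" is exactly 4M and treats the rounded 81% and 235M as exact, so the results are approximate bounds, not figures reported by the authors; the function names are illustrative.

```python
# Back-of-the-envelope figures derived from the abstract's rounded numbers.
# Assumption: "over 4M users" is treated as exactly 4M (a lower bound), and the
# rounded 81% coverage and 235M posts are treated as exact.

USERS_WITH_HISTORY = 4_000_000   # "over 4M users" (lower bound)
COVERAGE = 0.81                  # "81% of all registered accounts" (rounded)
TOTAL_POSTS = 235_000_000        # "totalling 235M posts" (rounded)


def implied_registered_accounts(users: int, coverage: float) -> float:
    """Registered accounts implied by dividing collected users by the coverage fraction."""
    return users / coverage


def mean_posts_per_user(posts: int, users: int) -> float:
    """Average number of posts per collected user."""
    return posts / users


if __name__ == "__main__":
    accounts = implied_registered_accounts(USERS_WITH_HISTORY, COVERAGE)
    per_user = mean_posts_per_user(TOTAL_POSTS, USERS_WITH_HISTORY)
    # More than 4M users means more registered accounts and fewer posts per user,
    # so these values are a rough lower and upper bound, respectively.
    print(f"registered accounts: roughly {accounts / 1e6:.2f}M or more")
    print(f"posts per collected user: roughly {per_user:.1f} or fewer")
```

Because the user count is a lower bound, the implied account total (about 4.9M) can only be higher and the per-user average (under about 59 posts) can only be lower.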
Related papers
- Easy-access online social media metrics can effectively identify misinformation sharing users
We find that higher tweet frequency is positively associated with low factuality in shared content, while account age is negatively associated with it.
Our findings show that relying on these easy-access social network metrics could serve as a low-barrier approach for initial identification of users who are more likely to spread misinformation.
arXiv (2024-08-27T16:41:13Z)
- BlueTempNet: A Temporal Multi-network Dataset of Social Interactions in Bluesky Social
We present the first collection of the temporal dynamics of user-driven social interactions.
We collect existing Bluesky Feeds, including the users who liked and generated these Feeds.
This data-collection strategy captures past user behaviors and supports the future data collection of user behavior.
arXiv (2024-07-24T17:31:48Z)
- SODA: Million-scale Dialogue Distillation with Social Commonsense Contextualization
We present SODA, the first publicly available, million-scale high-quality social dialogue dataset.
By contextualizing social commonsense knowledge from a knowledge graph, we are able to distill an exceptionally broad spectrum of social interactions.
Human evaluation shows that conversations in SODA are more consistent, specific, and (surprisingly) natural than those in prior human-authored datasets.
arXiv (2022-12-20T17:38:47Z)
- Cross-Network Social User Embedding with Hybrid Differential Privacy Guarantees
We propose a Cross-network Social User Embedding framework, namely DP-CroSUE, to learn comprehensive representations of users in a privacy-preserving way.
In particular, for each heterogeneous social network, we first introduce a hybrid differential privacy notion to capture the variation of privacy expectations for heterogeneous data types.
To further enhance user embeddings, a novel cross-network GCN embedding model is designed to transfer knowledge across networks through those aligned users.
arXiv (2022-09-04T06:22:37Z)
- Federated Social Recommendation with Graph Neural Network
Prior work fuses social information with user-item interactions to alleviate the cold-start issue of relying on interactions alone; this setting is the social recommendation problem.
We devise a novel framework, Federated Social recommendation with Graph neural network (FeSoG).
arXiv (2021-11-21T09:41:39Z)
- News consumption and social media regulations policy
We analyze two social media platforms that enforce opposite moderation methods, Twitter and Gab, to assess the interplay between news consumption and content regulation.
Our results show that the presence of moderation pursued by Twitter produces a significant reduction of questionable content.
The lack of clear regulation on Gab leads users to engage with both types of content, with a slight preference for questionable content that may reflect dissing/endorsement behavior.
arXiv (2021-06-07T19:26:32Z)
- A Large-scale Dataset for Hate Speech Detection on Vietnamese Social Media Texts
ViHSD is a human-annotated dataset for automatically detecting hate speech on social networks.
The dataset contains over 30,000 comments, each labeled as CLEAN, OFFENSIVE, or HATE.
arXiv (2021-03-22T00:55:47Z)
- Exposure to Social Engagement Metrics Increases Vulnerability to Misinformation
We find that exposure to social engagement signals increases the vulnerability of users to misinformation.
To reduce the spread of misinformation, we call for technology platforms to rethink the display of social engagement metrics.
arXiv (2020-05-10T14:55:50Z)
- Privacy-Aware Recommender Systems Challenge on Twitter's Home Timeline
The paper describes the dataset used in the RecSys 2020 Challenge, organized by ACM RecSys in partnership with Twitter.
This paper touches on the key challenges faced by researchers and professionals striving to predict user engagements.
arXiv (2020-04-28T23:54:33Z)
- Echo Chambers on Social Media: A comparative analysis
We introduce an operational definition of echo chambers and perform a massive comparative analysis of 1B pieces of content produced by 1M users on four social media platforms.
We infer users' leanings on controversial topics and reconstruct their interaction networks by analyzing different features.
We find support for the hypothesis that platforms implementing news feed algorithms, like Facebook, may elicit the emergence of echo chambers.
arXiv (2020-04-20T20:00:27Z)
- Curating Social Media Data
We propose a data curation pipeline, namely CrowdCorrect, that enables analysts to cleanse and curate social data.
Our pipeline provides automatic feature extraction from a corpus of social media data using existing in-house tools.
The implementation of this pipeline also includes a set of tools for automatically creating micro-tasks to facilitate the contribution of crowd users in curating the raw data.
arXiv (2020-02-21T10:07:15Z)