iDRAMA-Scored-2024: A Dataset of the Scored Social Media Platform from 2020 to 2023
- URL: http://arxiv.org/abs/2405.10233v1
- Date: Thu, 16 May 2024 16:34:03 GMT
- Title: iDRAMA-Scored-2024: A Dataset of the Scored Social Media Platform from 2020 to 2023
- Authors: Jay Patel, Pujan Paudel, Emiliano De Cristofaro, Gianluca Stringhini, Jeremy Blackburn
- Abstract summary: We release a large-scale dataset from Scored, an alternative Reddit platform.
It covers at least 58 communities identified as migrating from Reddit and over 950 communities created since the platform's inception.
We provide sentence embeddings of all posts in our dataset, generated through a state-of-the-art model.
- Score: 22.685953309889825
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Online web communities often face bans for violating platform policies, encouraging their migration to alternative platforms. This migration, however, can result in increased toxicity and unforeseen consequences on the new platform. In recent years, researchers have collected data from many alternative platforms, indicating coordinated efforts leading to offline events, conspiracy movements, hate speech propagation, and harassment. Thus, it becomes crucial to characterize and understand these alternative platforms. To advance research in this direction, we collect and release a large-scale dataset from Scored -- an alternative Reddit platform that sheltered banned fringe communities, for example, c/TheDonald (a prominent right-wing community) and c/GreatAwakening (a conspiratorial community). Over four years, we collected approximately 57M posts from Scored, with at least 58 communities identified as migrating from Reddit and over 950 communities created since the platform's inception. Furthermore, we provide sentence embeddings of all posts in our dataset, generated through a state-of-the-art model, to further advance the field in characterizing the discussions within these communities. We aim to provide these resources to facilitate their investigations without the need for extensive data collection and processing efforts.
Related papers
- On the Use of Proxies in Political Ad Targeting [49.61009579554272]
We show that major political advertisers circumvented mitigations by targeting proxy attributes.
Our findings have crucial implications for the ongoing discussion on the regulation of political advertising.
arXiv Detail & Related papers (2024-10-18T17:15:13Z)
- "I'm in the Bluesky Tonight": Insights from a Year Worth of Social Data [0.18416014644193066]
We present a large, high-coverage dataset of social interactions and user-generated content from Bluesky Social.
The dataset contains the complete post history of over 4M users (81% of all registered accounts), totalling 235M posts.
arXiv Detail & Related papers (2024-04-29T16:43:39Z)
- MetaHate: A Dataset for Unifying Efforts on Hate Speech Detection [2.433983268807517]
Hate speech poses significant social, psychological, and occasionally physical threats to targeted individuals and communities.
Current computational linguistic approaches for tackling this phenomenon rely on labelled social media datasets for training.
We scrutinized over 60 datasets, selectively integrating those pertinent into MetaHate.
Our findings contribute to a deeper understanding of the existing datasets, paving the way for training more robust and adaptable models.
arXiv Detail & Related papers (2024-01-12T11:54:53Z)
- Design and analysis of tweet-based election models for the 2021 Mexican legislative election [55.41644538483948]
We use a dataset of 15 million election-related tweets in the six months preceding election day.
We find that models using data with geographical attributes determine the results of the election with better precision and accuracy than conventional polling methods.
arXiv Detail & Related papers (2023-01-02T12:40:05Z)
- Understanding Online Migration Decisions Following the Banning of Radical Communities [0.2752817022620644]
We study how factors associated with the RECRO radicalization framework relate to users' migration decisions.
Our results show that individual-level factors, those relating to the behavior of users, are associated with the decision to post on the fringe platform.
arXiv Detail & Related papers (2022-12-09T10:43:15Z)
- "I Can't Keep It Up." A Dataset from the Defunct Voat.co News Aggregator [0.0]
Voat.co was a news aggregator website that shut down on December 25, 2020.
This paper presents a dataset with over 2.3M submissions and 16.2M comments posted from 113K users in 7.1K subverses.
arXiv Detail & Related papers (2022-01-15T23:25:53Z)
- This Must Be the Place: Predicting Engagement of Online Communities in a Large-scale Distributed Campaign [70.69387048368849]
We study the behavior of communities with millions of active members.
We develop a hybrid model, combining textual cues, community meta-data, and structural properties.
We demonstrate the applicability of our model through Reddit's r/place, a large-scale online experiment.
arXiv Detail & Related papers (2022-01-14T08:23:16Z)
- News consumption and social media regulations policy [70.31753171707005]
We analyze two social media platforms that enforced opposite moderation methods, Twitter and Gab, to assess the interplay between news consumption and content regulation.
Our results show that the presence of moderation pursued by Twitter produces a significant reduction of questionable content.
The lack of clear regulation on Gab results in the tendency of the user to engage with both types of content, showing a slight preference for the questionable ones which may account for a dissing/endorsement behavior.
arXiv Detail & Related papers (2021-06-07T19:26:32Z)
- Do Platform Migrations Compromise Content Moderation? Evidence from r/The_Donald and r/Incels [20.41491269475746]
We report the results of a large-scale observational study of how problematic online communities progress following community-level moderation measures.
Our results suggest that, in both cases, moderation measures significantly decreased posting activity on the new platform.
In spite of that, users in one of the studied communities showed increases in signals associated with toxicity and radicalization.
arXiv Detail & Related papers (2020-10-20T16:03:06Z)
- Echo Chambers on Social Media: A comparative analysis [64.2256216637683]
We introduce an operational definition of echo chambers and perform a massive comparative analysis on 1B pieces of content produced by 1M users on four social media platforms.
We infer the leaning of users about controversial topics and reconstruct their interaction networks by analyzing different features.
We find support for the hypothesis that platforms implementing news feed algorithms like Facebook may elicit the emergence of echo-chambers.
arXiv Detail & Related papers (2020-04-20T20:00:27Z)
- An Iterative Approach for Identifying Complaint Based Tweets in Social Media Platforms [76.9570531352697]
We propose an iterative methodology which aims to identify complaint based posts pertaining to the transport domain.
We perform comprehensive evaluations and release a novel dataset for research purposes.
arXiv Detail & Related papers (2020-01-24T22:23:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.