iDRAMA-Scored-2024: A Dataset of the Scored Social Media Platform from 2020 to 2023
- URL: http://arxiv.org/abs/2405.10233v1
- Date: Thu, 16 May 2024 16:34:03 GMT
- Title: iDRAMA-Scored-2024: A Dataset of the Scored Social Media Platform from 2020 to 2023
- Authors: Jay Patel, Pujan Paudel, Emiliano De Cristofaro, Gianluca Stringhini, Jeremy Blackburn,
- Abstract summary: We release a large-scale dataset from Scored, an alternative Reddit platform.
At least 58 communities identified as migrating from Reddit and over 950 communities created since the platform's inception.
We provide sentence embeddings of all posts in our dataset, generated through a state-of-the-art model.
- Score: 22.685953309889825
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Online web communities often face bans for violating platform policies, encouraging their migration to alternative platforms. This migration, however, can result in increased toxicity and unforeseen consequences on the new platform. In recent years, researchers have collected data from many alternative platforms, indicating coordinated efforts leading to offline events, conspiracy movements, hate speech propagation, and harassment. Thus, it becomes crucial to characterize and understand these alternative platforms. To advance research in this direction, we collect and release a large-scale dataset from Scored -- an alternative Reddit platform that sheltered banned fringe communities, for example, c/TheDonald (a prominent right-wing community) and c/GreatAwakening (a conspiratorial community). Over four years, we collected approximately 57M posts from Scored, with at least 58 communities identified as migrating from Reddit and over 950 communities created since the platform's inception. Furthermore, we provide sentence embeddings of all posts in our dataset, generated through a state-of-the-art model, to further advance the field in characterizing the discussions within these communities. We aim to provide these resources to facilitate their investigations without the need for extensive data collection and processing efforts.
Related papers
- Characterizing the Fragmentation of the Social Media Ecosystem [39.58317527488534]
We use a dataset of 126M URLs posted by nearly 6M users on nine social media platforms.
We find a clear separation between mainstream and alt-tech platforms.
These findings outline the main dimensions defining the fragmentation and polarization of the social media ecosystem.
arXiv Detail & Related papers (2024-11-25T18:45:03Z) - Labeled Datasets for Research on Information Operations [71.34999856621306]
We present new labeled datasets about 26 campaigns, which contain both IO posts verified by a social media platform and over 13M posts by 303k accounts that discussed similar topics in the same time frames (control data)
The datasets will facilitate the study of narratives, network interactions, and engagement strategies employed by coordinated accounts across various campaigns and countries.
arXiv Detail & Related papers (2024-11-15T22:15:01Z) - On the Use of Proxies in Political Ad Targeting [49.61009579554272]
We show that major political advertisers circumvented mitigations by targeting proxy attributes.
Our findings have crucial implications for the ongoing discussion on the regulation of political advertising.
arXiv Detail & Related papers (2024-10-18T17:15:13Z) - MetaHate: A Dataset for Unifying Efforts on Hate Speech Detection [2.433983268807517]
Hate speech poses significant social, psychological, and occasionally physical threats to targeted individuals and communities.
Current computational linguistic approaches for tackling this phenomenon rely on labelled social media datasets for training.
We scrutinized over 60 datasets, selectively integrating those pertinent into MetaHate.
Our findings contribute to a deeper understanding of the existing datasets, paving the way for training more robust and adaptable models.
arXiv Detail & Related papers (2024-01-12T11:54:53Z) - Design and analysis of tweet-based election models for the 2021 Mexican
legislative election [55.41644538483948]
We use a dataset of 15 million election-related tweets in the six months preceding election day.
We find that models using data with geographical attributes determine the results of the election with better precision and accuracy than conventional polling methods.
arXiv Detail & Related papers (2023-01-02T12:40:05Z) - Understanding Online Migration Decisions Following the Banning of
Radical Communities [0.2752817022620644]
We study how factors associated with the RECRO radicalization framework relate to users' migration decisions.
Our results show that individual-level factors, those relating to the behavior of users, are associated with the decision to post on the fringe platform.
arXiv Detail & Related papers (2022-12-09T10:43:15Z) - "I Can't Keep It Up." A Dataset from the Defunct Voat.co News Aggregator [0.0]
Voat.co was a news aggregator website that shut down on December 25, 2020.
This paper presents a dataset with over 2.3M submissions and 16.2M comments posted from 113K users in 7.1K subverses.
arXiv Detail & Related papers (2022-01-15T23:25:53Z) - This Must Be the Place: Predicting Engagement of Online Communities in a
Large-scale Distributed Campaign [70.69387048368849]
We study the behavior of communities with millions of active members.
We develop a hybrid model, combining textual cues, community meta-data, and structural properties.
We demonstrate the applicability of our model through Reddit's r/place a large-scale online experiment.
arXiv Detail & Related papers (2022-01-14T08:23:16Z) - News consumption and social media regulations policy [70.31753171707005]
We analyze two social media that enforced opposite moderation methods, Twitter and Gab, to assess the interplay between news consumption and content regulation.
Our results show that the presence of moderation pursued by Twitter produces a significant reduction of questionable content.
The lack of clear regulation on Gab results in the tendency of the user to engage with both types of content, showing a slight preference for the questionable ones which may account for a dissing/endorsement behavior.
arXiv Detail & Related papers (2021-06-07T19:26:32Z) - Do Platform Migrations Compromise Content Moderation? Evidence from
r/The_Donald and r/Incels [20.41491269475746]
We report the results of a large-scale observational study of how problematic online communities progress following community-level moderation measures.
Our results suggest that, in both cases, moderation measures significantly decreased posting activity on the new platform.
In spite of that, users in one of the studied communities showed increases in signals associated with toxicity and radicalization.
arXiv Detail & Related papers (2020-10-20T16:03:06Z) - An Iterative Approach for Identifying Complaint Based Tweets in Social
Media Platforms [76.9570531352697]
We propose an iterative methodology which aims to identify complaint based posts pertaining to the transport domain.
We perform comprehensive evaluations along with releasing a novel dataset for the research purposes.
arXiv Detail & Related papers (2020-01-24T22:23:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.