"I Can't Keep It Up." A Dataset from the Defunct Voat.co News Aggregator
- URL: http://arxiv.org/abs/2201.05933v3
- Date: Fri, 22 Apr 2022 17:06:07 GMT
- Title: "I Can't Keep It Up." A Dataset from the Defunct Voat.co News Aggregator
- Authors: Amin Mekacher, Antonis Papasavva
- Abstract summary: Voat.co was a news aggregator website that shut down on December 25, 2020.
This paper presents a dataset with over 2.3M submissions and 16.2M comments posted by 113K users in 7.1K subverses.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Voat.co was a news aggregator website that shut down on December 25, 2020.
The site had a troubled history and was known for hosting various banned
subreddits. This paper presents a dataset with over 2.3M submissions and 16.2M
comments posted by 113K users in 7.1K subverses (Voat's equivalent of
subreddits). Our dataset covers the whole lifetime of Voat, from its development
period starting on November 8, 2013, through its founding in April 2014, up
until the day it shut down (December 25, 2020). This work presents the largest
and most complete publicly available Voat dataset, to the best of our
knowledge. Along with the release of this dataset, we present a preliminary
analysis covering posting activity and daily user and subverse registration on
the platform so that researchers interested in our dataset can know what to
expect. Our data may prove helpful to false news dissemination studies as we
analyze the links users share on the platform, finding that many communities
rely on alternative news outlets, like Breitbart and GatewayPundit, for their
daily discussions. In addition, we perform network analysis on user
interactions, finding that many users prefer not to interact with subverses
outside their narrative interests, which could be helpful to researchers
focusing on polarization and echo chambers. Also, since Voat was one of the
platforms to which banned Reddit communities migrated, we are confident our dataset
will motivate and assist researchers studying deplatforming. Finally, many
hateful and conspiratorial communities were very popular on Voat, which makes
our work valuable for researchers focusing on toxicity, conspiracy theories,
cross-platform studies of social networks, and natural language processing.
Related papers
- iDRAMA-Scored-2024: A Dataset of the Scored Social Media Platform from 2020 to 2023 [22.685953309889825]
We release a large-scale dataset from Scored, an alternative Reddit platform.
The dataset includes at least 58 communities identified as migrating from Reddit and over 950 communities created since the platform's inception.
We provide sentence embeddings of all posts in our dataset, generated through a state-of-the-art model.
arXiv Detail & Related papers (2024-05-16T16:34:03Z)
- Online conspiracy communities are more resilient to deplatforming [2.9767849911461504]
We compare the shift in behavior of users affected by the ban of two large communities on Reddit, GreatAwakening and FatPeopleHate.
We estimate how many users migrate, finding that users in the conspiracy community are much more likely to leave Reddit altogether and join Voat.
Few migrating zealots drive the growth of the new GreatAwakening community on Voat, while this effect is absent for FatPeopleHate.
arXiv Detail & Related papers (2023-03-21T18:08:51Z)
- Reaching the bubble may not be enough: news media role in online political polarization [58.720142291102135]
A way of reducing polarization would be by distributing cross-partisan news among individuals with distinct political orientations.
This study investigates whether this holds in the context of nationwide elections in Brazil and Canada.
arXiv Detail & Related papers (2021-09-18T11:34:04Z)
- News consumption and social media regulations policy [70.31753171707005]
We analyze two social media platforms that enforced opposite moderation methods, Twitter and Gab, to assess the interplay between news consumption and content regulation.
Our results show that the presence of moderation pursued by Twitter produces a significant reduction of questionable content.
The lack of clear regulation on Gab results in a tendency of users to engage with both types of content, showing a slight preference for questionable content, which may reflect dissing or endorsement behavior.
arXiv Detail & Related papers (2021-06-07T19:26:32Z)
- The Rise and Fall of Fake News sites: A Traffic Analysis [62.51737815926007]
We investigate the online presence of fake news websites and characterize their behavior in comparison to real news websites.
Based on our findings, we build a content-agnostic ML model for automatic detection of fake news websites.
arXiv Detail & Related papers (2021-03-16T18:10:22Z)
- A Multi-Platform Analysis of Political News Discussion and Sharing on Web Communities [13.364612995946876]
We compile a list of 1,073 news websites and extract posts from four Web communities that contain URLs from these sources.
This yields a dataset of 38M posts containing 15M news URLs, spanning almost three years.
We study the data along several axes, assessing the trustworthiness of shared news, designing a method to group news articles into stories, analyzing how these stories are discussed, and measuring the influence various Web communities have on that discussion.
arXiv Detail & Related papers (2021-03-05T12:27:28Z)
- Political audience diversity and news reliability in algorithmic ranking [54.23273310155137]
We propose using the political diversity of a website's audience as a quality signal.
Using news source reliability ratings from domain experts and web browsing data from a diverse sample of 6,890 U.S. citizens, we first show that websites with more extreme and less politically diverse audiences have lower journalistic standards.
arXiv Detail & Related papers (2020-07-16T02:13:55Z)
- Racism is a Virus: Anti-Asian Hate and Counterspeech in Social Media during the COVID-19 Crisis [51.39895377836919]
COVID-19 has sparked racism and hate on social media targeted towards Asian communities.
We study the evolution and spread of anti-Asian hate speech through the lens of Twitter.
We create COVID-HATE, the largest dataset of anti-Asian hate and counterspeech spanning 14 months.
arXiv Detail & Related papers (2020-05-25T21:58:09Z)
- Measuring and Characterizing Hate Speech on News Websites [13.289076063197466]
We analyze 125M comments posted on 412K news articles over the course of 19 months.
We find statistically significant increases in hateful commenting activity around real-world divisive events like the "Unite the Right" rally in Charlottesville.
We find that articles that attract a substantial number of hateful comments have different linguistic characteristics when compared to articles that do not attract hateful comments.
arXiv Detail & Related papers (2020-05-16T09:59:01Z)
- Echo Chambers on Social Media: A comparative analysis [64.2256216637683]
We introduce an operational definition of echo chambers and perform a massive comparative analysis on 1B pieces of content produced by 1M users on four social media platforms.
We infer the leaning of users about controversial topics and reconstruct their interaction networks by analyzing different features.
We find support for the hypothesis that platforms implementing news feed algorithms like Facebook may elicit the emergence of echo-chambers.
arXiv Detail & Related papers (2020-04-20T20:00:27Z)
- Raiders of the Lost Kek: 3.5 Years of Augmented 4chan Posts from the Politically Incorrect Board [12.14455026524814]
This paper presents a dataset with over 3.3M threads and 134.5M posts from the imageboard forum 4chan.
To the best of our knowledge, this represents the largest publicly available 4chan dataset.
We hope this dataset may be used for cross-platform studies of social media, as well as being useful for other types of research like natural language processing.
arXiv Detail & Related papers (2020-01-21T12:52:24Z)