Related papers: ExtremeBB: A Database for Large-Scale Research into Online Hate, Harassment, the Manosphere and Extremism

URL: http://arxiv.org/abs/2111.04479v3
Date: Sun, 20 Aug 2023 22:38:14 GMT
Title: ExtremeBB: A Database for Large-Scale Research into Online Hate, Harassment, the Manosphere and Extremism
Authors: Anh V. Vu, Lydia Wilson, Yi Ting Chua, Ilia Shumailov, Ross Anderson
Abstract summary: We introduce ExtremeBB, a textual database of over 53.5M posts made by 38.5k users on 12 extremist bulletin board forums promoting online hate, harassment, the manosphere and other forms of extremism. It enables large-scale analyses of qualitative and quantitative historical trends going back two decades. ExtremeBB comes with a robust ethical data-sharing regime that allows us to share data with academics worldwide.
Score: 12.647120939857635
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We introduce ExtremeBB, a textual database of over 53.5M posts made by 38.5k users on 12 extremist bulletin board forums promoting online hate, harassment, the manosphere and other forms of extremism. It enables large-scale analyses of qualitative and quantitative historical trends going back two decades: measuring hate speech and toxicity; tracing the evolution of different strands of extremist ideology; tracking the relationships between online subcultures, extremist behaviours, and real-world violence; and monitoring extremist communities in near real time. This can shed light not only on the spread of problematic ideologies but also the effectiveness of interventions. ExtremeBB comes with a robust ethical data-sharing regime that allows us to share data with academics worldwide. Since 2020, access has been granted to 49 licensees in 16 research groups from 12 institutions.

Related papers

Mapping the Italian Telegram Ecosystem [0.20482269513546458]
We conduct a large-scale analysis of the Italian Telegram sphere, leveraging a dataset of 186 million messages from 13,151 chats collected in 2023. Using network analysis, Large Language Models, and toxicity detection tools, we examine how different thematic communities form, align ideologically, and engage in harmful discourse. We find that Italian discourse primarily targets Black people, Jews, and gay individuals independently of the topic.
arXiv Detail & Related papers (2025-04-28T08:58:18Z)
Measuring Online Hate on 4chan using Pre-trained Deep Learning Models [4.970364068620607]
This work focuses on analysing and measuring the prevalence of online hate on 4chan's politically incorrect board (/pol/). We use state-of-the-art Natural Language Processing (NLP) models, specifically transformer-based models such as RoBERTa and Detoxify. Results show that 11.20% of this dataset is identified as containing hate in different categories.
arXiv Detail & Related papers (2025-03-30T22:47:11Z)
Multi-Platform Aggregated Dataset of Online Communities (MADOC) [64.45797970830233]
MADOC aggregates and standardizes data from Bluesky, Koo, Reddit, and Voat (2012-2024), containing 18.9 million posts, 236 million comments, and 23.1 million unique users. The dataset enables comparative studies of toxic behavior evolution across platforms through standardized interaction records and sentiment analysis.
arXiv Detail & Related papers (2025-01-22T14:02:11Z)
Unifying the Extremes: Developing a Unified Model for Detecting and Predicting Extremist Traits and Radicalization [13.611821646402818]
We propose a novel method for extracting and analyzing extremist discourse across a range of online community forums. By focusing on verbal behavioral signatures of extremist traits, we develop a framework for quantifying extremism at both user and community levels. Our findings contribute to the study of extremism by introducing a more holistic, cross-ideological approach.
arXiv Detail & Related papers (2025-01-08T20:17:24Z)
Quantifying Extreme Opinions on Reddit Amidst the 2023 Israeli-Palestinian Conflict [3.2430260063115224]
This study investigates the dynamics of extreme opinions on social media during the 2023 Israeli-Palestinian conflict. A lexicon-based, unsupervised methodology was developed to measure "extreme opinions". The analysis identifies significant peaks in extremism scores that correspond to pivotal real-life events.
arXiv Detail & Related papers (2024-12-14T17:52:28Z)
iDRAMA-Scored-2024: A Dataset of the Scored Social Media Platform from 2020 to 2023 [22.685953309889825]
We release a large-scale dataset from Scored, an alternative Reddit platform. It includes at least 58 communities identified as migrating from Reddit and over 950 communities created since the platform's inception. We provide sentence embeddings of all posts in our dataset, generated through a state-of-the-art model.
arXiv Detail & Related papers (2024-05-16T16:34:03Z)
Monitoring the evolution of antisemitic discourse on extremist social media using BERT [3.3037858066178662]
Racism and intolerance on social media contribute to a toxic online environment which may spill offline to foster hatred. Tracking antisemitic themes and their associated terminology over time in online discussions could help monitor the sentiments of their participants.
arXiv Detail & Related papers (2024-02-06T20:34:49Z)
Hatemongers ride on echo chambers to escalate hate speech diffusion [23.714548893849393]
We analyze more than 32 million posts from over 6.8 million users across three popular online social networks. We find that hatemongers play a more crucial role in governing the spread of information compared to singled-out hateful content.
arXiv Detail & Related papers (2023-02-05T20:30:48Z)
DisinfoMeme: A Multimodal Dataset for Detecting Meme Intentionally Spreading Out Disinformation [72.18912216025029]
We present DisinfoMeme to help detect disinformation memes. The dataset contains memes mined from Reddit covering three current topics: the COVID-19 pandemic, the Black Lives Matter movement, and veganism/vegetarianism.
arXiv Detail & Related papers (2022-05-25T09:54:59Z)
DISARM: Detecting the Victims Targeted by Harmful Memes [49.12165815990115]
DISARM is a framework that uses named entity recognition and person identification to detect harmful memes. We show that DISARM significantly outperforms ten unimodal and multimodal systems. It can reduce the relative error rate for harmful target identification by up to 9 points absolute over several strong multimodal rivals.
arXiv Detail & Related papers (2022-05-11T19:14:26Z)
A Comparison of Online Hate on Reddit and 4chan: A Case Study of the 2020 US Election [2.685668802278155]
We make use of various Natural Language Processing (NLP) techniques to analyse hateful content from Reddit and 4chan relating to the 2020 US Presidential Elections. Our findings show how content and posting activity can differ depending on the platform being used. We provide an initial comparison of the platform-specific behaviours of online hate, and of how different platforms can serve specific purposes.
arXiv Detail & Related papers (2022-02-02T21:48:56Z)
This Must Be the Place: Predicting Engagement of Online Communities in a Large-scale Distributed Campaign [70.69387048368849]
We study the behavior of communities with millions of active members. We develop a hybrid model, combining textual cues, community meta-data, and structural properties. We demonstrate the applicability of our model through Reddit's r/place, a large-scale online experiment.
arXiv Detail & Related papers (2022-01-14T08:23:16Z)
#ISIS vs #ActionCountersTerrorism: A Computational Analysis of Extremist and Counter-extremist Twitter Narratives [2.685668802278155]
This study will apply computational techniques to analyse the narratives of various pro-extremist and counter-extremist Twitter accounts. Our findings show that pro-extremist accounts often use different strategies to disseminate content when compared to counter-extremist accounts across different types of organisations.
arXiv Detail & Related papers (2020-08-26T20:46:45Z)
Racism is a Virus: Anti-Asian Hate and Counterspeech in Social Media during the COVID-19 Crisis [51.39895377836919]
COVID-19 has sparked racism and hate on social media targeted towards Asian communities. We study the evolution and spread of anti-Asian hate speech through the lens of Twitter. We create COVID-HATE, the largest dataset of anti-Asian hate and counterspeech spanning 14 months.
arXiv Detail & Related papers (2020-05-25T21:58:09Z)
Cross-ethnicity Face Anti-spoofing Recognition Challenge: A Review [79.49390241265337]
Chalearn Face Anti-spoofing Attack Detection Challenge consists of single-modal (e.g., RGB) and multi-modal (e.g., RGB, Depth, Infrared (IR)) tracks. This paper presents an overview of the challenge, including its design, evaluation protocol and a summary of results.
arXiv Detail & Related papers (2020-04-23T06:43:08Z)
Echo Chambers on Social Media: A comparative analysis [64.2256216637683]
We introduce an operational definition of echo chambers and perform a massive comparative analysis on 1B pieces of content produced by 1M users on four social media platforms. We infer the leaning of users about controversial topics and reconstruct their interaction networks by analyzing different features. We find support for the hypothesis that platforms implementing news feed algorithms like Facebook may elicit the emergence of echo chambers.
arXiv Detail & Related papers (2020-04-20T20:00:27Z)

This list is automatically generated from the titles and abstracts of the papers on this site.