ExtremeBB: A Database for Large-Scale Research into Online Hate,
Harassment, the Manosphere and Extremism
- URL: http://arxiv.org/abs/2111.04479v3
- Date: Sun, 20 Aug 2023 22:38:14 GMT
- Title: ExtremeBB: A Database for Large-Scale Research into Online Hate,
Harassment, the Manosphere and Extremism
- Authors: Anh V. Vu, Lydia Wilson, Yi Ting Chua, Ilia Shumailov, Ross Anderson
- Abstract summary: We introduce ExtremeBB, a textual database of over 53.5M posts made by 38.5k users on 12 extremist bulletin board forums promoting online hate, harassment, the manosphere and other forms of extremism.
It enables large-scale analyses of qualitative and quantitative historical trends going back two decades.
ExtremeBB comes with a robust ethical data-sharing regime that allows us to share data with academics worldwide.
- Score: 12.647120939857635
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce ExtremeBB, a textual database of over 53.5M posts made by 38.5k
users on 12 extremist bulletin board forums promoting online hate, harassment,
the manosphere and other forms of extremism. It enables large-scale analyses of
qualitative and quantitative historical trends going back two decades:
measuring hate speech and toxicity; tracing the evolution of different strands
of extremist ideology; tracking the relationships between online subcultures,
extremist behaviours, and real-world violence; and monitoring extremist
communities in near real time. This can shed light not only on the spread of
problematic ideologies but also the effectiveness of interventions. ExtremeBB
comes with a robust ethical data-sharing regime that allows us to share data
with academics worldwide. Since 2020, access has been granted to 49 licensees
in 16 research groups from 12 institutions.
Related papers
- Multi-Platform Aggregated Dataset of Online Communities (MADOC) [64.45797970830233]
MADOC aggregates and standardizes data from Bluesky, Koo, Reddit, and Voat (2012-2024), containing 18.9 million posts, 236 million comments, and 23.1 million unique users.
The dataset enables comparative studies of toxic behavior evolution across platforms through standardized interaction records and sentiment analysis.
arXiv Detail & Related papers (2025-01-22T14:02:11Z)
- Unifying the Extremes: Developing a Unified Model for Detecting and Predicting Extremist Traits and Radicalization [13.611821646402818]
We propose a novel method for extracting and analyzing extremist discourse across a range of online community forums.
By focusing on verbal behavioral signatures of extremist traits, we develop a framework for quantifying extremism at both user and community levels.
Our findings contribute to the study of extremism by introducing a more holistic, cross-ideological approach.
arXiv Detail & Related papers (2025-01-08T20:17:24Z)
- Quantifying Extreme Opinions on Reddit Amidst the 2023 Israeli-Palestinian Conflict [3.2430260063115224]
This study investigates the dynamics of extreme opinions on social media during the 2023 Israeli-Palestinian conflict.
A lexicon-based, unsupervised methodology was developed to measure "extreme opinions".
The analysis identifies significant peaks in extremism scores that correspond to pivotal real-life events.
arXiv Detail & Related papers (2024-12-14T17:52:28Z)
- iDRAMA-Scored-2024: A Dataset of the Scored Social Media Platform from 2020 to 2023 [22.685953309889825]
We release a large-scale dataset from Scored, an alternative Reddit platform.
The dataset includes at least 58 communities identified as migrating from Reddit and over 950 communities created since the platform's inception.
We provide sentence embeddings of all posts in our dataset, generated through a state-of-the-art model.
arXiv Detail & Related papers (2024-05-16T16:34:03Z)
- Monitoring the evolution of antisemitic discourse on extremist social media using BERT [3.3037858066178662]
Racism and intolerance on social media contribute to a toxic online environment which may spill offline to foster hatred.
Tracking antisemitic themes and their associated terminology over time in online discussions could help monitor the sentiments of their participants.
arXiv Detail & Related papers (2024-02-06T20:34:49Z)
- DisinfoMeme: A Multimodal Dataset for Detecting Meme Intentionally Spreading Out Disinformation [72.18912216025029]
We present DisinfoMeme to help detect disinformation memes.
The dataset contains memes mined from Reddit covering three current topics: the COVID-19 pandemic, the Black Lives Matter movement, and veganism/vegetarianism.
arXiv Detail & Related papers (2022-05-25T09:54:59Z)
- DISARM: Detecting the Victims Targeted by Harmful Memes [49.12165815990115]
DISARM is a framework that uses named entity recognition and person identification to detect harmful memes.
We show that DISARM significantly outperforms ten unimodal and multimodal systems.
It can reduce the relative error rate for harmful target identification by up to 9 points absolute over several strong multimodal rivals.
arXiv Detail & Related papers (2022-05-11T19:14:26Z)
- #ISIS vs #ActionCountersTerrorism: A Computational Analysis of Extremist and Counter-extremist Twitter Narratives [2.685668802278155]
This study will apply computational techniques to analyse the narratives of various pro-extremist and counter-extremist Twitter accounts.
Our findings show that pro-extremist accounts often use different strategies to disseminate content when compared to counter-extremist accounts across different types of organisations.
arXiv Detail & Related papers (2020-08-26T20:46:45Z)
- Racism is a Virus: Anti-Asian Hate and Counterspeech in Social Media during the COVID-19 Crisis [51.39895377836919]
COVID-19 has sparked racism and hate on social media targeted towards Asian communities.
We study the evolution and spread of anti-Asian hate speech through the lens of Twitter.
We create COVID-HATE, the largest dataset of anti-Asian hate and counterspeech spanning 14 months.
arXiv Detail & Related papers (2020-05-25T21:58:09Z)
- Cross-ethnicity Face Anti-spoofing Recognition Challenge: A Review [79.49390241265337]
Chalearn Face Anti-spoofing Attack Detection Challenge consists of single-modal (e.g., RGB) and multi-modal (e.g., RGB, Depth, Infrared (IR)) tracks.
This paper presents an overview of the challenge, including its design, evaluation protocol and a summary of results.
arXiv Detail & Related papers (2020-04-23T06:43:08Z)
- Echo Chambers on Social Media: A comparative analysis [64.2256216637683]
We introduce an operational definition of echo chambers and perform a massive comparative analysis on 1B pieces of content produced by 1M users on four social media platforms.
We infer the leaning of users about controversial topics and reconstruct their interaction networks by analyzing different features.
We find support for the hypothesis that platforms implementing news feed algorithms, such as Facebook, may elicit the emergence of echo chambers.
arXiv Detail & Related papers (2020-04-20T20:00:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.