ExtremeBB: A Database for Large-Scale Research into Online Hate,
Harassment, the Manosphere and Extremism
- URL: http://arxiv.org/abs/2111.04479v3
- Date: Sun, 20 Aug 2023 22:38:14 GMT
- Title: ExtremeBB: A Database for Large-Scale Research into Online Hate, Harassment, the Manosphere and Extremism
- Authors: Anh V. Vu, Lydia Wilson, Yi Ting Chua, Ilia Shumailov, Ross Anderson
- Abstract summary: We introduce ExtremeBB, a textual database of over 53.5M posts made by 38.5k users on 12 extremist bulletin board forums promoting online hate, harassment, the manosphere and other forms of extremism.
It enables large-scale analyses of qualitative and quantitative historical trends going back two decades.
ExtremeBB comes with a robust ethical data-sharing regime that allows us to share data with academics worldwide.
- Score: 12.647120939857635
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce ExtremeBB, a textual database of over 53.5M posts made by 38.5k
users on 12 extremist bulletin board forums promoting online hate, harassment,
the manosphere and other forms of extremism. It enables large-scale analyses of
qualitative and quantitative historical trends going back two decades:
measuring hate speech and toxicity; tracing the evolution of different strands
of extremist ideology; tracking the relationships between online subcultures,
extremist behaviours, and real-world violence; and monitoring extremist
communities in near real time. This can shed light not only on the spread of
problematic ideologies but also the effectiveness of interventions. ExtremeBB
comes with a robust ethical data-sharing regime that allows us to share data
with academics worldwide. Since 2020, access has been granted to 49 licensees
in 16 research groups from 12 institutions.
Related papers
- iDRAMA-Scored-2024: A Dataset of the Scored Social Media Platform from 2020 to 2023 [22.685953309889825]
We release a large-scale dataset from Scored, an alternative Reddit platform.
The dataset includes at least 58 communities identified as migrating from Reddit and over 950 communities created since the platform's inception.
We provide sentence embeddings of all posts in our dataset, generated through a state-of-the-art model.
arXiv Detail & Related papers (2024-05-16T16:34:03Z)
- Monitoring the evolution of antisemitic discourse on extremist social media using BERT [3.3037858066178662]
Racism and intolerance on social media contribute to a toxic online environment which may spill offline to foster hatred.
Tracking antisemitic themes and their associated terminology over time in online discussions could help monitor the sentiments of their participants.
arXiv Detail & Related papers (2024-02-06T20:34:49Z)
- Hatemongers ride on echo chambers to escalate hate speech diffusion [23.714548893849393]
We analyze more than 32 million posts from over 6.8 million users across three popular online social networks.
We find that hatemongers play a more crucial role in governing the spread of information compared to singled-out hateful content.
arXiv Detail & Related papers (2023-02-05T20:30:48Z)
- DisinfoMeme: A Multimodal Dataset for Detecting Meme Intentionally Spreading Out Disinformation [72.18912216025029]
We present DisinfoMeme to help detect disinformation memes.
The dataset contains memes mined from Reddit covering three current topics: the COVID-19 pandemic, the Black Lives Matter movement, and veganism/vegetarianism.
arXiv Detail & Related papers (2022-05-25T09:54:59Z)
- DISARM: Detecting the Victims Targeted by Harmful Memes [49.12165815990115]
DISARM is a framework that uses named entity recognition and person identification to detect harmful memes.
We show that DISARM significantly outperforms ten unimodal and multimodal systems.
It can reduce the relative error rate for harmful target identification by up to 9 points absolute over several strong multimodal rivals.
arXiv Detail & Related papers (2022-05-11T19:14:26Z)
- A Comparison of Online Hate on Reddit and 4chan: A Case Study of the 2020 US Election [2.685668802278155]
We make use of various Natural Language Processing (NLP) techniques to analyse hateful content from Reddit and 4chan relating to the 2020 US Presidential Elections.
Our findings show how content and posting activity can differ depending on the platform being used.
We provide an initial comparison of the platform-specific behaviours of online hate, and of how different platforms can serve specific purposes.
arXiv Detail & Related papers (2022-02-02T21:48:56Z)
- This Must Be the Place: Predicting Engagement of Online Communities in a Large-scale Distributed Campaign [70.69387048368849]
We study the behavior of communities with millions of active members.
We develop a hybrid model, combining textual cues, community meta-data, and structural properties.
We demonstrate the applicability of our model through Reddit's r/place, a large-scale online experiment.
arXiv Detail & Related papers (2022-01-14T08:23:16Z)
- #ISIS vs #ActionCountersTerrorism: A Computational Analysis of Extremist and Counter-extremist Twitter Narratives [2.685668802278155]
This study will apply computational techniques to analyse the narratives of various pro-extremist and counter-extremist Twitter accounts.
Our findings show that pro-extremist accounts often use different strategies to disseminate content when compared to counter-extremist accounts across different types of organisations.
arXiv Detail & Related papers (2020-08-26T20:46:45Z)
- Racism is a Virus: Anti-Asian Hate and Counterspeech in Social Media during the COVID-19 Crisis [51.39895377836919]
COVID-19 has sparked racism and hate on social media targeted towards Asian communities.
We study the evolution and spread of anti-Asian hate speech through the lens of Twitter.
We create COVID-HATE, the largest dataset of anti-Asian hate and counterspeech spanning 14 months.
arXiv Detail & Related papers (2020-05-25T21:58:09Z)
- Cross-ethnicity Face Anti-spoofing Recognition Challenge: A Review [79.49390241265337]
The Chalearn Face Anti-spoofing Attack Detection Challenge consists of single-modal (e.g., RGB) and multi-modal (e.g., RGB, Depth, Infrared (IR)) tracks.
This paper presents an overview of the challenge, including its design, evaluation protocol and a summary of results.
arXiv Detail & Related papers (2020-04-23T06:43:08Z)
- Echo Chambers on Social Media: A comparative analysis [64.2256216637683]
We introduce an operational definition of echo chambers and perform a massive comparative analysis on 1B pieces of content produced by 1M users on four social media platforms.
We infer the leaning of users about controversial topics and reconstruct their interaction networks by analyzing different features.
We find support for the hypothesis that platforms implementing news feed algorithms, like Facebook, may elicit the emergence of echo chambers.
arXiv Detail & Related papers (2020-04-20T20:00:27Z)