Leveraging cross-platform data to improve automated hate speech detection
- URL: http://arxiv.org/abs/2102.04895v1
- Date: Tue, 9 Feb 2021 15:52:34 GMT
- Title: Leveraging cross-platform data to improve automated hate speech detection
- Authors: John D Gallacher
- Abstract summary: Most existing approaches for hate speech detection focus on a single social media platform in isolation.
Here we propose a new cross-platform approach to detect hate speech which leverages multiple datasets and classification models from different platforms.
We demonstrate how this approach outperforms existing models, and achieves good performance when tested on messages from novel social media platforms.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Hate speech is increasingly prevalent online, and its negative outcomes
include increased prejudice, extremism, and even offline hate crime. Automatic
detection of online hate speech can help us to better understand these impacts.
However, while the field has recently progressed through advances in natural
language processing, challenges still remain. In particular, most existing
approaches for hate speech detection focus on a single social media platform in
isolation. This limits both the use of these models and their validity, as the
nature of language varies from platform to platform. Here we propose a new
cross-platform approach to detect hate speech which leverages multiple datasets
and classification models from different platforms and trains a superlearner
that can combine existing and novel training data to improve detection and
increase model applicability. We demonstrate how this approach outperforms
existing models, and achieves good performance when tested on messages from
novel social media platforms not included in the original training data.
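
The abstract describes combining classifiers from different platforms under a single superlearner. A minimal sketch of that general idea (stacking) follows; it is not the author's implementation. The two base models, their keyword rules, the toy messages, and all function names are hypothetical stand-ins, and the meta-learner is a plain logistic regression fitted by gradient descent.

```python
import math

# Hypothetical stand-ins for classifiers trained on two different platforms.
# Each maps a message to a score in [0, 1]; real base models would be learned.
def platform_a_model(text):
    return 0.9 if "slur" in text else 0.1

def platform_b_model(text):
    return 0.8 if "hate" in text else 0.2

BASE_MODELS = [platform_a_model, platform_b_model]

def base_features(text):
    """Collect the base-model scores for one message into a feature vector."""
    return [model(text) for model in BASE_MODELS]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_meta_learner(texts, labels, epochs=2000, lr=0.5):
    """Fit logistic-regression weights over base-model scores (batch gradient descent)."""
    rows = [base_features(t) for t in texts]
    weights = [0.0] * len(BASE_MODELS)
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def predict_proba(text, weights, bias):
    """Combined probability that a message is hateful, per the meta-learner."""
    x = base_features(text)
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

# Toy labelled messages (1 = hateful, 0 = not); placeholders, not real data.
TRAIN_TEXTS = ["slur here", "i hate you all", "slur and hate",
               "nice weather", "good morning", "lovely photo"]
TRAIN_LABELS = [1, 1, 1, 0, 0, 0]

WEIGHTS, BIAS = fit_meta_learner(TRAIN_TEXTS, TRAIN_LABELS)
```

In this sketch the base models are treated as fixed and only the combining weights are learned, so a message from a platform none of the base models saw is scored with `predict_proba(message, WEIGHTS, BIAS)`. A fuller super learner would typically fit the meta-learner on cross-validated base-model predictions.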
Related papers
- Hate Speech Detection Using Cross-Platform Social Media Data In English and German Language [6.200058263544999]
  This study focuses on detecting bilingual hate speech in YouTube comments.
  We include factors such as content similarity, definition similarity, and common hate words to measure the impact of datasets on performance.
  The best performance was obtained by combining datasets from YouTube comments, Twitter, and Gab with an F1-score of 0.74 and 0.68 for English and German YouTube comments.
  arXiv Detail & Related papers (2024-10-02T10:22:53Z)
- Empirical Evaluation of Public HateSpeech Datasets [0.0]
  Social media platforms are widely utilised for generating datasets employed in training and evaluating machine learning algorithms for hate speech detection.
  Existing public datasets exhibit numerous limitations, hindering the effective training of these algorithms and leading to inaccurate hate speech classification.
  This work aims to advance the development of more accurate and reliable machine learning models for hate speech detection.
  arXiv Detail & Related papers (2024-06-27T11:20:52Z)
- Hate Speech Detection in Limited Data Contexts using Synthetic Data Generation [1.9506923346234724]
  We propose a data augmentation approach that addresses the problem of lack of data for online hate speech detection in limited data contexts.
  We present three methods to synthesize new examples of hate speech data in a target language that retains the hate sentiment in the original examples but transfers the hate targets.
  Our findings show that a model trained on synthetic data performs comparably to, and in some cases outperforms, a model trained only on the samples available in the target domain.
  arXiv Detail & Related papers (2023-10-04T15:10:06Z)
- Causality Guided Disentanglement for Cross-Platform Hate Speech Detection [15.489092194564149]
  Social media platforms, despite their value in promoting open discourse, are often exploited to spread harmful content.
  Our research introduces a cross-platform hate speech detection model capable of being trained on one platform's data and generalizing to multiple unseen platforms.
  Our experiments across four platforms highlight our model's enhanced efficacy compared to existing state-of-the-art methods in detecting generalized hate speech.
  arXiv Detail & Related papers (2023-08-03T23:39:03Z)
- CoSyn: Detecting Implicit Hate Speech in Online Conversations Using a Context Synergized Hyperbolic Network [52.85130555886915]
  CoSyn is a context-synergized neural network that explicitly incorporates user- and conversational context for detecting implicit hate speech in online conversations.
  We show that CoSyn outperforms all our baselines in detecting implicit hate speech with absolute improvements in the range of 1.24% - 57.8%.
  arXiv Detail & Related papers (2023-03-02T17:30:43Z)
- Countering Malicious Content Moderation Evasion in Online Social Networks: Simulation and Detection of Word Camouflage [64.78260098263489]
  Twisting and camouflaging keywords are among the most used techniques to evade platform content moderation systems.
  This article contributes significantly to countering malicious information by developing multilingual tools to simulate and detect new methods of evasion of content.
  arXiv Detail & Related papers (2022-12-27T16:08:49Z)
- Panning for gold: Lessons learned from the platform-agnostic automated detection of political content in textual data [48.7576911714538]
  We discuss how these techniques can be used to detect political content across different platforms.
  We compare the performance of three groups of detection techniques relying on dictionaries, supervised machine learning, or neural networks.
  Our results show the limited impact of preprocessing on model performance, with the best results for less noisy data being achieved by neural network- and machine-learning-based models.
  arXiv Detail & Related papers (2022-07-01T15:23:23Z)
- A New Generation of Perspective API: Efficient Multilingual Character-level Transformers [66.9176610388952]
  We present the fundamentals behind the next version of the Perspective API from Google Jigsaw.
  At the heart of the approach is a single multilingual token-free Charformer model.
  We demonstrate that by forgoing static vocabularies, we gain flexibility across a variety of settings.
  arXiv Detail & Related papers (2022-02-22T20:55:31Z)
- Deep Learning for Hate Speech Detection: A Comparative Study [54.42226495344908]
  We present here a large-scale empirical comparison of deep and shallow hate-speech detection methods.
  Our goal is to illuminate progress in the area, and identify strengths and weaknesses in the current state-of-the-art.
  In doing so we aim to provide guidance as to the use of hate-speech detection in practice, quantify the state-of-the-art, and identify future research directions.
  arXiv Detail & Related papers (2022-02-19T03:48:20Z)
- Addressing the Challenges of Cross-Lingual Hate Speech Detection [115.1352779982269]
  In this paper we focus on cross-lingual transfer learning to support hate speech detection in low-resource languages.
  We leverage cross-lingual word embeddings to train our neural network systems on the source language and apply it to the target language.
  We investigate the issue of label imbalance of hate speech datasets, since the high ratio of non-hate examples compared to hate examples often leads to low model performance.
  arXiv Detail & Related papers (2022-01-15T20:48:14Z)