Deep Learning for Hate Speech Detection: A Comparative Study
- URL: http://arxiv.org/abs/2202.09517v2
- Date: Thu, 7 Dec 2023 01:07:13 GMT
- Title: Deep Learning for Hate Speech Detection: A Comparative Study
- Authors: Jitendra Singh Malik, Hezhe Qiao, Guansong Pang, Anton van den Hengel
- Abstract summary: We present here a large-scale empirical comparison of deep and shallow hate-speech detection methods.
Our goal is to illuminate progress in the area, and identify strengths and weaknesses in the current state-of-the-art.
In doing so we aim to provide guidance as to the use of hate-speech detection in practice, quantify the state-of-the-art, and identify future research directions.
- Score: 54.42226495344908
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automated hate speech detection is an important tool in combating the spread
of hate speech, particularly in social media. Numerous methods have been
developed for the task, including a recent proliferation of deep-learning based
approaches. A variety of datasets have also been developed, exemplifying
various manifestations of the hate-speech detection problem. We present here a
large-scale empirical comparison of deep and shallow hate-speech detection
methods, mediated through the three most commonly used datasets. Our goal is to
illuminate progress in the area, and identify strengths and weaknesses in the
current state-of-the-art. We particularly focus our analysis on measures of
practical performance, including detection accuracy, computational efficiency,
capability in using pre-trained models, and domain generalization. In doing so
we aim to provide guidance as to the use of hate-speech detection in practice,
quantify the state-of-the-art, and identify future research directions. Code
and dataset are available at
https://github.com/jmjmalik22/Hate-Speech-Detection.
Related papers
- Bridging Modalities: Enhancing Cross-Modality Hate Speech Detection with Few-Shot In-Context Learning [4.136573141724715]
Hate speech on the internet poses a significant challenge to digital platform safety.
Recent research has developed detection models tailored to specific modalities.
This study conducts extensive experiments using few-shot in-context learning with large language models.
arXiv Detail & Related papers (2024-10-08T01:27:12Z)
- Empirical Evaluation of Public HateSpeech Datasets [0.0]
Social media platforms are widely utilised for generating datasets employed in training and evaluating machine learning algorithms for hate speech detection.
Existing public datasets exhibit numerous limitations, hindering the effective training of these algorithms and leading to inaccurate hate speech classification.
This work aims to advance the development of more accurate and reliable machine learning models for hate speech detection.
arXiv Detail & Related papers (2024-06-27T11:20:52Z)
- Training-Free Deepfake Voice Recognition by Leveraging Large-Scale Pre-Trained Models [52.04189118767758]
Generalization is a main issue for current audio deepfake detectors.
In this paper we study the potential of large-scale pre-trained models for audio deepfake detection.
arXiv Detail & Related papers (2024-05-03T15:27:11Z)
- Probing Critical Learning Dynamics of PLMs for Hate Speech Detection [39.970726250810635]
Despite widespread adoption, there is a lack of research into how various critical aspects of pretrained language models affect their performance in hate speech detection.
We deep dive into comparing different pretrained models, evaluating their seed robustness, finetuning settings, and the impact of pretraining data collection time.
Our analysis reveals early peaks for downstream tasks during pretraining, the limited benefit of employing a more recent pretraining corpus, and the significance of specific layers during finetuning.
arXiv Detail & Related papers (2024-02-03T13:23:51Z)
- What to Remember: Self-Adaptive Continual Learning for Audio Deepfake Detection [53.063161380423715]
Existing detection models have shown remarkable success in discriminating known deepfake audio, but struggle when encountering new attack types.
We propose a continual learning approach called Radian Weight Modification (RWM) for audio deepfake detection.
arXiv Detail & Related papers (2023-12-15T09:52:17Z)
- ToKen: Task Decomposition and Knowledge Infusion for Few-Shot Hate Speech Detection [85.68684067031909]
We frame this problem as a few-shot learning task, and show significant gains with decomposing the task into its "constituent" parts.
In addition, we see that infusing knowledge from reasoning datasets (e.g. Atomic 2020) improves the performance even further.
arXiv Detail & Related papers (2022-05-25T05:10:08Z)
- Self-Supervised Speech Representation Learning: A Review [105.1545308184483]
Self-supervised representation learning methods promise a single universal model that would benefit a wide variety of tasks and domains.
Speech representation learning is experiencing similar progress in three main categories: generative, contrastive, and predictive methods.
This review presents approaches for self-supervised speech representation learning and their connection to other research areas.
arXiv Detail & Related papers (2022-05-21T16:52:57Z)
- Addressing the Challenges of Cross-Lingual Hate Speech Detection [115.1352779982269]
In this paper we focus on cross-lingual transfer learning to support hate speech detection in low-resource languages.
We leverage cross-lingual word embeddings to train our neural network systems on the source language and apply it to the target language.
We investigate the issue of label imbalance of hate speech datasets, since the high ratio of non-hate examples compared to hate examples often leads to low model performance.
arXiv Detail & Related papers (2022-01-15T20:48:14Z)
- AngryBERT: Joint Learning Target and Emotion for Hate Speech Detection [5.649040805759824]
This paper proposes a novel multitask learning-based model, AngryBERT, which jointly learns hate speech detection with sentiment classification and target identification as secondary relevant tasks.
Experiment results show that AngryBERT outperforms state-of-the-art single-task-learning and multitask learning baselines.
arXiv Detail & Related papers (2021-03-14T16:17:26Z)
- Towards Hate Speech Detection at Large via Deep Generative Modeling [4.080068044420974]
Hate speech detection is a critical problem in social media platforms.
We present a dataset of 1 million realistic hate and non-hate sequences, produced by a deep generative language model.
We demonstrate consistent and significant performance improvements across five public hate speech datasets.
arXiv Detail & Related papers (2020-05-13T15:25:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.