Detection of Hate Speech using BERT and Hate Speech Word Embedding with
Deep Model
- URL: http://arxiv.org/abs/2111.01515v1
- Date: Tue, 2 Nov 2021 11:42:54 GMT
- Title: Detection of Hate Speech using BERT and Hate Speech Word Embedding with
Deep Model
- Authors: Hind Saleh, Areej Alhothali, Kawthar Moria
- Abstract summary: This paper investigates the feasibility of leveraging domain-specific word embedding in Bidirectional LSTM based deep model to automatically detect/classify hate speech.
The experiments showed that domainspecific word embedding with the Bidirectional LSTM based deep model achieved a 93% f1-score while BERT achieved up to 96% f1-score on a combined balanced dataset from available hate speech datasets.
- Score: 0.5801044612920815
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The enormous amount of data being generated on the web and social media has
increased the demand for detecting online hate speech. Detecting hate speech
will reduce their negative impact and influence on others. A lot of effort in
the Natural Language Processing (NLP) domain aimed to detect hate speech in
general or detect specific hate speech such as religion, race, gender, or
sexual orientation. Hate communities tend to use abbreviations, intentional
spelling mistakes, and coded words in their communication to evade detection,
adding more challenges to hate speech detection tasks. Thus, word
representation will play an increasingly pivotal role in detecting hate speech.
This paper investigates the feasibility of leveraging domain-specific word
embedding in Bidirectional LSTM based deep model to automatically
detect/classify hate speech. Furthermore, we investigate the use of the
transfer learning language model (BERT) on hate speech problem as a binary
classification task. The experiments showed that domainspecific word embedding
with the Bidirectional LSTM based deep model achieved a 93% f1-score while BERT
achieved up to 96% f1-score on a combined balanced dataset from available hate
speech datasets.
Related papers
- An Investigation of Large Language Models for Real-World Hate Speech
Detection [46.15140831710683]
A major limitation of existing methods is that hate speech detection is a highly contextual problem.
Recently, large language models (LLMs) have demonstrated state-of-the-art performance in several natural language tasks.
Our study reveals that a meticulously crafted reasoning prompt can effectively capture the context of hate speech.
arXiv Detail & Related papers (2024-01-07T00:39:33Z) - Hate Speech Detection via Dual Contrastive Learning [25.878271501274245]
We propose a novel dual contrastive learning framework for hate speech detection.
Our framework jointly optimize the self-supervised and the supervised contrastive learning loss for capturing span-level information.
We conduct experiments on two publicly available English datasets, and experimental results show that the proposed model outperforms the state-of-the-art models.
arXiv Detail & Related papers (2023-07-10T13:23:36Z) - CoSyn: Detecting Implicit Hate Speech in Online Conversations Using a
Context Synergized Hyperbolic Network [52.85130555886915]
CoSyn is a context-synergized neural network that explicitly incorporates user- and conversational context for detecting implicit hate speech in online conversations.
We show that CoSyn outperforms all our baselines in detecting implicit hate speech with absolute improvements in the range of 1.24% - 57.8%.
arXiv Detail & Related papers (2023-03-02T17:30:43Z) - Deep Learning for Hate Speech Detection: A Comparative Study [54.42226495344908]
We present here a large-scale empirical comparison of deep and shallow hate-speech detection methods.
Our goal is to illuminate progress in the area, and identify strengths and weaknesses in the current state-of-the-art.
In doing so we aim to provide guidance as to the use of hate-speech detection in practice, quantify the state-of-the-art, and identify future research directions.
arXiv Detail & Related papers (2022-02-19T03:48:20Z) - Addressing the Challenges of Cross-Lingual Hate Speech Detection [115.1352779982269]
In this paper we focus on cross-lingual transfer learning to support hate speech detection in low-resource languages.
We leverage cross-lingual word embeddings to train our neural network systems on the source language and apply it to the target language.
We investigate the issue of label imbalance of hate speech datasets, since the high ratio of non-hate examples compared to hate examples often leads to low model performance.
arXiv Detail & Related papers (2022-01-15T20:48:14Z) - Towards Language Modelling in the Speech Domain Using Sub-word
Linguistic Units [56.52704348773307]
We propose a novel LSTM-based generative speech LM based on linguistic units including syllables and phonemes.
With a limited dataset, orders of magnitude smaller than that required by contemporary generative models, our model closely approximates babbling speech.
We show the effect of training with auxiliary text LMs, multitask learning objectives, and auxiliary articulatory features.
arXiv Detail & Related papers (2021-10-31T22:48:30Z) - Unsupervised Domain Adaptation for Hate Speech Detection Using a Data
Augmentation Approach [6.497816402045099]
We propose an unsupervised domain adaptation approach to augment labeled data for hate speech detection.
We show our approach improves Area under the Precision/Recall curve by as much as 42% and recall by as much as 278%.
arXiv Detail & Related papers (2021-07-27T15:01:22Z) - AngryBERT: Joint Learning Target and Emotion for Hate Speech Detection [5.649040805759824]
This paper proposes a novel multitask learning-based model, AngryBERT, which jointly learns hate speech detection with sentiment classification and target identification as secondary relevant tasks.
Experiment results show that AngryBERT outperforms state-of-the-art single-task-learning and multitask learning baselines.
arXiv Detail & Related papers (2021-03-14T16:17:26Z) - Learning Explicit Prosody Models and Deep Speaker Embeddings for
Atypical Voice Conversion [60.808838088376675]
We propose a VC system with explicit prosodic modelling and deep speaker embedding learning.
A prosody corrector takes in phoneme embeddings to infer typical phoneme duration and pitch values.
A conversion model takes phoneme embeddings and typical prosody features as inputs to generate the converted speech.
arXiv Detail & Related papers (2020-11-03T13:08:53Z) - Towards Hate Speech Detection at Large via Deep Generative Modeling [4.080068044420974]
Hate speech detection is a critical problem in social media platforms.
We present a dataset of 1 million realistic hate and non-hate sequences, produced by a deep generative language model.
We demonstrate consistent and significant performance improvements across five public hate speech datasets.
arXiv Detail & Related papers (2020-05-13T15:25:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.