Combating high variance in Data-Scarce Implicit Hate Speech Classification
- URL: http://arxiv.org/abs/2208.13595v1
- Date: Mon, 29 Aug 2022 13:45:21 GMT
- Title: Combating high variance in Data-Scarce Implicit Hate Speech Classification
- Authors: Debaditya Pal, Kaustubh Chaudhari, Harsh Sharma
- Abstract summary: In this paper, we explore various optimization and regularization techniques and develop a novel RoBERTa-based model that achieves state-of-the-art performance.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Hate speech classification has been a long-standing problem in natural
language processing. However, even though numerous hate speech detection
methods exist, they usually overlook many hateful statements because those
statements are implicit in nature. Developing datasets to aid the task of
implicit hate speech classification comes with its own challenges: nuances in
language, varying definitions of what constitutes hate speech, and the
labor-intensive process of annotating such data. This has led to a scarcity of
data available to train and test such systems, which gives rise to
high-variance problems when parameter-heavy transformer-based models are used
to address the problem. In this paper, we explore various optimization and
regularization techniques and develop a novel RoBERTa-based model that achieves
state-of-the-art performance.
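The abstract names regularization as the remedy for high variance but does not specify which techniques were used. As a minimal illustration of the underlying idea, the toy sketch below (plain NumPy logistic regression, hypothetical data and hyperparameters, standing in for a full RoBERTa fine-tuning setup) shows how L2 weight decay shrinks parameters when there are far more weights than training examples:

```python
# Illustrative sketch only: L2 weight decay, a standard regularization
# technique for reducing variance in data-scarce training. The dataset
# and hyperparameters here are hypothetical, not from the paper.
import numpy as np

rng = np.random.default_rng(0)

# Tiny, noisy binary dataset: 20 examples, 50 features.
# More parameters than examples -> the high-variance regime.
X = rng.normal(size=(20, 50))
y = (X[:, 0] + 0.1 * rng.normal(size=20) > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(weight_decay, steps=500, lr=0.1):
    w = np.zeros(50)
    for _ in range(steps):
        p = sigmoid(X @ w)
        # Logistic-loss gradient plus the weight-decay term.
        grad = X.T @ (p - y) / len(y) + weight_decay * w
        w -= lr * grad
    return w

w_plain = train(weight_decay=0.0)
w_reg = train(weight_decay=0.1)

# Weight decay shrinks the parameter norm, trading a little bias for
# a reduction in variance across resampled training sets.
print(np.linalg.norm(w_reg) < np.linalg.norm(w_plain))  # prints True
```

In a transformer fine-tuning setting the same knob usually appears as the optimizer's `weight_decay` hyperparameter, alongside dropout and early stopping.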
Related papers
- Hate Speech Detection in Limited Data Contexts using Synthetic Data Generation [1.9506923346234724]
We propose a data augmentation approach that addresses the problem of lack of data for online hate speech detection in limited data contexts.
We present three methods to synthesize new examples of hate speech data in a target language that retains the hate sentiment in the original examples but transfers the hate targets.
Our findings show that a model trained on synthetic data performs comparably to, and in some cases outperforms, a model trained only on the samples available in the target domain.
arXiv Detail & Related papers (2023-10-04T15:10:06Z)
- Making Retrieval-Augmented Language Models Robust to Irrelevant Context [55.564789967211844]
An important desideratum of RALMs is that retrieved information should help model performance when it is relevant.
Recent work has shown that retrieval augmentation can sometimes have a negative effect on performance.
arXiv Detail & Related papers (2023-10-02T18:52:35Z)
- Causality Guided Disentanglement for Cross-Platform Hate Speech Detection [15.489092194564149]
Social media platforms, despite their value in promoting open discourse, are often exploited to spread harmful content.
Our research introduces a cross-platform hate speech detection model capable of being trained on one platform's data and generalizing to multiple unseen platforms.
Our experiments across four platforms highlight our model's enhanced efficacy compared to existing state-of-the-art methods in detecting generalized hate speech.
arXiv Detail & Related papers (2023-08-03T23:39:03Z)
- Improving Distortion Robustness of Self-supervised Speech Processing Tasks with Domain Adaptation [60.26511271597065]
Speech distortions are a long-standing problem that degrades the performance of speech processing models trained with supervision.
It is therefore important to enhance the robustness of speech processing models so that they perform well when encountering speech distortions.
arXiv Detail & Related papers (2022-03-30T07:25:52Z)
- Deep Learning for Hate Speech Detection: A Comparative Study [54.42226495344908]
We present here a large-scale empirical comparison of deep and shallow hate-speech detection methods.
Our goal is to illuminate progress in the area, and identify strengths and weaknesses in the current state-of-the-art.
In doing so we aim to provide guidance as to the use of hate-speech detection in practice, quantify the state-of-the-art, and identify future research directions.
arXiv Detail & Related papers (2022-02-19T03:48:20Z)
- On Guiding Visual Attention with Language Specification [76.08326100891571]
We use high-level language specification as advice for constraining the classification evidence to task-relevant features, instead of distractors.
We show that supervising spatial attention in this way improves performance on classification tasks with biased and noisy data.
arXiv Detail & Related papers (2022-02-17T22:40:19Z)
- Addressing the Challenges of Cross-Lingual Hate Speech Detection [115.1352779982269]
In this paper we focus on cross-lingual transfer learning to support hate speech detection in low-resource languages.
We leverage cross-lingual word embeddings to train our neural network systems on the source language and apply them to the target language.
We investigate the issue of label imbalance of hate speech datasets, since the high ratio of non-hate examples compared to hate examples often leads to low model performance.
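The label-imbalance issue noted above is commonly handled by weighting the loss inversely to class frequency, so that the rare hate class is not drowned out by non-hate examples. A minimal sketch, with hypothetical class counts:

```python
# Minimal sketch of inverse-frequency class weighting for an imbalanced
# hate speech dataset. The 90/10 split below is a hypothetical example
# of the skew described in the text, not a figure from the paper.
from collections import Counter

labels = ["non-hate"] * 90 + ["hate"] * 10

counts = Counter(labels)
n = len(labels)
# Weight each class by n / (num_classes * count):
# rare classes receive proportionally larger weights.
weights = {c: n / (len(counts) * k) for c, k in counts.items()}

print(weights)  # {'non-hate': 0.555..., 'hate': 5.0}
```

These per-class weights are typically passed to the training loss (e.g. a weighted cross-entropy) so that errors on the minority class contribute more to the gradient.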
arXiv Detail & Related papers (2022-01-15T20:48:14Z)
- Latent Hatred: A Benchmark for Understanding Implicit Hate Speech [22.420275418616242]
This work introduces a theoretically justified taxonomy of implicit hate speech and a benchmark corpus with fine-grained labels for each message.
We present systematic analyses of our dataset using contemporary baselines to detect and explain implicit hate speech.
arXiv Detail & Related papers (2021-09-11T16:52:56Z)
- Leveraging cross-platform data to improve automated hate speech detection [0.0]
Most existing approaches for hate speech detection focus on a single social media platform in isolation.
Here we propose a new cross-platform approach to detect hate speech which leverages multiple datasets and classification models from different platforms.
We demonstrate how this approach outperforms existing models, and achieves good performance when tested on messages from novel social media platforms.
arXiv Detail & Related papers (2021-02-09T15:52:34Z)
- Evaluating Factuality in Generation with Dependency-level Entailment [57.5316011554622]
We propose a new formulation of entailment that decomposes it at the level of dependency arcs.
We show that our dependency arc entailment model trained on this data can identify factual inconsistencies in paraphrasing and summarization better than sentence-level methods.
arXiv Detail & Related papers (2020-10-12T06:43:10Z)
- Towards Hate Speech Detection at Large via Deep Generative Modeling [4.080068044420974]
Hate speech detection is a critical problem in social media platforms.
We present a dataset of 1 million realistic hate and non-hate sequences, produced by a deep generative language model.
We demonstrate consistent and significant performance improvements across five public hate speech datasets.
arXiv Detail & Related papers (2020-05-13T15:25:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.