Causality Guided Disentanglement for Cross-Platform Hate Speech
Detection
- URL: http://arxiv.org/abs/2308.02080v3
- Date: Sun, 10 Dec 2023 16:28:45 GMT
- Title: Causality Guided Disentanglement for Cross-Platform Hate Speech
Detection
- Authors: Paras Sheth, Tharindu Kumarage, Raha Moraffah, Aman Chadha, Huan Liu
- Abstract summary: Social media platforms, despite their value in promoting open discourse, are often exploited to spread harmful content.
Our research introduces a cross-platform hate speech detection model capable of being trained on one platform's data and generalizing to multiple unseen platforms.
Our experiments across four platforms highlight our model's enhanced efficacy compared to existing state-of-the-art methods in detecting generalized hate speech.
- Score: 15.489092194564149
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Social media platforms, despite their value in promoting open discourse, are
often exploited to spread harmful content. Current deep learning and natural
language processing models for detecting this harmful content rely too heavily
on domain-specific terms, which limits their ability to generalize: they tend
to focus too narrowly on particular linguistic signals or on certain categories
of words. Another significant challenge arises when platforms lack high-quality
annotated data for training, creating a need for cross-platform models that can
adapt to distribution shifts. Our research introduces a cross-platform hate
speech detection model capable of being trained on one platform's data and
generalizing to multiple unseen platforms. One way to achieve good
generalizability across platforms is to disentangle the input representations
into invariant and platform-dependent features. We also argue that learning causal
relationships, which remain constant across diverse environments, can
significantly aid in understanding invariant representations in hate speech. By
disentangling input into platform-dependent features (useful for predicting
hate targets) and platform-independent features (used to predict the presence
of hate), we learn invariant representations resistant to distribution shifts.
These features are then used to predict hate speech across unseen platforms.
Our extensive experiments across four platforms highlight our model's enhanced
efficacy compared to existing state-of-the-art methods in detecting generalized
hate speech.
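To make the disentanglement idea concrete, here is a minimal PyTorch sketch.
It is not the authors' implementation: all names (DisentangledHateDetector,
invariant_proj, dependent_proj) are hypothetical, and the penalty that would
keep the two feature sets independent is omitted.

```python
# Minimal sketch (assumed names, not the paper's code): split a shared text
# encoding into platform-invariant features, used to predict the presence of
# hate, and platform-dependent features, used to predict the hate target.
import torch
import torch.nn as nn

class DisentangledHateDetector(nn.Module):
    def __init__(self, encoder_dim=768, feat_dim=256, num_targets=10):
        super().__init__()
        self.invariant_proj = nn.Linear(encoder_dim, feat_dim)  # platform-independent
        self.dependent_proj = nn.Linear(encoder_dim, feat_dim)  # platform-dependent
        self.hate_head = nn.Linear(feat_dim, 2)                 # hate vs. non-hate
        self.target_head = nn.Linear(feat_dim, num_targets)     # hate-target classes

    def forward(self, encoding):
        z_inv = torch.relu(self.invariant_proj(encoding))
        z_dep = torch.relu(self.dependent_proj(encoding))
        return self.hate_head(z_inv), self.target_head(z_dep)

# Toy usage: `encoding` stands in for sentence embeddings from any text encoder.
model = DisentangledHateDetector()
encoding = torch.randn(4, 768)
hate_logits, target_logits = model(encoding)

# Joint objective: hate presence is supervised from the invariant features,
# hate target from the platform-dependent ones. A real system would add a term
# that discourages information leaking between the two feature sets.
hate_labels = torch.randint(0, 2, (4,))
target_labels = torch.randint(0, 10, (4,))
loss = (nn.functional.cross_entropy(hate_logits, hate_labels)
        + nn.functional.cross_entropy(target_logits, target_labels))
loss.backward()
```

At inference on an unseen platform, only the invariant features feed the hate
head, which is what makes the predictor resistant to platform-specific
distribution shifts.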
Related papers
- SpeechAlign: Aligning Speech Generation to Human Preferences [51.684183257809075]
We introduce SpeechAlign, an iterative self-improvement strategy that aligns speech language models to human preferences.
We show that SpeechAlign can bridge the distribution gap and facilitate continuous self-improvement of the speech language model.
arXiv Detail & Related papers (2024-04-08T15:21:17Z)
- Towards Interpretable Hate Speech Detection using Large Language Model-extracted Rationales [15.458557611029518]
Social media platforms are a prominent arena for users to engage in interpersonal discussions and express opinions.
This creates a need to automatically identify and flag instances of hate speech.
We propose to use state-of-the-art Large Language Models (LLMs) to extract features in the form of rationales from the input text.
arXiv Detail & Related papers (2024-03-19T03:22:35Z)
- PEACE: Cross-Platform Hate Speech Detection - A Causality-guided Framework [14.437386966111719]
Hate speech detection refers to the task of detecting hateful content that aims at denigrating an individual or a group based on their religion, gender, sexual orientation, or other characteristics.
We propose a causality-guided framework, PEACE, that identifies and leverages two intrinsic causal cues omnipresent in hateful content.
arXiv Detail & Related papers (2023-06-15T01:18:02Z)
- Combating high variance in Data-Scarce Implicit Hate Speech Classification [0.0]
In this paper, we explore various optimization and regularization techniques and develop a novel RoBERTa-based model that achieves state-of-the-art performance.
arXiv Detail & Related papers (2022-08-29T13:45:21Z)
- A New Generation of Perspective API: Efficient Multilingual Character-level Transformers [66.9176610388952]
We present the fundamentals behind the next version of the Perspective API from Google Jigsaw.
At the heart of the approach is a single multilingual token-free Charformer model.
We demonstrate that by forgoing static vocabularies, we gain flexibility across a variety of settings.
arXiv Detail & Related papers (2022-02-22T20:55:31Z)
- Deep Learning for Hate Speech Detection: A Comparative Study [54.42226495344908]
We present here a large-scale empirical comparison of deep and shallow hate-speech detection methods.
Our goal is to illuminate progress in the area, and identify strengths and weaknesses in the current state-of-the-art.
In doing so we aim to provide guidance as to the use of hate-speech detection in practice, quantify the state-of-the-art, and identify future research directions.
arXiv Detail & Related papers (2022-02-19T03:48:20Z)
- On Guiding Visual Attention with Language Specification [76.08326100891571]
We use high-level language specification as advice for constraining the classification evidence to task-relevant features, instead of distractors.
We show that supervising spatial attention in this way improves performance on classification tasks with biased and noisy data.
arXiv Detail & Related papers (2022-02-17T22:40:19Z)
- Addressing the Challenges of Cross-Lingual Hate Speech Detection [115.1352779982269]
In this paper we focus on cross-lingual transfer learning to support hate speech detection in low-resource languages.
We leverage cross-lingual word embeddings to train our neural network systems on the source language and apply them to the target language.
We investigate the issue of label imbalance of hate speech datasets, since the high ratio of non-hate examples compared to hate examples often leads to low model performance.
arXiv Detail & Related papers (2022-01-15T20:48:14Z)
- AES Systems Are Both Overstable And Oversensitive: Explaining Why And Proposing Defenses [66.49753193098356]
We investigate the reason behind the surprising adversarial brittleness of scoring models.
Our results indicate that autoscoring models, despite getting trained as "end-to-end" models, behave like bag-of-words models.
We propose detection-based protection models that can detect oversensitivity- and overstability-causing samples with high accuracy.
arXiv Detail & Related papers (2021-09-24T03:49:38Z)
- Leveraging cross-platform data to improve automated hate speech detection [0.0]
Most existing approaches for hate speech detection focus on a single social media platform in isolation.
Here we propose a new cross-platform approach to detect hate speech which leverages multiple datasets and classification models from different platforms.
We demonstrate how this approach outperforms existing models, and achieves good performance when tested on messages from novel social media platforms.
arXiv Detail & Related papers (2021-02-09T15:52:34Z)
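The cross-platform ensembling described in the entry above can be illustrated
with a toy sketch: classifiers trained on different platforms each score a new
message, and their probabilities are combined. The names and the simple
averaging rule are illustrative assumptions, not that paper's exact method.

```python
# Hypothetical sketch: average hate probabilities from per-platform models.
from typing import Callable, Dict

def ensemble_hate_score(text: str,
                        platform_models: Dict[str, Callable[[str], float]]) -> float:
    """Mean hate probability across classifiers trained on different platforms."""
    scores = [predict(text) for predict in platform_models.values()]
    return sum(scores) / len(scores)

# Toy stand-ins for classifiers trained on three platforms' data.
models = {
    "platform_a": lambda t: 0.9 if "slur" in t else 0.1,
    "platform_b": lambda t: 0.8 if "slur" in t else 0.2,
    "platform_c": lambda t: 0.7 if "slur" in t else 0.3,
}
print(ensemble_hate_score("an innocuous message", models))  # ~0.2
```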
This list is automatically generated from the titles and abstracts of the papers on this site.