Generating Counter Narratives against Online Hate Speech: Data and Strategies
- URL: http://arxiv.org/abs/2004.04216v1
- Date: Wed, 8 Apr 2020 19:35:00 GMT
- Title: Generating Counter Narratives against Online Hate Speech: Data and Strategies
- Authors: Serra Sinem Tekiroglu, Yi-Ling Chung, Marco Guerini
- Abstract summary: We present a study on how to collect responses to hate effectively.
We employ large scale unsupervised language models such as GPT-2 for the generation of silver data.
We also study which annotation strategies and neural architectures work best for filtering the generated data before expert validation/post-editing.
- Score: 21.098614110697184
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, research has started focusing on avoiding the undesired effects that come with content moderation, such as censorship and overblocking, when dealing with hatred online. The core idea is to directly intervene in the discussion with textual responses that are meant to counter the hate content and prevent it from spreading further. Accordingly, automation strategies, such as natural language generation, are beginning to be investigated. Still, they suffer from the lack of a sufficient amount of quality data and tend to produce generic/repetitive responses. Aware of these limitations, we present a study on how to collect responses to hate effectively, employing large-scale unsupervised language models such as GPT-2 for the generation of silver data, and the best annotation strategies/neural architectures that can be used for data filtering before expert validation/post-editing.
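The collection loop outlined in the abstract (machine-generated silver data, automatic filtering, then expert validation/post-editing) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `generate_candidates` stands in for a GPT-2-style generator, `filter_score` for a trained filtering classifier, and the threshold is arbitrary; all three are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Candidate:
    hate_speech: str
    counter_narrative: str
    score: float = 0.0


def collect_silver_pairs(
    hate_speech: List[str],
    generate_candidates: Callable[[str], List[str]],  # hypothetical GPT-2-style generator
    filter_score: Callable[[str, str], float],         # hypothetical trained filtering classifier
    threshold: float = 0.5,                            # arbitrary cut-off, illustration only
) -> List[Candidate]:
    """Generate counter-narrative candidates and keep those scored at or above `threshold`.
    The survivors would then go to experts for validation/post-editing."""
    kept: List[Candidate] = []
    for hs in hate_speech:
        for cn in generate_candidates(hs):
            score = filter_score(hs, cn)
            if score >= threshold:
                kept.append(Candidate(hs, cn, score))
    # Highest-scoring pairs first, so experts review the most promising ones early.
    kept.sort(key=lambda c: c.score, reverse=True)
    return kept


# Toy stubs: the filter rejects the generic reply, mirroring the
# "generic/repetitive responses" problem the abstract mentions.
def toy_generator(hs: str) -> List[str]:
    return ["Generic reply.", f"Here are the facts that contradict '{hs}'."]


def toy_filter(hs: str, cn: str) -> float:
    return 0.9 if "facts" in cn else 0.1


pairs = collect_silver_pairs(["<hate speech example>"], toy_generator, toy_filter)
print(len(pairs), pairs[0].counter_narrative)
```

Filtering before human review is what makes the approach cheaper: experts only post-edit candidates the classifier already judged plausible.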
Related papers
- Detecting, Explaining, and Mitigating Memorization in Diffusion Models [49.438362005962375]
We introduce a straightforward yet effective method for detecting memorized prompts by inspecting the magnitude of text-conditional predictions.
Our proposed method seamlessly integrates without disrupting sampling algorithms, and delivers high accuracy even at the first generation step.
Building on our detection strategy, we unveil an explainable approach that shows the contribution of individual words or tokens to memorization.
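The detection idea in this entry rests on a per-prompt score. A minimal sketch, assuming (our assumption about the exact formula) that the "magnitude of text-conditional predictions" is the L2 norm of the difference between a diffusion model's text-conditional and unconditional noise predictions at a given step; `memorization_score`, `is_memorized`, the toy vectors and the threshold are all hypothetical.

```python
import math
from typing import Sequence


def memorization_score(eps_cond: Sequence[float], eps_uncond: Sequence[float]) -> float:
    """L2 norm of eps(x_t, prompt) - eps(x_t, empty prompt), flattened to 1-D."""
    if len(eps_cond) != len(eps_uncond):
        raise ValueError("predictions must have the same shape")
    return math.sqrt(sum((c - u) ** 2 for c, u in zip(eps_cond, eps_uncond)))


def is_memorized(eps_cond: Sequence[float], eps_uncond: Sequence[float],
                 threshold: float = 1.0) -> bool:
    # Hypothetical threshold: an unusually large text-conditional magnitude flags the prompt.
    return memorization_score(eps_cond, eps_uncond) > threshold


# Toy first-step predictions, chosen only to illustrate the contrast.
print(memorization_score([3.0, -4.0], [0.0, 0.0]))  # 5.0
print(is_memorized([0.1, 0.1], [0.0, 0.0]))         # False
```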
(2024-07-31T16:13:29Z)
- Generating Enhanced Negatives for Training Language-Based Object Detectors [86.1914216335631]
We propose to leverage the vast knowledge built into modern generative models to automatically build negatives that are more relevant to the original data.
Specifically, we use large-language-models to generate negative text descriptions, and text-to-image diffusion models to also generate corresponding negative images.
Our experimental analysis confirms the relevance of the generated negative data, and its use in language-based detectors improves performance on two complex benchmarks.
(2023-12-29T23:04:00Z)
- Into the LAIONs Den: Investigating Hate in Multimodal Datasets [67.21783778038645]
This paper investigates the effect of scaling datasets on hateful content through a comparative audit of two datasets: LAION-400M and LAION-2B.
We found that hate content increased by nearly 12% with dataset scale, measured both qualitatively and quantitatively.
We also found that filtering dataset contents based on Not Safe For Work (NSFW) values calculated based on images alone does not exclude all the harmful content in alt-text.
(2023-11-06T19:00:05Z)
- HARE: Explainable Hate Speech Detection with Step-by-Step Reasoning [29.519687405350304]
We introduce a hate speech detection framework, HARE, which harnesses the reasoning capabilities of large language models (LLMs) to fill gaps in explanations of hate speech.
Experiments on SBIC and Implicit Hate benchmarks show that our method, using model-generated data, consistently outperforms baselines.
Our method enhances the explanation quality of trained models and improves generalization to unseen datasets.
(2023-11-01T06:09:54Z)
- Hate Speech Detection in Limited Data Contexts using Synthetic Data Generation [1.9506923346234724]
We propose a data augmentation approach that addresses the problem of lack of data for online hate speech detection in limited data contexts.
We present three methods to synthesize new examples of hate speech data in a target language that retains the hate sentiment in the original examples but transfers the hate targets.
Our findings show that a model trained on synthetic data performs comparably to, and in some cases outperforms, a model trained only on the samples available in the target domain.
(2023-10-04T15:10:06Z)
- CoSyn: Detecting Implicit Hate Speech in Online Conversations Using a Context Synergized Hyperbolic Network [52.85130555886915]
CoSyn is a context-synergized neural network that explicitly incorporates user- and conversational context for detecting implicit hate speech in online conversations.
We show that CoSyn outperforms all our baselines in detecting implicit hate speech, with absolute improvements ranging from 1.24% to 57.8%.
(2023-03-02T17:30:43Z)
- APEACH: Attacking Pejorative Expressions with Analysis on Crowd-Generated Hate Speech Evaluation Datasets [4.034948808542701]
APEACH is a method that allows the collection of hate speech generated by unspecified users.
By controlling the crowd-generation of hate speech and adding only a minimum post-labeling, we create a corpus that enables the generalizable and fair evaluation of hate speech detection.
(2022-02-25T02:04:38Z)
- Addressing the Challenges of Cross-Lingual Hate Speech Detection [115.1352779982269]
In this paper we focus on cross-lingual transfer learning to support hate speech detection in low-resource languages.
We leverage cross-lingual word embeddings to train our neural network systems on the source language and apply them to the target language.
We investigate the issue of label imbalance in hate speech datasets, since the high ratio of non-hate to hate examples often leads to low model performance.
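One common way to counter the skew described in this entry (not necessarily the method used in the paper) is to weight the training loss by inverse class frequency, w_k = N / (K * n_k), where N is the number of examples, K the number of classes and n_k the count of class k. A minimal sketch; `inverse_frequency_weights` is a hypothetical name and the label counts are invented.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable


def inverse_frequency_weights(labels: Iterable[Hashable]) -> Dict[Hashable, float]:
    """Weight each class by N / (K * n_k) so the rare class (e.g. 'hate') counts more in the loss."""
    counts = Counter(labels)
    n_total = sum(counts.values())
    n_classes = len(counts)
    return {label: n_total / (n_classes * n) for label, n in counts.items()}


# Invented 9:1 split of non-hate to hate examples.
labels = ["non-hate"] * 90 + ["hate"] * 10
weights = inverse_frequency_weights(labels)
print(weights)  # {'non-hate': 0.5555555555555556, 'hate': 5.0}
```

With these weights each class contributes the same total weight (90 * 0.556 and 10 * 5.0 both come to 50), so the majority class no longer dominates the loss.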
(2022-01-15T20:48:14Z)
- Towards Knowledge-Grounded Counter Narrative Generation for Hate Speech [15.039745292757672]
Tackling online hatred with informed textual responses, called counter narratives, has recently been brought into the spotlight.
Current neural approaches tend to produce generic/repetitive responses and lack grounded and up-to-date evidence.
We present the first complete knowledge-bound counter narrative generation pipeline, grounded in an external knowledge repository.
(2021-06-22T13:48:49Z)
- An Information Retrieval Approach to Building Datasets for Hate Speech Detection [3.587367153279349]
A common practice is to only annotate tweets containing known "hate words."
A second challenge is that definitions of hate speech tend to be highly variable and subjective.
Our key insight is that the rarity and subjectivity of hate speech are akin to those of relevance in information retrieval (IR).
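In IR test-collection construction, rare relevant documents are typically found by pooling: only the union of the top-k results from several different systems is judged by assessors. A minimal sketch of that idea applied to choosing tweets for annotation, assuming each "system" is a ranked list of tweet IDs; the rankers, IDs, depth and `build_pool` are hypothetical, and this entry does not specify how the paper builds its pools.

```python
from typing import List, Sequence


def build_pool(rankings: Sequence[Sequence[str]], depth: int) -> List[str]:
    """Depth-k pooling: union of the top-`depth` items of every ranking, in first-seen order."""
    pool: List[str] = []
    seen = set()
    for ranking in rankings:
        for item in ranking[:depth]:
            if item not in seen:
                seen.add(item)
                pool.append(item)
    return pool


# Hypothetical rankings of tweet IDs, e.g. from a keyword matcher and two classifiers.
rankings = [
    ["t1", "t2", "t3", "t9"],
    ["t2", "t5", "t1", "t7"],
    ["t8", "t1", "t6", "t2"],
]
print(build_pool(rankings, depth=2))  # ['t1', 't2', 't5', 't8']
```

Drawing the pool from diverse rankers, rather than from a keyword filter alone, is what lets candidates without known "hate words" (here `t5` and `t8`) reach annotators.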
(2021-06-17T19:25:39Z)
- Detecting Cross-Modal Inconsistency to Defend Against Neural Fake News [57.9843300852526]
We introduce the more realistic and challenging task of defending against machine-generated news that also includes images and captions.
To identify the possible weaknesses that adversaries can exploit, we create a NeuralNews dataset composed of 4 different types of generated articles.
In addition to the valuable insights gleaned from our user study experiments, we provide a relatively effective approach based on detecting visual-semantic inconsistencies.
(2020-09-16T14:13:15Z)
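A minimal sketch of the general idea of flagging visual-semantic inconsistency: embed the image and its caption in a shared space and flag pairs whose cosine similarity falls below a threshold. The embedding model (e.g. a CLIP-style encoder) is assumed and replaced here by hand-written toy vectors; `flag_inconsistent` and the threshold are hypothetical, and the paper's actual detector may differ.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def flag_inconsistent(image_emb: Sequence[float], caption_emb: Sequence[float],
                      threshold: float = 0.5) -> bool:
    """Flag an image-caption pair whose embeddings disagree (low cosine similarity)."""
    return cosine_similarity(image_emb, caption_emb) < threshold


print(flag_inconsistent([1.0, 0.0, 0.0], [0.9, 0.1, 0.0]))  # False: consistent pair
print(flag_inconsistent([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # True: mismatched pair
```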