Generative AI may backfire for counterspeech
- URL: http://arxiv.org/abs/2411.14986v2
- Date: Mon, 25 Nov 2024 11:10:34 GMT
- Title: Generative AI may backfire for counterspeech
- Authors: Dominik Bär, Abdurahman Maarouf, Stefan Feuerriegel
- Abstract summary: We analyze whether contextualized counterspeech generated by state-of-the-art AI is effective in curbing online hate speech.
We find that non-contextualized counterspeech employing a warning-of-consequence strategy significantly reduces online hate speech.
However, contextualized counterspeech generated by LLMs proves ineffective and may even backfire.
- Score: 20.57872238271025
- Abstract: Online hate speech poses a serious threat to individual well-being and societal cohesion. A promising solution to curb online hate speech is counterspeech. Counterspeech is aimed at encouraging users to reconsider hateful posts through direct replies. However, current methods lack scalability due to the need for human intervention or fail to adapt to the specific context of the post. A potential remedy is the use of generative AI, specifically large language models (LLMs), to write tailored counterspeech messages. In this paper, we analyze whether contextualized counterspeech generated by state-of-the-art LLMs is effective in curbing online hate speech. To do so, we conducted a large-scale, pre-registered field experiment (N=2,664) on the social media platform Twitter/X. Our experiment followed a 2x2 between-subjects design and, additionally, a control condition with no counterspeech. On the one hand, users posting hateful content on Twitter/X were randomly assigned to receive either (a) contextualized counterspeech or (b) non-contextualized counterspeech. Here, the former is generated through LLMs, while the latter relies on predefined, generic messages. On the other hand, we tested two counterspeech strategies: (a) promoting empathy and (b) warning about the consequences of online misbehavior. We then measured whether users deleted their initial hateful posts and whether their behavior changed after the counterspeech intervention (e.g., whether users adopted less toxic language). We find that non-contextualized counterspeech employing a warning-of-consequence strategy significantly reduces online hate speech. However, contextualized counterspeech generated by LLMs proves ineffective and may even backfire.
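The design described in the abstract (a 2x2 between-subjects layout plus a no-counterspeech control) can be illustrated with a short sketch of random assignment. This is only a sketch under stated assumptions: the condition labels, the `assign_conditions` helper, the fixed seed, and the balanced allocation scheme are hypothetical and are not taken from the paper, which may have randomized differently.

```python
import random
from collections import Counter
from itertools import product

# Hypothetical labels for the two factors named in the abstract.
GENERATION = ("contextualized", "non_contextualized")
STRATEGY = ("empathy", "warning_of_consequences")

# Four treatment cells from the 2x2 design, plus a no-counterspeech control.
CONDITIONS = [f"{g}/{s}" for g, s in product(GENERATION, STRATEGY)] + ["control"]


def assign_conditions(user_ids, seed=0):
    """Randomly assign each user to one of the five conditions.

    Assumption: balanced randomization (shuffle, then deal users out in
    round-robin order), so group sizes differ by at most one.
    """
    rng = random.Random(seed)
    shuffled = list(user_ids)
    rng.shuffle(shuffled)
    return {uid: CONDITIONS[i % len(CONDITIONS)] for i, uid in enumerate(shuffled)}


if __name__ == "__main__":
    # N=2,664 is the sample size reported in the abstract; the IDs are placeholders.
    assignment = assign_conditions(range(2664))
    print(Counter(assignment.values()))
```

Dealing out a shuffled list keeps the five groups near-equal in size, which is one common choice; independent per-user draws would be an equally valid alternative with slightly unbalanced groups.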
Related papers
- Decoding Hate: Exploring Language Models' Reactions to Hate Speech [2.433983268807517]
This paper investigates the reactions of seven state-of-the-art Large Language Models to hate speech.
We reveal the spectrum of responses these models produce, highlighting their capacity to handle hate speech inputs.
We also discuss strategies to mitigate hate speech generation by LLMs, particularly through fine-tuning and guideline guardrailing.
arXiv Detail & Related papers (2024-10-01T15:16:20Z) - Demarked: A Strategy for Enhanced Abusive Speech Moderation through Counterspeech, Detoxification, and Message Management [71.99446449877038]
We propose a more comprehensive approach called Demarcation, which scores abusive speech based on four aspects: (i) severity scale; (ii) presence of a target; (iii) context scale; (iv) legal scale.
Our work aims to inform future strategies for effectively addressing abusive speech online.
arXiv Detail & Related papers (2024-06-27T21:45:33Z) - Towards Unsupervised Speech Recognition Without Pronunciation Models [57.222729245842054]
Most languages lack sufficient paired speech and text data to effectively train automatic speech recognition systems.
We propose the removal of reliance on a phoneme lexicon to develop unsupervised ASR systems.
We experimentally demonstrate that an unsupervised speech recognizer can emerge from joint speech-to-speech and text-to-text masked token-infilling.
arXiv Detail & Related papers (2024-06-12T16:30:58Z) - Hostile Counterspeech Drives Users From Hate Subreddits [1.5035331281822]
We analyze the effect of counterspeech on newcomers within hate subreddits on Reddit.
Non-hostile counterspeech is ineffective at keeping users from fully disengaging from these hate subreddits.
A single hostile counterspeech comment substantially reduces the future likelihood of engagement.
arXiv Detail & Related papers (2024-05-28T17:12:41Z) - NLP Systems That Can't Tell Use from Mention Censor Counterspeech, but Teaching the Distinction Helps [43.40965978436158]
Counterspeech that refutes problematic content often mentions harmful language but is not harmful itself.
We show that even recent language models fail at distinguishing use from mention.
This failure propagates to two key downstream tasks: misinformation and hate speech detection.
arXiv Detail & Related papers (2024-04-02T05:36:41Z) - DisCGen: A Framework for Discourse-Informed Counterspeech Generation [34.75404551612012]
We propose a framework based on theories of discourse to study the inferential links that connect counter speeches to hateful comments.
We present a process for collecting an in-the-wild dataset of counterspeech from Reddit.
We show that by using our dataset and framework, large language models can generate contextually-grounded counterspeech informed by theories of discourse.
arXiv Detail & Related papers (2023-11-29T23:20:17Z) - Beyond Denouncing Hate: Strategies for Countering Implied Biases and Stereotypes in Language [18.560379338032558]
We draw from psychology and philosophy literature to craft six psychologically inspired strategies to challenge the underlying stereotypical implications of hateful language.
We show that human-written counterspeech uses strategies that are more specific to the implied stereotype, whereas machine-generated counterspeech uses less specific strategies.
Our findings point to the importance of accounting for the underlying stereotypical implications of speech when generating counterspeech.
arXiv Detail & Related papers (2023-10-31T21:33:46Z) - ContextSpeech: Expressive and Efficient Text-to-Speech for Paragraph Reading [65.88161811719353]
This work develops a lightweight yet effective Text-to-Speech system, ContextSpeech.
We first design a memory-cached recurrence mechanism to incorporate global text and speech context into sentence encoding.
We construct hierarchically-structured textual semantics to broaden the scope for global context enhancement.
Experiments show that ContextSpeech significantly improves the voice quality and prosody in paragraph reading with competitive model efficiency.
arXiv Detail & Related papers (2023-07-03T06:55:03Z) - CoSyn: Detecting Implicit Hate Speech in Online Conversations Using a Context Synergized Hyperbolic Network [52.85130555886915]
CoSyn is a context-synergized neural network that explicitly incorporates user- and conversational context for detecting implicit hate speech in online conversations.
We show that CoSyn outperforms all our baselines in detecting implicit hate speech with absolute improvements in the range of 1.24% - 57.8%.
arXiv Detail & Related papers (2023-03-02T17:30:43Z) - Countering Online Hate Speech: An NLP Perspective [34.19875714256597]
Online toxicity - an umbrella term for online hateful behavior - manifests itself in forms such as online hate speech.
The rising mass communication through social media further exacerbates the harmful consequences of online hate speech.
This paper presents a holistic conceptual framework on hate-speech NLP countering methods along with a thorough survey on the current progress of NLP for countering online hate speech.
arXiv Detail & Related papers (2021-09-07T08:48:13Z) - Racism is a Virus: Anti-Asian Hate and Counterspeech in Social Media during the COVID-19 Crisis [51.39895377836919]
COVID-19 has sparked racism and hate on social media targeted towards Asian communities.
We study the evolution and spread of anti-Asian hate speech through the lens of Twitter.
We create COVID-HATE, the largest dataset of anti-Asian hate and counterspeech spanning 14 months.
arXiv Detail & Related papers (2020-05-25T21:58:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.