Related papers: PEACE 2.0: Grounded Explanations and Counter-Speech for Combating Hate Expressions

URL: http://arxiv.org/abs/2602.17467v1
Date: Thu, 19 Feb 2026 15:33:56 GMT
Title: PEACE 2.0: Grounded Explanations and Counter-Speech for Combating Hate Expressions
Authors: Greta Damo, Stéphane Petiot, Elena Cabrio, Serena Villata
Abstract summary: PEACE 2.0 is a novel tool that analyses hateful messages and generates responses to them. It enables in-depth analysis and response generation for both explicit and implicit hateful messages.
Score: 9.600892324769037
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The increasing volume of hate speech on online platforms poses significant societal challenges. While the Natural Language Processing community has developed effective methods to automatically detect the presence of hate speech, responding to it with counter-speech is still an open challenge. We present PEACE 2.0, a novel tool that, besides analysing and explaining why a message is considered hateful or not, also generates a response to it. More specifically, PEACE 2.0 has three main new functionalities: it leverages a Retrieval-Augmented Generation (RAG) pipeline i) to ground hate speech (HS) explanations in evidence and facts and ii) to automatically generate evidence-grounded counter-speech, and iii) it explores the characteristics of counter-speech replies. By integrating these capabilities, PEACE 2.0 enables in-depth analysis and response generation for both explicit and implicit hateful messages.
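
The abstract names a RAG pipeline but gives no implementation details. As a rough illustration of the general retrieve-then-prompt pattern only (not PEACE 2.0's actual pipeline), the sketch below ranks a small in-memory evidence corpus by bag-of-words cosine similarity and assembles a prompt that asks for an evidence-grounded reply. All function names, the corpus, and the prompt wording are hypothetical; a real system would likely use a dense retriever and an actual LLM call, both omitted here.

```python
# Hypothetical sketch of retrieval-augmented grounding; not the PEACE 2.0 implementation.
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve_evidence(message: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return up to k corpus passages most similar to the message, skipping zero-overlap ones."""
    query = Counter(tokenize(message))
    scored = [(cosine(query, Counter(tokenize(p))), p) for p in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # highest similarity first
    return [p for score, p in scored[:k] if score > 0]


def build_grounded_prompt(message: str, evidence: list[str]) -> str:
    """Assemble a generation prompt that numbers the retrieved evidence for citation."""
    lines = [f"[{i + 1}] {e}" for i, e in enumerate(evidence)]
    return (
        "Evidence:\n" + "\n".join(lines)
        + f"\n\nMessage: {message}\n"
        + "Write a respectful counter-speech reply that relies only on the evidence above."
    )
```

Usage with placeholder passages (no real facts): `retrieve_evidence("Immigrants are stealing all the jobs", corpus, k=3)` keeps only the passages that share vocabulary with the message, and the resulting prompt would then be passed to a generator. Restricting the reply to retrieved passages is what makes the output "grounded" in this sketch.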

Related papers

Synthetic Voices, Real Threats: Evaluating Large Text-to-Speech Models in Generating Harmful Audio [63.18443674004945]
This work explores a content-centric threat: exploiting TTS systems to produce speech containing harmful content. We present HARMGEN, a suite of five attacks organized into two families that address these challenges.
Date: 2025-11-14T03:00:04Z
HatePRISM: Policies, Platforms, and Research Integration. Advancing NLP for Hate Speech Proactive Mitigation [67.69631485036665]
We conduct a comprehensive examination of hate speech regulations and strategies from three perspectives. Our findings reveal significant inconsistencies in hate speech definitions and moderation practices across jurisdictions. We suggest ideas and research directions for further exploration of a unified framework for automated hate speech moderation.
Date: 2025-07-06T11:25:23Z
Is Safer Better? The Impact of Guardrails on the Argumentative Strength of LLMs in Hate Speech Countering [22.594296353433855]
We focus on two aspects of counterspeech generation to produce more cogent responses. First, we test whether the presence of safety guardrails hinders the quality of the generations. Secondly, we assess whether attacking a specific component of the hate speech results in a more effective argumentative strategy to fight online hate.
Date: 2024-10-04T14:31:37Z
SWE2: SubWord Enriched and Significant Word Emphasized Framework for Hate Speech Detection [3.0460060805145517]
We propose a novel hate speech detection framework called SWE2, which only relies on the content of messages and automatically identifies hate speech. Experimental results show that our proposed model achieves 0.975 accuracy and 0.953 macro F1, outperforming 7 state-of-the-art baselines.
Date: 2024-09-25T07:05:44Z
Towards Unsupervised Speech Recognition Without Pronunciation Models [57.222729245842054]
In this article, we tackle the challenge of developing ASR systems without paired speech and text corpora. We experimentally demonstrate that an unsupervised speech recognizer can emerge from joint speech-to-speech and text-to-text masked token-infilling. This innovative model surpasses the performance of previous unsupervised ASR models under the lexicon-free setting.
Date: 2024-06-12T16:30:58Z
Outcome-Constrained Large Language Models for Countering Hate Speech [10.434435022492723]
This study aims to develop methods for generating counterspeech constrained by conversation outcomes. We experiment with large language models (LLMs) to incorporate two desired conversation outcomes into the text generation process. Evaluation results show that our methods effectively steer the generation of counterspeech toward the desired outcomes.
Date: 2024-03-25T19:44:06Z
Consolidating Strategies for Countering Hate Speech Using Persuasive Dialogues [3.8979646385036175]
We explore controllable strategies for generating counter-arguments to hateful comments in online conversations. Using automatic and human evaluations, we determine the best combination of features that generate fluent, argumentative, and logically sound arguments. We share the computational models we developed for automatically annotating text with such features, and a silver-standard annotated version of an existing hate speech dialog corpus.
Date: 2024-01-15T16:31:18Z
SpeechX: Neural Codec Language Model as a Versatile Speech Transformer [57.82364057872905]
SpeechX is a versatile speech generation model capable of zero-shot TTS and various speech transformation tasks. Experimental results show SpeechX's efficacy in various tasks, including zero-shot TTS, noise suppression, target speaker extraction, speech removal, and speech editing with or without background noise.
Date: 2023-08-14T01:01:19Z
Hate Speech Detection via Dual Contrastive Learning [25.878271501274245]
We propose a novel dual contrastive learning framework for hate speech detection. Our framework jointly optimizes the self-supervised and supervised contrastive learning losses to capture span-level information. We conduct experiments on two publicly available English datasets, and the results show that the proposed model outperforms state-of-the-art models.
Date: 2023-07-10T13:23:36Z
SpeechGen: Unlocking the Generative Power of Speech Language Models with Prompts [108.04306136086807]
We present research that explores the application of prompt tuning to stimulate speech LMs for various generation tasks, within a unified framework called SpeechGen. The proposed unified framework holds great promise for efficiency and effectiveness, particularly with the imminent arrival of advanced speech LMs.
Date: 2023-06-03T22:35:27Z
CoSyn: Detecting Implicit Hate Speech in Online Conversations Using a Context Synergized Hyperbolic Network [52.85130555886915]
CoSyn is a context-synergized neural network that explicitly incorporates user and conversational context for detecting implicit hate speech in online conversations. We show that CoSyn outperforms all our baselines in detecting implicit hate speech, with absolute improvements ranging from 1.24% to 57.8%.
Date: 2023-03-02T17:30:43Z
CRUSH: Contextually Regularized and User anchored Self-supervised Hate speech Detection [6.759148939470331]
We introduce CRUSH, a framework for hate speech detection using user-anchored self-supervision and contextual regularization. Our approach achieves improvements of 1% to 12% in test-set metrics over the best-performing previous approaches on two types of tasks and multiple popular English social media datasets.
Date: 2022-04-13T13:51:51Z

This list is automatically generated from the titles and abstracts of the papers on this site.