Compositional Generalisation for Explainable Hate Speech Detection
- URL: http://arxiv.org/abs/2506.03916v1
- Date: Wed, 04 Jun 2025 13:07:36 GMT
- Title: Compositional Generalisation for Explainable Hate Speech Detection
- Authors: Agostina Calabrese, Tom Sherborne, Björn Ross, Mirella Lapata
- Abstract summary: Hate speech detection is key to online content moderation, but current models struggle to generalise beyond their training data. We show that even when models are trained with more fine-grained, span-level annotations, they struggle to disentangle the meaning of these labels from the surrounding context. We investigate whether training on a dataset where expressions occur with equal frequency across all contexts can improve generalisation.
- Score: 52.41588643566991
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Hate speech detection is key to online content moderation, but current models struggle to generalise beyond their training data. This has been linked to dataset biases and the use of sentence-level labels, which fail to teach models the underlying structure of hate speech. In this work, we show that even when models are trained with more fine-grained, span-level annotations (e.g., "artists" is labeled as target and "are parasites" as dehumanising comparison), they struggle to disentangle the meaning of these labels from the surrounding context. As a result, combinations of expressions that deviate from those seen during training remain particularly difficult for models to detect. We investigate whether training on a dataset where expressions occur with equal frequency across all contexts can improve generalisation. To this end, we create U-PLEAD, a dataset of ~364,000 synthetic posts, along with a novel compositional generalisation benchmark of ~8,000 manually validated posts. Training on a combination of U-PLEAD and real data improves compositional generalisation while achieving state-of-the-art performance on the human-sourced PLEAD.
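The abstract's balancing idea lends itself to a short illustration: if posts are built from span-level components (a target plus, say, a dehumanising comparison), a corpus in the spirit of U-PLEAD would combine every expression with every context equally often, so a model cannot shortcut on co-occurrence statistics. The templates, context labels, and target terms below are invented for illustration (only "artists ... are parasites" comes from the abstract itself) and are not taken from the actual U-PLEAD generation pipeline:

```python
import itertools
import random

# Hypothetical span-level building blocks; the "artists ... are parasites"
# example comes from the paper's abstract, the rest is invented.
targets = ["artists", "bankers", "gamers"]
contexts = {
    "dehumanising_comparison": "{target} are parasites",
    "quoted_report": 'the article claims that "{target} are parasites"',
    "counter_speech": "it is wrong to say that {target} are parasites",
}

def balanced_corpus(seed: int = 0) -> list:
    """Pair every target with every context exactly once, so each
    expression occurs with equal frequency across all contexts."""
    posts = [
        {"text": template.format(target=t), "context": label, "target_span": t}
        for t, (label, template) in itertools.product(targets, contexts.items())
    ]
    random.Random(seed).shuffle(posts)  # remove any ordering signal
    return posts

if __name__ == "__main__":
    for post in balanced_corpus():
        print(post["context"].ljust(24), post["text"])
```

Because each (target, context) pair appears exactly once, no surface combination is over-represented, which is the property the paper argues supports compositional generalisation.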
Related papers
- The Right Time Matters: Data Arrangement Affects Zero-Shot Generalization in Instruction Tuning [86.19804569376333]
We show that zero-shot generalization happens very early during instruction tuning. We propose a more grounded training data arrangement framework, Test-centric Multi-turn Arrangement.
arXiv Detail & Related papers (2024-06-17T16:40:21Z)
- Weigh Your Own Words: Improving Hate Speech Counter Narrative Generation via Attention Regularization [31.40751207207214]
Recent computational approaches for combating online hate speech involve the automatic generation of counter narratives.
This paper introduces novel attention regularization methodologies to improve the generalization capabilities of PLMs.
Regularized models produce better counter narratives than state-of-the-art approaches in most cases.
arXiv Detail & Related papers (2023-09-05T15:27:22Z)
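The attention-regularization summary above is high-level; one common instantiation, sketched here as an assumption rather than the paper's exact objective, adds an entropy-based term that rewards less peaked attention so the model does not latch onto a handful of trigger tokens:

```python
import torch
import torch.nn.functional as F

def attention_entropy(attn: torch.Tensor, eps: float = 1e-9) -> torch.Tensor:
    """Mean entropy of attention rows; attn has shape
    (batch, heads, queries, keys) with rows summing to 1."""
    return -(attn * (attn + eps).log()).sum(dim=-1).mean()

def regularized_loss(logits, labels, attentions, alpha: float = 0.01):
    """Cross-entropy minus a scaled entropy bonus: maximising attention
    entropy discourages the model from concentrating on a few
    (possibly spurious) tokens."""
    ce = F.cross_entropy(logits, labels)
    ent = torch.stack([attention_entropy(a) for a in attentions]).mean()
    return ce - alpha * ent

# Toy tensors shaped like one transformer layer's attention output.
logits = torch.randn(4, 2)
labels = torch.randint(0, 2, (4,))
attn = torch.softmax(torch.randn(4, 8, 16, 16), dim=-1)
print(regularized_loss(logits, labels, [attn]).item())
```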
- Combating high variance in Data-Scarce Implicit Hate Speech Classification [0.0]
We explore various optimization and regularization techniques and develop a novel RoBERTa-based model that achieves state-of-the-art performance.
arXiv Detail & Related papers (2022-08-29T13:45:21Z)
- Generalizing Hate Speech Detection Using Multi-Task Learning: A Case Study of Political Public Figures [3.825159708387601]
This work proposes a new Multi-task Learning pipeline that trains simultaneously across multiple hate speech datasets. We show strong results when examining the generalization error in train-test splits and substantial improvements when predicting on previously unseen datasets.
arXiv Detail & Related papers (2022-08-22T21:13:38Z)
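The multi-task pipeline above can be pictured as a shared encoder with one classification head per hate speech dataset; the toy module below sketches that structure under stated assumptions (a small MLP encoder and made-up dataset names stand in for the paper's actual components):

```python
import torch
import torch.nn as nn

class MultiDatasetClassifier(nn.Module):
    """Shared encoder with one classification head per dataset: a minimal
    stand-in for a multi-task hate speech pipeline (the encoder is a toy
    MLP rather than the paper's actual model)."""

    def __init__(self, input_dim: int, hidden: int, num_labels: dict):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, hidden), nn.ReLU())
        self.heads = nn.ModuleDict(
            {name: nn.Linear(hidden, n) for name, n in num_labels.items()}
        )

    def forward(self, x: torch.Tensor, dataset: str) -> torch.Tensor:
        return self.heads[dataset](self.encoder(x))

# Alternate batches between datasets so the shared encoder sees all of them;
# the dataset names and label counts are illustrative only.
model = MultiDatasetClassifier(64, 32, {"dataset_a": 3, "dataset_b": 4})
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
for dataset, n_labels in [("dataset_a", 3), ("dataset_b", 4)]:
    features = torch.randn(8, 64)            # stand-in for encoded posts
    labels = torch.randint(0, n_labels, (8,))
    loss = loss_fn(model(features, dataset), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```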
- ToKen: Task Decomposition and Knowledge Infusion for Few-Shot Hate Speech Detection [85.68684067031909]
We frame this problem as a few-shot learning task, and show significant gains with decomposing the task into its "constituent" parts.
In addition, we see that infusing knowledge from reasoning datasets (e.g. Atomic 2020) improves the performance even further.
arXiv Detail & Related papers (2022-05-25T05:10:08Z)
- On Guiding Visual Attention with Language Specification [76.08326100891571]
We use high-level language specification as advice for constraining the classification evidence to task-relevant features, instead of distractors.
We show that supervising spatial attention in this way improves performance on classification tasks with biased and noisy data.
arXiv Detail & Related papers (2022-02-17T22:40:19Z)
- CONFIT: Toward Faithful Dialogue Summarization with Linguistically-Informed Contrastive Fine-tuning [5.389540975316299]
Factual inconsistencies in generated summaries severely limit the practical applications of abstractive dialogue summarization.
We provide a typology of factual errors with annotation data to highlight the types of errors and move away from a binary understanding of factuality.
We propose a training strategy that improves the factual consistency and overall quality of summaries via a novel contrastive fine-tuning, called ConFiT.
arXiv Detail & Related papers (2021-12-16T09:08:40Z)
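ConFiT's contrastive fine-tuning can be illustrated with a generic triplet-style loss that pulls the dialogue representation towards a faithful reference summary and away from a factually perturbed one; this is a hedged sketch of the general technique, not ConFiT's published formulation:

```python
import torch
import torch.nn.functional as F

def contrastive_faithfulness_loss(src, pos, neg, margin: float = 0.5):
    """Triplet-style term: the dialogue embedding `src` should sit closer
    to the faithful summary `pos` than to a factually perturbed summary
    `neg`. A generic sketch, not ConFiT's published objective."""
    d_pos = 1 - F.cosine_similarity(src, pos, dim=-1)
    d_neg = 1 - F.cosine_similarity(src, neg, dim=-1)
    return F.relu(d_pos - d_neg + margin).mean()

# Toy embeddings standing in for pooled encoder states.
src, pos, neg = (torch.randn(4, 128) for _ in range(3))
print(contrastive_faithfulness_loss(src, pos, neg).item())
```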
- On The Ingredients of an Effective Zero-shot Semantic Parser [95.01623036661468]
We analyze zero-shot learning by paraphrasing training examples of canonical utterances and programs from a grammar.
We propose bridging these gaps using improved grammars, stronger paraphrasers, and efficient learning methods.
Our model achieves strong performance on two semantic parsing benchmarks (Scholar, Geo) with zero labeled data.
arXiv Detail & Related papers (2021-10-15T21:41:16Z)
- Improving Robustness by Augmenting Training Sentences with Predicate-Argument Structures [62.562760228942054]
Existing approaches to improve robustness against dataset biases mostly focus on changing the training objective.
We propose to augment the input sentences in the training data with their corresponding predicate-argument structures.
We show that without targeting a specific bias, our sentence augmentation improves the robustness of transformer models against multiple biases.
arXiv Detail & Related papers (2020-10-23T16:22:05Z)
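The augmentation above can be sketched as appending a linearised predicate-argument structure to each training sentence; the frames below are hard-coded for illustration, whereas in practice they would come from a semantic role labeller:

```python
def augment_with_predicate_arguments(sentence: str, frames) -> str:
    """Append a linearised predicate-argument structure to the input so a
    model sees both the raw sentence and its shallow semantics. `frames`
    is a list of (predicate, {role: span}) pairs; a real pipeline would
    obtain them from a semantic role labeller."""
    parts = [sentence]
    for predicate, roles in frames:
        args = " ".join(f"[{role}: {span}]" for role, span in roles.items())
        parts.append(f"[PRED: {predicate}] {args}")
    return " [SEP] ".join(parts)

print(augment_with_predicate_arguments(
    "The critic praised the film",
    [("praised", {"ARG0": "The critic", "ARG1": "the film"})],
))
```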
- Counterfactually-Augmented SNLI Training Data Does Not Yield Better Generalization Than Unaugmented Data [27.738670027154555]
Counterfactual augmentation of natural language understanding data does not appear to be an effective way of collecting training data.
We build upon this work by using English natural language inference data to test model generalization and robustness.
arXiv Detail & Related papers (2020-10-09T18:44:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.