Towards Fairness Assessment of Dutch Hate Speech Detection
- URL: http://arxiv.org/abs/2506.12502v1
- Date: Sat, 14 Jun 2025 13:33:12 GMT
- Title: Towards Fairness Assessment of Dutch Hate Speech Detection
- Authors: Julie Bauer, Rishabh Kaushal, Thales Bertaglia, Adriana Iamnitchi
- Abstract summary: We evaluate the counterfactual fairness of hate speech detection models in the Dutch language. Our analysis shows that models fine-tuned with counterfactual data perform better in terms of hate speech detection, average counterfactual fairness, and group fairness.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Numerous studies have proposed computational methods to detect hate speech online, yet most focus on the English language and emphasize model development. In this study, we evaluate the counterfactual fairness of hate speech detection models in the Dutch language, specifically examining the performance and fairness of transformer-based models. We make the following key contributions. First, we curate a list of Dutch Social Group Terms that reflect social context. Second, we generate counterfactual data for Dutch hate speech using LLMs and established strategies like Manual Group Substitution (MGS) and Sentence Log-Likelihood (SLL). Through qualitative evaluation, we highlight the challenges of generating realistic counterfactuals, particularly with Dutch grammar and contextual coherence. Third, we fine-tune baseline transformer-based models with counterfactual data and evaluate their performance in detecting hate speech. Fourth, we assess the fairness of these models using Counterfactual Token Fairness (CTF) and group fairness metrics, including equality of odds and demographic parity. Our analysis shows that models fine-tuned with counterfactual data perform better in terms of hate speech detection, average counterfactual fairness, and group fairness. This work addresses a significant gap in the literature on counterfactual fairness for hate speech detection in Dutch and provides practical insights and recommendations for improving both model performance and fairness.
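To make the counterfactual-generation step concrete, below is a minimal sketch of Manual Group Substitution: a sentence mentioning a social group term is rewritten by swapping in another term from the same axis, producing counterfactual pairs that can later be used for fine-tuning and fairness evaluation. The term lists, example sentence, and function names are illustrative placeholders, not the paper's curated Dutch Social Group Terms or its actual pipeline.

```python
from itertools import product

# Illustrative (not the paper's curated) Dutch social group terms, grouped by axis.
SOCIAL_GROUP_TERMS = {
    "religion": ["moslims", "christenen", "joden"],
    "nationality": ["marokkanen", "nederlanders", "polen"],
}

def manual_group_substitution(sentence: str) -> list[tuple[str, str]]:
    """Return (original_term, counterfactual_sentence) pairs by swapping each
    mentioned social group term with every other term on the same axis."""
    counterfactuals = []
    tokens = sentence.lower().split()
    for axis, terms in SOCIAL_GROUP_TERMS.items():
        for term, replacement in product(terms, terms):
            if term != replacement and term in tokens:
                cf = " ".join(replacement if t == term else t for t in tokens)
                counterfactuals.append((term, cf))
    return counterfactuals

if __name__ == "__main__":
    sentence = "ik haat moslims"  # toy example, not taken from the paper's data
    for term, cf in manual_group_substitution(sentence):
        print(f"{term!r} -> {cf!r}")
```

Plain token swapping is exactly where the grammatical and contextual problems the authors mention arise (gender and number agreement, idiomatic usage), which is why the paper also uses LLM-based generation and Sentence Log-Likelihood filtering rather than substitution alone.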
Related papers
- Speech-IFEval: Evaluating Instruction-Following and Quantifying Catastrophic Forgetting in Speech-Aware Language Models [49.1574468325115]
We introduce Speech-IFEval, an evaluation framework designed to assess instruction-following capabilities. Recent SLMs integrate speech perception with large language models (LLMs), often degrading textual capabilities due to speech-centric training. Our findings show that most SLMs struggle with even basic instructions, performing far worse than text-based LLMs.
arXiv Detail & Related papers (2025-05-25T08:37:55Z)
- Dual-Class Prompt Generation: Enhancing Indonesian Gender-Based Hate Speech Detection through Data Augmentation [0.0]
Detecting gender-based hate speech in Indonesian social media remains challenging due to limited labeled datasets. We evaluate backtranslation, single-class prompt generation, and our proposed dual-class prompt generation. Our findings suggest that incorporating examples from both classes helps language models generate more diverse yet representative samples.
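As an illustration of the dual-class idea, the sketch below builds one generation prompt per class (hateful and non-hateful), each seeded with labelled examples of that class, and collects synthetic samples from a text generation model. The `generate` function is a placeholder for whatever LLM interface is used; the prompt wording is invented for illustration and is not taken from the paper.

```python
def generate(prompt: str) -> str:
    """Placeholder for an LLM call (hosted chat model, local pipeline, etc.)."""
    raise NotImplementedError

def dual_class_prompts(hateful_examples, non_hateful_examples, n_new=5):
    """Build one prompt per class, each conditioned on examples of that class."""
    prompts = {}
    for label, examples in (("hateful", hateful_examples),
                            ("non_hateful", non_hateful_examples)):
        shots = "\n".join(f"- {e}" for e in examples)
        prompts[label] = (
            f"Here are Indonesian social media comments labelled {label}:\n"
            f"{shots}\n"
            f"Write {n_new} new, distinct comments with the same label, one per line."
        )
    return prompts

def augment(hateful_examples, non_hateful_examples):
    """Generate synthetic samples for BOTH classes, as in dual-class prompt generation."""
    augmented = []
    for label, prompt in dual_class_prompts(hateful_examples, non_hateful_examples).items():
        for line in generate(prompt).splitlines():
            if line.strip():
                augmented.append((line.strip("- ").strip(), label))
    return augmented
```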
arXiv Detail & Related papers (2025-03-06T10:07:51Z)
- Collapsed Language Models Promote Fairness [88.48232731113306]
We find that debiased language models exhibit collapsed alignment between token representations and word embeddings. We design a principled fine-tuning method that can effectively improve fairness in a wide range of debiasing methods.
arXiv Detail & Related papers (2024-10-06T13:09:48Z)
- Hate Speech Detection Using Cross-Platform Social Media Data In English and German Language [6.200058263544999]
This study focuses on detecting bilingual hate speech in YouTube comments.
We include factors such as content similarity, definition similarity, and common hate words to measure the impact of datasets on performance.
The best performance was obtained by combining datasets from YouTube comments, Twitter, and Gab, yielding F1-scores of 0.74 and 0.68 for English and German YouTube comments, respectively.
arXiv Detail & Related papers (2024-10-02T10:22:53Z)
- LLMs and Finetuning: Benchmarking cross-domain performance for hate speech detection [9.166963162285064]
This study investigates the effectiveness and adaptability of pre-trained and fine-tuned Large Language Models (LLMs) in identifying hate speech. LLMs offer a huge advantage over the state-of-the-art even without pretraining.
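For reference, a minimal sketch of the generic fine-tuning recipe (not this paper's exact setup): a pre-trained transformer is given a binary classification head and trained on labelled hate speech examples with the Hugging Face Trainer. The model name, hyperparameters, and toy data below are placeholders; the same pattern applies whether the backbone is an encoder or a decoder-style LLM with a classification head.

```python
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL = "xlm-roberta-base"  # placeholder backbone

# Toy labelled data (1 = hate, 0 = not hate); a real setup would load a benchmark dataset.
train = Dataset.from_dict({
    "text": ["example hateful comment", "example harmless comment"],
    "label": [1, 0],
})

tokenizer = AutoTokenizer.from_pretrained(MODEL)
train = train.map(lambda b: tokenizer(b["text"], truncation=True,
                                      padding="max_length", max_length=64),
                  batched=True)

model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=2)
args = TrainingArguments(output_dir="hate-speech-ft", num_train_epochs=1,
                         per_device_train_batch_size=8, logging_steps=1)

Trainer(model=model, args=args, train_dataset=train).train()
```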
arXiv Detail & Related papers (2023-10-29T10:07:32Z)
- Disco-Bench: A Discourse-Aware Evaluation Benchmark for Language Modelling [70.23876429382969]
We propose a benchmark that can evaluate intra-sentence discourse properties across a diverse set of NLP tasks.
Disco-Bench consists of 9 document-level testsets in the literature domain, which contain rich discourse phenomena.
For linguistic analysis, we also design a diagnostic test suite that can examine whether the target models learn discourse knowledge.
arXiv Detail & Related papers (2023-07-16T15:18:25Z)
- DualFair: Fair Representation Learning at Both Group and Individual Levels via Contrastive Self-supervision [73.80009454050858]
This work presents a self-supervised model, called DualFair, that can debias sensitive attributes like gender and race from learned representations.
Our model jointly optimizes for two fairness criteria: group fairness and counterfactual fairness.
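As a rough illustration of the contrastive idea (not DualFair's actual objective), the sketch below treats an example and its counterfactual, differing only in a sensitive attribute, as a positive pair whose representations are pulled together, with the other examples in the batch acting as negatives.

```python
import torch
import torch.nn.functional as F

def counterfactual_contrastive_loss(z: torch.Tensor, z_cf: torch.Tensor,
                                    temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE-style loss: row i of `z` (an example) should be most similar to
    row i of `z_cf` (its counterfactual with the sensitive attribute flipped),
    relative to all other counterfactuals in the batch."""
    z = F.normalize(z, dim=-1)
    z_cf = F.normalize(z_cf, dim=-1)
    logits = z @ z_cf.T / temperature      # (batch, batch) similarity matrix
    targets = torch.arange(z.size(0))      # the positive pair sits on the diagonal
    return F.cross_entropy(logits, targets)

# Usage: z and z_cf would be encoder outputs for a batch and its counterfactual batch.
loss = counterfactual_contrastive_loss(torch.randn(8, 128), torch.randn(8, 128))
```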
arXiv Detail & Related papers (2023-03-15T07:13:54Z)
- Addressing the Challenges of Cross-Lingual Hate Speech Detection [115.1352779982269]
In this paper we focus on cross-lingual transfer learning to support hate speech detection in low-resource languages.
We leverage cross-lingual word embeddings to train our neural network systems on the source language and apply them to the target language.
We investigate the issue of label imbalance of hate speech datasets, since the high ratio of non-hate examples compared to hate examples often leads to low model performance.
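A common remedy for such imbalance (sketched below, and not necessarily the one used in this paper) is to weight the training loss inversely to class frequency so the minority hate class is not drowned out by the majority non-hate class.

```python
import torch
from torch import nn

# Toy label distribution: far more non-hate (0) than hate (1) examples.
labels = torch.tensor([0] * 90 + [1] * 10)

# Inverse-frequency class weights, normalised so the average weight is 1.
counts = torch.bincount(labels, minlength=2).float()
weights = counts.sum() / (2 * counts)

criterion = nn.CrossEntropyLoss(weight=weights)

logits = torch.randn(len(labels), 2)   # stand-in for model outputs
loss = criterion(logits, labels)
print(weights, loss.item())
```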
arXiv Detail & Related papers (2022-01-15T20:48:14Z)
- Towards Language Modelling in the Speech Domain Using Sub-word Linguistic Units [56.52704348773307]
We propose a novel LSTM-based generative speech LM based on linguistic units including syllables and phonemes.
With a limited dataset, orders of magnitude smaller than that required by contemporary generative models, our model closely approximates babbling speech.
We show the effect of training with auxiliary text LMs, multitask learning objectives, and auxiliary articulatory features.
arXiv Detail & Related papers (2021-10-31T22:48:30Z)
- Improving Counterfactual Generation for Fair Hate Speech Detection [26.79268141793483]
Bias mitigation approaches reduce models' dependence on sensitive features of data, such as social group tokens (SGTs).
In hate speech detection, however, equalizing model predictions may ignore important differences among targeted social groups.
Here, we rely on counterfactual fairness and equalize predictions among counterfactuals, generated by changing the SGTs.
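One way to operationalise that equalization during training, sketched below in the spirit of counterfactual logit pairing, is to add a penalty on the gap between the model's outputs for a sentence and for its SGT-substituted counterfactual. The weighting and shapes are placeholders, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def fair_training_loss(logits: torch.Tensor, logits_cf: torch.Tensor,
                       labels: torch.Tensor, cf_weight: float = 1.0) -> torch.Tensor:
    """Classification loss on the original sentences plus a penalty on the
    prediction gap between each sentence and its counterfactual (SGT swapped)."""
    task_loss = F.cross_entropy(logits, labels)
    cf_penalty = (logits - logits_cf).abs().mean()  # equalize predictions across counterfactuals
    return task_loss + cf_weight * cf_penalty

# logits and logits_cf would come from the same classifier run on a batch and
# on the counterfactual version of that batch.
loss = fair_training_loss(torch.randn(4, 2), torch.randn(4, 2), torch.tensor([0, 1, 0, 1]))
```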
arXiv Detail & Related papers (2021-08-03T19:47:27Z)
- Statistical Analysis of Perspective Scores on Hate Speech Detection [7.447951461558536]
State-of-the-art hate speech classifiers are effective only when tested on data with the same feature distribution as their training data.
Under such diverse data distributions, relying on low-level features is the main cause of deficiency, due to natural bias in the data.
We show that different hate speech datasets are very similar when it comes to extracting their Perspective Scores.
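Perspective Scores here refer to attribute scores returned by Google's Perspective API. Below is a minimal retrieval sketch; the request shape follows the publicly documented commentanalyzer endpoint, but attribute names, response fields, and quota behaviour should be verified against the current documentation, and the API key is a placeholder.

```python
import requests

API_KEY = "YOUR_PERSPECTIVE_API_KEY"  # placeholder
URL = ("https://commentanalyzer.googleapis.com/v1alpha1/"
       f"comments:analyze?key={API_KEY}")

def perspective_scores(text: str, attributes=("TOXICITY", "INSULT", "THREAT")) -> dict:
    """Return the summary score for each requested attribute of a comment."""
    body = {
        "comment": {"text": text},
        "requestedAttributes": {attr: {} for attr in attributes},
    }
    response = requests.post(URL, json=body, timeout=10)
    response.raise_for_status()
    scores = response.json()["attributeScores"]
    return {attr: scores[attr]["summaryScore"]["value"] for attr in scores}

# perspective_scores("You are a wonderful person.")  # requires a valid API key
```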
arXiv Detail & Related papers (2021-06-22T17:17:35Z)
- Fair Hate Speech Detection through Evaluation of Social Group Counterfactuals [21.375422346539004]
Approaches for mitigating bias in supervised models are designed to reduce models' dependence on specific sensitive features of the input data.
In the case of hate speech detection, it is not always desirable to equalize the effects of social groups.
Counterfactual token fairness for a mentioned social group evaluates whether the model's predictions are the same for (a) the actual sentence and (b) a counterfactual instance.
Our approach assures robust model predictions for counterfactuals that imply similar meaning as the actual sentence.
arXiv Detail & Related papers (2020-10-24T04:51:47Z)
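To close the loop with the Dutch study above, the sketch below shows how such counterfactual and group fairness evaluations are typically computed: a Counterfactual Token Fairness gap (mean absolute prediction difference between each sentence and its counterfactual, per social group) and demographic parity / equality of odds differences over predicted labels. The data and variable names are illustrative placeholders, not results from any of the papers listed here.

```python
from collections import defaultdict
import numpy as np

def ctf_gap(pairs):
    """pairs: iterable of (group, p_original, p_counterfactual), where the p's are
    hate-class probabilities. Returns the mean absolute gap per social group."""
    gaps = defaultdict(list)
    for group, p_orig, p_cf in pairs:
        gaps[group].append(abs(p_orig - p_cf))
    return {group: float(np.mean(values)) for group, values in gaps.items()}

def demographic_parity_diff(y_pred, groups):
    """Largest difference in positive (hate) prediction rates across groups."""
    y_pred, groups = np.asarray(y_pred), np.asarray(groups)
    rates = [y_pred[groups == g].mean() for g in np.unique(groups)]
    return float(max(rates) - min(rates))

def equality_of_odds_diff(y_true, y_pred, groups):
    """Largest gap, across true labels, in positive prediction rates between groups.
    Assumes every (label, group) combination occurs at least once."""
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    gaps = []
    for label in np.unique(y_true):
        mask = y_true == label
        rates = [y_pred[mask & (groups == g)].mean() for g in np.unique(groups)]
        gaps.append(max(rates) - min(rates))
    return float(max(gaps))

# Toy usage with made-up predictions.
pairs = [("religion", 0.81, 0.45), ("religion", 0.60, 0.58), ("nationality", 0.30, 0.32)]
print(ctf_gap(pairs))
print(demographic_parity_diff([1, 0, 1, 1], ["a", "a", "b", "b"]))
print(equality_of_odds_diff([1, 0, 1, 0], [1, 0, 1, 1], ["a", "a", "b", "b"]))
```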