Personalisation or Prejudice? Addressing Geographic Bias in Hate Speech Detection using Debias Tuning in Large Language Models
- URL: http://arxiv.org/abs/2505.02252v1
- Date: Sun, 04 May 2025 21:22:20 GMT
- Title: Personalisation or Prejudice? Addressing Geographic Bias in Hate Speech Detection using Debias Tuning in Large Language Models
- Authors: Paloma Piot, Patricia Martín-Rodilla, Javier Parapar
- Abstract summary: Commercial Large Language Models (LLMs) have recently incorporated memory features to deliver personalised responses. This paper examines different state-of-the-art LLMs to understand their behaviour in different personalisation scenarios. We prompt the models to assume country-specific personas and use different languages for hate speech detection. Our findings reveal that context personalisation significantly influences LLMs' responses in this sensitive area.
- Score: 2.1656586298989793
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Commercial Large Language Models (LLMs) have recently incorporated memory features to deliver personalised responses. This memory retains details such as user demographics and individual characteristics, allowing LLMs to adjust their behaviour based on personal information. However, the impact of integrating personalised information into the context has not been thoroughly assessed, leading to questions about its influence on LLM behaviour. Personalisation can be challenging, particularly with sensitive topics. In this paper, we examine various state-of-the-art LLMs to understand their behaviour in different personalisation scenarios, specifically focusing on hate speech. We prompt the models to assume country-specific personas and use different languages for hate speech detection. Our findings reveal that context personalisation significantly influences LLMs' responses in this sensitive area. To mitigate these unwanted biases, we fine-tune the LLMs by penalising inconsistent hate speech classifications made with and without country or language-specific context. The refined models demonstrate improved performance in both personalised contexts and when no context is provided.
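The abstract describes the approach but this page carries no code, so here is a minimal sketch of the two ingredients it names, country-persona prompting and a penalty on inconsistent classifications. The `persona_prompt` template, the symmetric-KL form of the penalty, and the weight `lam` are illustrative assumptions, not the authors' released implementation.

```python
# Hedged sketch of the debias-tuning idea from the abstract: penalise the
# model when its hate-speech prediction changes between a neutral prompt
# and a country-specific persona prompt. All names and weights here are
# illustrative assumptions, not the paper's implementation.
import torch
import torch.nn.functional as F


def persona_prompt(text: str, country: str | None) -> str:
    """Illustrative prompt template; the paper's exact wording is not shown here."""
    persona = f"You are a person from {country}. " if country else ""
    return f"{persona}Is the following message hate speech? Answer yes or no.\n{text}"


def debias_loss(logits_plain: torch.Tensor,
                logits_persona: torch.Tensor,
                labels: torch.Tensor,
                lam: float = 1.0) -> torch.Tensor:
    """Cross-entropy on both prompt variants plus a symmetric KL penalty
    that discourages inconsistent classifications across contexts."""
    ce = F.cross_entropy(logits_plain, labels) + F.cross_entropy(logits_persona, labels)
    log_p = F.log_softmax(logits_plain, dim=-1)
    log_q = F.log_softmax(logits_persona, dim=-1)
    consistency = 0.5 * (
        F.kl_div(log_p, log_q, log_target=True, reduction="batchmean")
        + F.kl_div(log_q, log_p, log_target=True, reduction="batchmean")
    )
    return ce + lam * consistency


if __name__ == "__main__":
    torch.manual_seed(0)
    print(persona_prompt("example message", "Spain"))
    head = torch.nn.Linear(16, 2)                 # 2 classes: hate / not hate
    feats_plain = torch.randn(8, 16)              # features from the neutral prompt
    feats_persona = feats_plain + 0.1 * torch.randn(8, 16)  # persona-primed view
    labels = torch.randint(0, 2, (8,))
    loss = debias_loss(head(feats_plain), head(feats_persona), labels)
    loss.backward()
    print(f"loss = {loss.item():.4f}")
```

In a real run the two sets of logits would come from the fine-tuned LLM scoring the neutral and persona-primed prompts; the toy linear head above only keeps the sketch self-contained and runnable.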
Related papers
- Hateful Person or Hateful Model? Investigating the Role of Personas in Hate Speech Detection by Large Language Models [47.110656690979695]
We present the first comprehensive study on the role of persona prompts in hate speech classification. A human annotation survey confirms that MBTI dimensions significantly affect labeling behavior. Our analysis uncovers substantial persona-driven variation, including inconsistencies with ground truth, inter-persona disagreement, and logit-level biases.
arXiv Detail & Related papers (2025-06-10T09:02:55Z) - Human and LLM Biases in Hate Speech Annotations: A Socio-Demographic Analysis of Annotators and Targets [0.6918368994425961]
We leverage an extensive dataset with rich socio-demographic information of both annotators and targets. Our analysis surfaces the presence of widespread biases, which we quantitatively describe and characterize based on their intensity and prevalence. Our work offers new and nuanced results on human biases in hate speech annotations, as well as fresh insights into the design of AI-driven hate speech detection systems.
arXiv Detail & Related papers (2024-10-10T14:48:57Z) - Hate Personified: Investigating the role of LLMs in content moderation [64.26243779985393]
For subjective tasks such as hate detection, where people perceive hate differently, it is unclear how well Large Language Models (LLMs) represent diverse groups.
By including additional context in prompts, we analyze LLMs' sensitivity to geographical priming, persona attributes, and numerical information to assess how well the needs of various groups are reflected.
arXiv Detail & Related papers (2024-10-03T16:43:17Z) - Spoken Stereoset: On Evaluating Social Bias Toward Speaker in Speech Large Language Models [50.40276881893513]
This study introduces Spoken Stereoset, a dataset specifically designed to evaluate social biases in Speech Large Language Models (SLLMs).
By examining how different models respond to speech from diverse demographic groups, we aim to identify these biases.
The findings indicate that while most models show minimal bias, some still exhibit slightly stereotypical or anti-stereotypical tendencies.
arXiv Detail & Related papers (2024-08-14T16:55:06Z) - Modulating Language Model Experiences through Frictions [56.17593192325438]
Over-consumption of language model outputs risks propagating unchecked errors in the short-term and damaging human capabilities for critical thinking in the long-term.
We propose selective frictions for language model experiences, inspired by behavioral science interventions, to dampen misuse.
arXiv Detail & Related papers (2024-06-24T16:31:11Z) - LLM vs Small Model? Large Language Model Based Text Augmentation Enhanced Personality Detection Model [58.887561071010985]
Personality detection aims to detect the personality traits underlying social media posts.
Most existing methods learn post features directly by fine-tuning the pre-trained language models.
We propose a large language model (LLM) based text augmentation enhanced personality detection model.
arXiv Detail & Related papers (2024-03-12T12:10:18Z) - Personalized Large Language Models [1.0881867638866944]
This paper investigates methods to personalize large language models (LLMs).
Results demonstrate that personalized fine-tuning improves model reasoning compared to non-personalized models.
Experiments on datasets for emotion recognition and hate speech detection show consistent performance gains with personalized methods.
arXiv Detail & Related papers (2024-02-14T15:55:30Z) - Towards Language Modelling in the Speech Domain Using Sub-word Linguistic Units [56.52704348773307]
We propose a novel LSTM-based generative speech LM built on linguistic units, including syllables and phonemes.
With a limited dataset, orders of magnitude smaller than that required by contemporary generative models, our model closely approximates babbling speech.
We show the effect of training with auxiliary text LMs, multitask learning objectives, and auxiliary articulatory features.
arXiv Detail & Related papers (2021-10-31T22:48:30Z) - Cross-lingual hate speech detection based on multilingual domain-specific word embeddings [4.769747792846004]
We propose to address the problem of multilingual hate speech detection from the perspective of transfer learning.
Our goal is to determine whether knowledge from one particular language can be used to classify other languages.
We show that the use of our simple yet specific multilingual hate representations improves classification results; a rough sketch of this transfer setup appears after this entry.
arXiv Detail & Related papers (2021-04-30T02:24:50Z)
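The last entry describes its setup concretely enough for a tiny end-to-end illustration: train a hate-speech classifier on source-language texts embedded in a shared multilingual space, then evaluate it zero-shot on a target language. The `embed` helper and the random features below are placeholders standing in for the paper's domain-specific multilingual embeddings, not its actual data or encoder.

```python
# Hedged sketch of cross-lingual transfer with aligned multilingual
# embeddings: fit on one language, evaluate zero-shot on another.
# embed() is a stand-in; any aligned cross-lingual encoder could be used.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)


def embed(texts: list[str], dim: int = 300) -> np.ndarray:
    """Placeholder for a multilingual, hate-domain-specific sentence embedding."""
    return rng.normal(size=(len(texts), dim))


# Source-language (e.g. English) training data; target-language (e.g. Spanish) test data.
X_src, y_src = embed(["example"] * 200), rng.integers(0, 2, 200)
X_tgt, y_tgt = embed(["ejemplo"] * 50), rng.integers(0, 2, 50)

clf = LogisticRegression(max_iter=1000).fit(X_src, y_src)
print("zero-shot target-language F1:", f1_score(y_tgt, clf.predict(X_tgt)))
```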