Hateful Person or Hateful Model? Investigating the Role of Personas in Hate Speech Detection by Large Language Models
- URL: http://arxiv.org/abs/2506.08593v1
- Date: Tue, 10 Jun 2025 09:02:55 GMT
- Title: Hateful Person or Hateful Model? Investigating the Role of Personas in Hate Speech Detection by Large Language Models
- Authors: Shuzhou Yuan, Ercong Nie, Mario Tawfelis, Helmut Schmid, Hinrich Schütze, Michael Färber
- Abstract summary: We present the first comprehensive study on the role of persona prompts in hate speech classification. A human annotation survey confirms that MBTI dimensions significantly affect labeling behavior. Our analysis uncovers substantial persona-driven variation, including inconsistencies with ground truth, inter-persona disagreement, and logit-level biases.
- Score: 47.110656690979695
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Hate speech detection is a socially sensitive and inherently subjective task, with judgments often varying based on personal traits. While prior work has examined how socio-demographic factors influence annotation, the impact of personality traits on Large Language Models (LLMs) remains largely unexplored. In this paper, we present the first comprehensive study on the role of persona prompts in hate speech classification, focusing on MBTI-based traits. A human annotation survey confirms that MBTI dimensions significantly affect labeling behavior. Extending this to LLMs, we prompt four open-source models with MBTI personas and evaluate their outputs across three hate speech datasets. Our analysis uncovers substantial persona-driven variation, including inconsistencies with ground truth, inter-persona disagreement, and logit-level biases. These findings highlight the need to carefully define persona prompts in LLM-based annotation workflows, with implications for fairness and alignment with human values.
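The following is a minimal sketch of the kind of setup the abstract describes: prompting an open-source chat model with an MBTI persona for binary hate speech classification and inspecting the logit gap between the answer tokens. It is not the paper's exact protocol; the model name, prompt wording, and "Yes"/"No" label tokens are illustrative assumptions.

```python
# Hedged sketch: persona-prompted hate speech classification with logit readout.
# Model name, prompt phrasing, and label tokens are assumptions, not the paper's setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "meta-llama/Llama-3.1-8B-Instruct"  # assumed; any open chat model works
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=torch.bfloat16)
model.eval()

def classify_with_persona(text: str, mbti: str) -> dict:
    """Ask the model, in an MBTI persona, whether `text` is hate speech."""
    messages = [
        {"role": "system",
         "content": f"You are a person with the MBTI personality type {mbti}. "
                    "Answer strictly with 'Yes' or 'No'."},
        {"role": "user",
         "content": f"Is the following text hate speech?\n\n{text}\n\nAnswer:"},
    ]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    )
    with torch.no_grad():
        logits = model(input_ids).logits[0, -1]  # next-token distribution

    yes_id = tokenizer.encode("Yes", add_special_tokens=False)[0]
    no_id = tokenizer.encode("No", add_special_tokens=False)[0]
    return {
        "label": "hate" if logits[yes_id] > logits[no_id] else "not_hate",
        "logit_gap": (logits[yes_id] - logits[no_id]).item(),
    }

# Comparing outputs across personas surfaces persona-driven disagreement:
for persona in ["INTJ", "ESFP"]:
    print(persona, classify_with_persona("example post ...", persona))
```

Running the same post under different personas and comparing labels and logit gaps is one simple way to quantify the inter-persona disagreement and logit-level bias the abstract refers to.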
Related papers
- Personalisation or Prejudice? Addressing Geographic Bias in Hate Speech Detection using Debias Tuning in Large Language Models [2.1656586298989793]
Commercial Large Language Models (LLMs) have recently incorporated memory features to deliver personalised responses. This paper examines different state-of-the-art LLMs to understand their behaviour in different personalisation scenarios. We prompt the models to assume country-specific personas and use different languages for hate speech detection. Our findings reveal that context personalisation significantly influences LLMs' responses in this sensitive area.
arXiv Detail & Related papers (2025-05-04T21:22:20Z) - Human and LLM Biases in Hate Speech Annotations: A Socio-Demographic Analysis of Annotators and Targets [0.6918368994425961]
We leverage an extensive dataset with rich socio-demographic information of both annotators and targets. Our analysis surfaces the presence of widespread biases, which we quantitatively describe and characterize based on their intensity and prevalence. Our work offers new and nuanced results on human biases in hate speech annotations, as well as fresh insights into the design of AI-driven hate speech detection systems.
arXiv Detail & Related papers (2024-10-10T14:48:57Z) - Secret Keepers: The Impact of LLMs on Linguistic Markers of Personal Traits [6.886654996060662]
We investigate the impact of Large Language Models (LLMs) on the linguistic markers of demographic and psychological traits.
Our findings indicate that although the use of LLMs slightly reduces the predictive power of linguistic patterns over authors' personal traits, the significant changes are infrequent.
arXiv Detail & Related papers (2024-03-30T06:49:17Z) - LLM vs Small Model? Large Language Model Based Text Augmentation Enhanced Personality Detection Model [58.887561071010985]
Personality detection aims to identify the personality traits underlying a person's social media posts.
Most existing methods learn post features directly by fine-tuning the pre-trained language models.
We propose a large language model (LLM) based text augmentation enhanced personality detection model.
arXiv Detail & Related papers (2024-03-12T12:10:18Z) - PsyCoT: Psychological Questionnaire as Powerful Chain-of-Thought for Personality Detection [50.66968526809069]
We propose a novel personality detection method, called PsyCoT, which mimics the way individuals complete psychological questionnaires in a multi-turn dialogue manner.
Our experiments demonstrate that PsyCoT significantly improves the performance and robustness of GPT-3.5 in personality detection.
arXiv Detail & Related papers (2023-10-31T08:23:33Z) - Editing Personality for Large Language Models [73.59001811199823]
This paper introduces an innovative task focused on editing the personality traits of Large Language Models (LLMs).
We construct PersonalityEdit, a new benchmark dataset to address this task.
arXiv Detail & Related papers (2023-10-03T16:02:36Z) - Sensitivity, Performance, Robustness: Deconstructing the Effect of Sociodemographic Prompting [64.80538055623842]
Sociodemographic prompting is a technique that steers the output of prompt-based models towards answers that humans with specific sociodemographic profiles would give.
We show that sociodemographic information affects model predictions and can be beneficial for improving zero-shot learning in subjective NLP tasks.
arXiv Detail & Related papers (2023-09-13T15:42:06Z) - PersonaLLM: Investigating the Ability of Large Language Models to Express Personality Traits [30.770525830385637]
We study the behavior of large language models (LLMs) based on the Big Five personality model.
Results show that LLM personas' self-reported BFI scores are consistent with their designated personality types.
Human evaluation shows that humans can perceive some personality traits with an accuracy of up to 80%.
arXiv Detail & Related papers (2023-05-04T04:58:00Z) - Can ChatGPT Assess Human Personalities? A General Evaluation Framework [70.90142717649785]
Large Language Models (LLMs) have produced impressive results in various areas, but their potential human-like psychology is still largely unexplored.
This paper presents a generic evaluation framework for LLMs to assess human personalities based on Myers-Briggs Type Indicator (MBTI) tests.
arXiv Detail & Related papers (2023-03-01T06:16:14Z)