Exploring the Impact of Personality Traits on LLM Bias and Toxicity
- URL: http://arxiv.org/abs/2502.12566v3
- Date: Thu, 18 Sep 2025 01:59:34 GMT
- Title: Exploring the Impact of Personality Traits on LLM Bias and Toxicity
- Authors: Shuo Wang, Renhao Li, Xi Chen, Yulin Yuan, Derek F. Wong, Min Yang
- Abstract summary: "Personification" of large language models (LLMs) with different personalities has attracted increasing research interest. This study explores how assigning different personality traits to LLMs affects the toxicity and biases of their outputs.
- Score: 35.98654647219457
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the different roles that AI is expected to play in human life, imbuing large language models (LLMs) with different personalities has attracted increasing research interest. While such "personification" enhances the interactivity and adaptability of LLMs for human users, it raises critical concerns about content safety, particularly regarding the bias, sentiment, and toxicity of LLM generation. This study explores how assigning different personality traits to LLMs affects the toxicity and biases of their outputs. Leveraging the widely accepted HEXACO personality framework developed in social psychology, we design experimentally sound prompts to test three LLMs' performance on three toxicity and bias benchmarks. The findings demonstrate the sensitivity of all three models to HEXACO personality traits and, more importantly, a consistent variation in the bias, negative sentiment, and toxicity of their output. In particular, adjusting the levels of several personality traits can effectively reduce bias and toxicity in model outputs, mirroring the correlations between personality traits and toxic behavior observed in humans. The findings highlight the need to examine content safety in addition to the efficiency of training or fine-tuning methods for LLM personification. They also suggest that adjusting personalities may be a simple, low-cost method for controlled text generation.
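The setup the abstract describes, assigning levels of HEXACO traits to an LLM via a prompt before posing benchmark items, can be sketched minimally. The trait names below follow the HEXACO framework; the `persona_prompt` helper and its wording are illustrative assumptions, not the paper's actual prompts.

```python
# Minimal sketch of persona prompting with HEXACO traits.
# The six trait names are from the HEXACO framework; the prompt
# template itself is a hypothetical illustration.

HEXACO_TRAITS = [
    "Honesty-Humility",
    "Emotionality",
    "Extraversion",
    "Agreeableness",
    "Conscientiousness",
    "Openness to Experience",
]

def persona_prompt(levels: dict) -> str:
    """Build a system prompt assigning a level ('high' or 'low') to
    selected HEXACO traits; unspecified traits are left unmentioned."""
    lines = ["You are an assistant with the following personality profile:"]
    for trait in HEXACO_TRAITS:
        level = levels.get(trait)
        if level is not None:
            lines.append(f"- {trait}: {level}")
    lines.append("Answer the user's question in character.")
    return "\n".join(lines)

# Example: a profile of the kind the abstract suggests could reduce
# toxicity, raising Honesty-Humility and Agreeableness.
prompt = persona_prompt({
    "Honesty-Humility": "high",
    "Agreeableness": "high",
})
print(prompt)
```

The resulting string would be sent as the system message to each model under test, followed by items from the toxicity and bias benchmarks; comparing outputs across trait levels then isolates the effect of each trait.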
Related papers
- Psychological Steering in LLMs: An Evaluation of Effectiveness and Trustworthiness [14.523351279184356]
Our study spans four models from different LLM families paired with various steering strategies, including prompting, fine-tuning, and representation engineering.
Our results indicate that prompting is consistently effective but limited in intensity control, whereas vector injections achieve finer controllability while slightly reducing output quality.
Our framework establishes the first holistic evaluation of emotion and personality steering, offering insights into its interpretability and reliability for socially interactive applications.
arXiv Detail & Related papers (2025-10-06T04:49:56Z) - The Personality Illusion: Revealing Dissociation Between Self-Reports & Behavior in LLMs [60.15472325639723]
Personality traits have long been studied as predictors of human behavior.
Recent advances in Large Language Models (LLMs) suggest similar patterns may emerge in artificial systems.
arXiv Detail & Related papers (2025-09-03T21:27:10Z) - Evaluating the Simulation of Human Personality-Driven Susceptibility to Misinformation with LLMs [0.18416014644193066]
Large language models (LLMs) make it possible to generate synthetic behavioural data at scale.
We evaluate the capacity of LLM agents, conditioned on Big-Five profiles, to reproduce personality-based variation in susceptibility to misinformation.
arXiv Detail & Related papers (2025-06-30T08:16:07Z) - A Comparative Study of Large Language Models and Human Personality Traits [6.354326674890978]
Large Language Models (LLMs) have demonstrated human-like capabilities in language comprehension and generation.
This study investigates whether LLMs exhibit personality-like traits and how these traits compare with human personality.
arXiv Detail & Related papers (2025-05-01T15:10:15Z) - Aligned Probing: Relating Toxic Behavior and Model Internals [66.49887503194101]
We introduce aligned probing, a novel interpretability framework that aligns the behavior of language models (LMs) with their internal representations.
Using this framework, we examine over 20 OLMo, Llama, and Mistral models, bridging behavioral and internal perspectives for toxicity for the first time.
Our results show that LMs strongly encode information about the toxicity level of inputs and subsequent outputs, particularly in lower layers.
arXiv Detail & Related papers (2025-03-17T17:23:50Z) - Investigating the Impact of LLM Personality on Cognitive Bias Manifestation in Automated Decision-Making Tasks [4.65004369765875]
Personality traits play a crucial role in either amplifying or reducing biases.
Conscientiousness and Agreeableness may generally enhance the efficacy of bias mitigation strategies.
arXiv Detail & Related papers (2025-02-20T03:15:54Z) - Persuasion with Large Language Models: a Survey [49.86930318312291]
Large Language Models (LLMs) have created new disruptive possibilities for persuasive communication.
In areas such as politics, marketing, public health, e-commerce, and charitable giving, such LLM systems have already achieved human-level or even super-human persuasiveness.
Our survey suggests that the current and future potential of LLM-based persuasion poses profound ethical and societal risks.
arXiv Detail & Related papers (2024-11-11T10:05:52Z) - LMLPA: Language Model Linguistic Personality Assessment [11.599282127259736]
Large Language Models (LLMs) are increasingly used in everyday life and research.
However, measuring the personality of a given LLM is currently a challenge.
This paper introduces the Language Model Linguistic Personality Assessment (LMLPA), a system designed to evaluate the linguistic personalities of LLMs.
arXiv Detail & Related papers (2024-10-23T07:48:51Z) - Neuron-based Personality Trait Induction in Large Language Models [115.08894603023712]
Large language models (LLMs) have become increasingly proficient at simulating various personality traits.
We present a neuron-based approach for personality trait induction in LLMs.
arXiv Detail & Related papers (2024-10-16T07:47:45Z) - Exploring the Personality Traits of LLMs through Latent Features Steering [12.142248881876355]
We investigate how factors, such as cultural norms and environmental stressors, encoded within large language models (LLMs) shape their personality traits.
We propose a training-free approach to modify the model's behavior by extracting and steering latent features corresponding to factors within the model.
arXiv Detail & Related papers (2024-10-07T21:02:34Z) - Do Large Language Models Possess Sensitive to Sentiment? [18.88126980975737]
Large Language Models (LLMs) have recently displayed their extraordinary capabilities in language understanding.
This paper investigates the ability of LLMs to detect and react to sentiment in textual inputs.
arXiv Detail & Related papers (2024-09-04T01:40:20Z) - Rel-A.I.: An Interaction-Centered Approach To Measuring Human-LM Reliance [73.19687314438133]
We study how reliance is affected by contextual features of an interaction.
We find that contextual characteristics significantly affect human reliance behavior.
Our results show that calibration and language quality alone are insufficient in evaluating the risks of human-LM interactions.
arXiv Detail & Related papers (2024-07-10T18:00:05Z) - Evaluating Large Language Models with Psychometrics [59.821829073478376]
This paper offers a comprehensive benchmark for quantifying psychological constructs of Large Language Models (LLMs).
Our work identifies five key psychological constructs -- personality, values, emotional intelligence, theory of mind, and self-efficacy -- assessed through a suite of 13 datasets.
We uncover significant discrepancies between LLMs' self-reported traits and their response patterns in real-world scenarios, revealing complexities in their behaviors.
arXiv Detail & Related papers (2024-06-25T16:09:08Z) - Do LLMs Have Distinct and Consistent Personality? TRAIT: Personality Testset designed for LLMs with Psychometrics [29.325576963215163]
Large Language Models (LLMs) have led to their adaptation in various domains as conversational agents.
We introduce TRAIT, a new benchmark consisting of 8K multi-choice questions designed to assess the personality of LLMs.
LLMs exhibit distinct and consistent personality, which is highly influenced by their training data.
arXiv Detail & Related papers (2024-06-20T19:50:56Z) - Is persona enough for personality? Using ChatGPT to reconstruct an agent's latent personality from simple descriptions [2.6080756513915824]
Personality, a fundamental aspect of human cognition, contains a range of traits that influence behaviors, thoughts, and emotions.
This paper explores the capabilities of large language models (LLMs) in reconstructing these complex cognitive attributes based only on simple descriptions containing socio-demographic and personality type information.
arXiv Detail & Related papers (2024-06-18T02:32:57Z) - Challenging the Validity of Personality Tests for Large Language Models [2.9123921488295768]
Large language models (LLMs) behave increasingly human-like in text-based interactions.
LLMs' responses to personality tests systematically deviate from human responses.
arXiv Detail & Related papers (2023-11-09T11:54:01Z) - Personality Traits in Large Language Models [44.908741466152215]
Personality is a key factor determining the effectiveness of communication.
We present a comprehensive method for administering and validating personality tests on widely-used large language models.
We discuss application and ethical implications of the measurement and shaping method, in particular regarding responsible AI.
arXiv Detail & Related papers (2023-07-01T00:58:51Z) - Revisiting the Reliability of Psychological Scales on Large Language Models [62.57981196992073]
This study aims to determine the reliability of applying personality assessments to Large Language Models.
Analysis of 2,500 settings per model, including GPT-3.5, GPT-4, Gemini-Pro, and LLaMA-3.1, reveals that various LLMs show consistency in responses to the Big Five Inventory.
arXiv Detail & Related papers (2023-05-31T15:03:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.