Related papers: Psychological Steering in LLMs: An Evaluation of Effectiveness and Trustworthiness

URL: http://arxiv.org/abs/2510.04484v1
Date: Mon, 06 Oct 2025 04:49:56 GMT
Title: Psychological Steering in LLMs: An Evaluation of Effectiveness and Trustworthiness
Authors: Amin Banayeeanzade, Ala N. Tak, Fatemeh Bahrani, Anahita Bolourani, Leonardo Blas, Emilio Ferrara, Jonathan Gratch, Sai Praneeth Karimireddy
Abstract summary: Our study spans four models from different LLM families paired with various steering strategies, including prompting, fine-tuning, and representation engineering. Our results indicate that prompting is consistently effective but limited in intensity control, whereas vector injections achieve finer controllability while slightly reducing output quality. Our framework establishes the first holistic evaluation of emotion and personality steering, offering insights into its interpretability and reliability for socially interactive applications.
Score: 14.523351279184356
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The ability to control LLMs' emulated emotional states and personality traits is essential for enabling rich, human-centered interactions in socially interactive settings. We introduce PsySET, a Psychologically-informed benchmark to evaluate LLM Steering Effectiveness and Trustworthiness across the emotion and personality domains. Our study spans four models from different LLM families paired with various steering strategies, including prompting, fine-tuning, and representation engineering. Our results indicate that prompting is consistently effective but limited in intensity control, whereas vector injections achieve finer controllability while slightly reducing output quality. Moreover, we explore the trustworthiness of steered LLMs by assessing safety, truthfulness, fairness, and ethics, highlighting potential side effects and behavioral shifts. Notably, we observe idiosyncratic effects; for instance, even a positive emotion like joy can degrade robustness to adversarial factuality, lower privacy awareness, and increase preferential bias. Meanwhile, anger predictably elevates toxicity yet strengthens leakage resistance. Our framework establishes the first holistic evaluation of emotion and personality steering, offering insights into its interpretability and reliability for socially interactive applications.

Related papers

MindShift: Analyzing Language Models' Reactions to Psychological Prompts [6.696296750931842]
Large language models (LLMs) hold the potential to absorb and reflect personality traits and attitudes specified by users. Our study introduces MindShift, a benchmark for evaluating LLMs' psychological adaptability.
arXiv Detail & Related papers (2025-12-09T21:56:54Z)
Psychometric Personality Shaping Modulates Capabilities and Safety in Language Models [3.9481669393262675]
We investigate how psychometric personality control grounded in the Big Five framework influences AI behavior in the context of capability and safety benchmarks. Our experiments reveal striking effects: for example, reducing conscientiousness leads to significant drops in safety-relevant metrics on benchmarks such as WMDP, TruthfulQA, ETHICS, and Sycophancy. These findings highlight personality shaping as a powerful and underexplored axis of model control that interacts with both safety and general competence.
arXiv Detail & Related papers (2025-09-19T18:19:56Z)
The Personality Illusion: Revealing Dissociation Between Self-Reports & Behavior in LLMs [60.15472325639723]
Personality traits have long been studied as predictors of human behavior. Recent advances in Large Language Models (LLMs) suggest similar patterns may emerge in artificial systems.
arXiv Detail & Related papers (2025-09-03T21:27:10Z)
IROTE: Human-like Traits Elicitation of Large Language Model via In-Context Self-Reflective Optimization [66.6349183886101]
We propose IROTE, a novel in-context method for stable and transferable trait elicitation. We show that one single IROTE-generated self-reflection can induce LLMs' stable impersonation of the target trait across diverse downstream tasks.
arXiv Detail & Related papers (2025-08-12T08:04:28Z)
Investigating the Impact of LLM Personality on Cognitive Bias Manifestation in Automated Decision-Making Tasks [4.65004369765875]
Personality traits play a crucial role in either amplifying or reducing biases. Conscientiousness and Agreeableness may generally enhance the efficacy of bias mitigation strategies.
arXiv Detail & Related papers (2025-02-20T03:15:54Z)
Exploring the Impact of Personality Traits on LLM Bias and Toxicity [35.98654647219457]
"Personification" of large language models (LLMs) with different personalities has attracted increasing research interest. This study explores how assigning different personality traits to LLMs affects the toxicity and biases of their outputs.
arXiv Detail & Related papers (2025-02-18T06:07:09Z)
From Personas to Talks: Revisiting the Impact of Personas on LLM-Synthesized Emotional Support Conversations [34.426199139914615]
Large Language Models (LLMs) have revolutionized the generation of emotional support conversations. This paper explores the role of personas in the creation of emotional support conversations.
arXiv Detail & Related papers (2025-02-17T05:24:30Z)
Persuasion with Large Language Models: a Survey [49.86930318312291]
Large Language Models (LLMs) have created new disruptive possibilities for persuasive communication. In areas such as politics, marketing, public health, e-commerce, and charitable giving, such LLM Systems have already achieved human-level or even super-human persuasiveness. Our survey suggests that the current and future potential of LLM-based persuasion poses profound ethical and societal risks.
arXiv Detail & Related papers (2024-11-11T10:05:52Z)
Evaluating Large Language Models with Psychometrics [59.821829073478376]
This paper offers a comprehensive benchmark for quantifying psychological constructs of Large Language Models (LLMs). Our work identifies five key psychological constructs -- personality, values, emotional intelligence, theory of mind, and self-efficacy -- assessed through a suite of 13 datasets. We uncover significant discrepancies between LLMs' self-reported traits and their response patterns in real-world scenarios, revealing complexities in their behaviors.
arXiv Detail & Related papers (2024-06-25T16:09:08Z)
PsychoGAT: A Novel Psychological Measurement Paradigm through Interactive Fiction Games with LLM Agents [68.50571379012621]
Psychological measurement is essential for mental health, self-understanding, and personal development. PsychoGAT (Psychological Game AgenTs) achieves statistically significant excellence in psychometric metrics such as reliability, convergent validity, and discriminant validity.
arXiv Detail & Related papers (2024-02-19T18:00:30Z)
Large Language Models Understand and Can be Enhanced by Emotional Stimuli [53.53886609012119]
We take the first step towards exploring the ability of Large Language Models to understand emotional stimuli. Our experiments show that LLMs have a grasp of emotional intelligence, and their performance can be improved with emotional prompts. Our human study results demonstrate that EmotionPrompt significantly boosts the performance of generative tasks.
arXiv Detail & Related papers (2023-07-14T00:57:12Z)

This list is automatically generated from the titles and abstracts of the papers on this site.