Related papers: Effects of personality steering on cooperative behavior in Large Language Model agents

Effects of personality steering on cooperative behavior in Large Language Model agents

URL: http://arxiv.org/abs/2601.05302v2
Date: Wed, 14 Jan 2026 12:54:25 GMT
Title: Effects of personality steering on cooperative behavior in Large Language Model agents
Authors: Mizuki Sakai, Mizuki Yokoyama, Wakaba Tateishi, Genki Ichinose,
Abstract summary: We study the effects of personality steering on cooperative behavior in large language models (LLMs) using Prisoner's Dilemma games.<n>Our results show that agreeableness is the dominant factor promoting cooperation across all models.<n>Explicit personality information increases cooperation but can also raise vulnerability to exploitation.
Score: 0.0
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Large language models (LLMs) are increasingly used as autonomous agents in strategic and social interactions. Although recent studies suggest that assigning personality traits to LLMs can influence their behavior, how personality steering affects cooperation under controlled conditions remains unclear. In this study, we examine the effects of personality steering on cooperative behavior in LLM agents using repeated Prisoner's Dilemma games. Based on the Big Five framework, we first measure basic personality scores of three models, GPT-3.5-turbo, GPT-4o, and GPT-5, using the Big Five Inventory. We then compare behavior under baseline and personality-informed conditions, and further analyze the effects of independently manipulating each personality dimension to extreme values. Our results show that agreeableness is the dominant factor promoting cooperation across all models, while other personality traits have limited impact. Explicit personality information increases cooperation but can also raise vulnerability to exploitation, particularly in earlier-generation models. In contrast, later-generation models exhibit more selective cooperation. These findings indicate that personality steering acts as a behavioral bias rather than a deterministic control mechanism.

Related papers

Vibe Check: Understanding the Effects of LLM-Based Conversational Agents' Personality and Alignment on User Perceptions in Goal-Oriented Tasks [2.1117030125341385]
Large language models (LLMs) enable conversational agents (CAs) to express distinctive personalities.<n>This study investigates how personality expression levels and user-agent personality alignment influence perceptions in goal-oriented tasks.
arXiv Detail & Related papers (2025-09-11T21:43:49Z)
The Personality Illusion: Revealing Dissociation Between Self-Reports & Behavior in LLMs [60.15472325639723]
Personality traits have long been studied as predictors of human behavior.<n>Recent advances in Large Language Models (LLMs) suggest similar patterns may emerge in artificial systems.
arXiv Detail & Related papers (2025-09-03T21:27:10Z)
SAC: A Framework for Measuring and Inducing Personality Traits in LLMs with Dynamic Intensity Control [1.9282110216621835]
Large language models (LLMs) have gained significant traction across a wide range of fields in recent years.<n>There is also a growing expectation for them to display human-like personalities during interactions.<n>Most existing models face two major limitations: they rely on the Big Five (OCEAN) framework, which only provides coarse personality dimensions, and they lack mechanisms for controlling trait intensity.
arXiv Detail & Related papers (2025-06-26T04:12:15Z)
Identifying Cooperative Personalities in Multi-agent Contexts through Personality Steering with Representation Engineering [0.0]
Large Language Models (LLMs) gain autonomous capabilities, their coordination in multi-agent settings becomes increasingly important.<n>Inspired by Axelrod's Iterated Prisoner's Dilemma (IPD) tournaments, we explore how personality traits influence LLM cooperation.<n>Using representation engineering, we steer Big Five traits (e.g., Agreeableness, Conscientiousness) in LLMs and analyze their impact on IPD decision-making.
arXiv Detail & Related papers (2025-03-17T01:21:54Z)
Evaluating Personality Traits in Large Language Models: Insights from Psychological Questionnaires [3.6001840369062386]
This work applies psychological tools to Large Language Models in diverse scenarios to generate personality profiles.<n>Our findings reveal that LLMs exhibit unique traits, varying characteristics, and distinct personality profiles even within the same family of models.
arXiv Detail & Related papers (2025-02-07T16:12:52Z)
P-React: Synthesizing Topic-Adaptive Reactions of Personality Traits via Mixture of Specialized LoRA Experts [34.374681921626205]
We propose P-React, a mixture of experts (MoE)-based personalized large language models.<n> Particularly, we integrate a Personality Loss (PSL) to better capture individual trait expressions.<n>To facilitate research in this field, we curate OCEAN-Chat, a high-quality, human-verified dataset.
arXiv Detail & Related papers (2024-06-18T12:25:13Z)
LLMs Simulate Big Five Personality Traits: Further Evidence [51.13560635563004]
We analyze the personality traits simulated by Llama2, GPT4, and Mixtral. This contributes to the broader understanding of the capabilities of LLMs to simulate personality traits.
arXiv Detail & Related papers (2024-01-31T13:45:25Z)
Editing Personality for Large Language Models [73.59001811199823]
This paper introduces an innovative task focused on editing the personality traits of Large Language Models (LLMs) We construct PersonalityEdit, a new benchmark dataset to address this task.
arXiv Detail & Related papers (2023-10-03T16:02:36Z)
Instructed to Bias: Instruction-Tuned Language Models Exhibit Emergent Cognitive Bias [57.42417061979399]
Recent studies show that instruction tuning (IT) and reinforcement learning from human feedback (RLHF) improve the abilities of large language models (LMs) dramatically. In this work, we investigate the effect of IT and RLHF on decision making and reasoning in LMs. Our findings highlight the presence of these biases in various models from the GPT-3, Mistral, and T5 families.
arXiv Detail & Related papers (2023-08-01T01:39:25Z)
Revisiting the Reliability of Psychological Scales on Large Language Models [62.57981196992073]
This study aims to determine the reliability of applying personality assessments to Large Language Models. Analysis of 2,500 settings per model, including GPT-3.5, GPT-4, Gemini-Pro, and LLaMA-3.1, reveals that various LLMs show consistency in responses to the Big Five Inventory.
arXiv Detail & Related papers (2023-05-31T15:03:28Z)
Evaluating and Inducing Personality in Pre-trained Language Models [78.19379997967191]
We draw inspiration from psychometric studies by leveraging human personality theory as a tool for studying machine behaviors. To answer these questions, we introduce the Machine Personality Inventory (MPI) tool for studying machine behaviors. MPI follows standardized personality tests, built upon the Big Five Personality Factors (Big Five) theory and personality assessment inventories. We devise a Personality Prompting (P2) method to induce LLMs with specific personalities in a controllable way.
arXiv Detail & Related papers (2022-05-20T07:32:57Z)

This list is automatically generated from the titles and abstracts of the papers in this site.