Related papers: Can LLMs Truly Embody Human Personality? Analyzing AI and Human Behavior Alignment in Dispute Resolution

URL: http://arxiv.org/abs/2602.07414v1
Date: Sat, 07 Feb 2026 07:20:24 GMT
Title: Can LLMs Truly Embody Human Personality? Analyzing AI and Human Behavior Alignment in Dispute Resolution
Authors: Deuksin Kwon, Kaleen Shrestha, Bin Han, Spencer Lin, James Hale, Jonathan Gratch, Maja Matarić, Gale M. Lucas
Abstract summary: Large language models (LLMs) are increasingly used to simulate human behavior in social settings. It remains unclear whether these simulations reproduce the personality-behavior patterns observed in humans.
Score: 7.599497643290519
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) are increasingly used to simulate human behavior in social settings such as legal mediation, negotiation, and dispute resolution. However, it remains unclear whether these simulations reproduce the personality-behavior patterns observed in humans. Human personality, for instance, shapes how individuals navigate social interactions, including strategic choices and behaviors in emotionally charged interactions. This raises the question: Can LLMs, when prompted with personality traits, reproduce personality-driven differences in human conflict behavior? To explore this, we introduce an evaluation framework that enables direct comparison of human-human and LLM-LLM behaviors in dispute resolution dialogues with respect to Big Five Inventory (BFI) personality traits. This framework provides a set of interpretable metrics related to strategic behavior and conflict outcomes. We additionally contribute a novel dataset creation methodology for LLM dispute resolution dialogues with matched scenarios and personality traits with respect to human conversations. Finally, we demonstrate the use of our evaluation framework with three contemporary closed-source LLMs and show significant divergences in how personality manifests in conflict across different LLMs compared to human data, challenging the assumption that personality-prompted agents can serve as reliable behavioral proxies in socially impactful applications. Our work highlights the need for psychological grounding and validation in AI simulations before real-world use.

Related papers

Emulating Aggregate Human Choice Behavior and Biases with GPT Conversational Agents [0.48439699124726004]
Large language models (LLMs) have been shown to reproduce well-known biases. We adapted three well-established decision scenarios into a conversational setting and conducted a human experiment. We found notable differences between models in how they aligned with human behavior.
arXiv Detail & Related papers (2026-02-05T12:33:05Z)
HumanLLM: Towards Personalized Understanding and Simulation of Human Nature [72.55730315685837]
HumanLLM is a foundation model designed for personalized understanding and simulation of individuals. We first construct the Cognitive Genome, a large-scale corpus curated from real-world user data on platforms like Reddit, Twitter, Blogger, and Amazon. We then formulate diverse learning tasks and perform supervised fine-tuning to empower the model to predict a wide range of individualized human behaviors, thoughts, and experiences.
arXiv Detail & Related papers (2026-01-22T09:27:27Z)
HUMANLLM: Benchmarking and Reinforcing LLM Anthropomorphism via Human Cognitive Patterns [59.17423586203706]
We present HUMANLLM, a framework treating psychological patterns as interacting causal forces. We construct 244 patterns from 12,000 academic papers and synthesize 11,359 scenarios where 2-5 patterns reinforce, conflict, or modulate each other. Our dual-level checklists evaluate both individual pattern fidelity and emergent multi-pattern dynamics, achieving strong human alignment.
arXiv Detail & Related papers (2026-01-15T08:56:53Z)
TwinVoice: A Multi-dimensional Benchmark Towards Digital Twins via LLM Persona Simulation [55.55404595177229]
Large Language Models (LLMs) are exhibiting emergent human-like abilities. TwinVoice is a benchmark for assessing persona simulation across diverse real-world contexts.
arXiv Detail & Related papers (2025-10-29T14:00:42Z)
Social Simulations with Large Language Model Risk Utopian Illusion [61.358959720048354]
We introduce a systematic framework for analyzing large language models' behavior in social simulation. Our approach simulates multi-agent interactions through chatroom-style conversations and analyzes them across five linguistic dimensions. Our findings reveal that LLMs do not faithfully reproduce genuine human behavior but instead reflect overly idealized versions of it.
arXiv Detail & Related papers (2025-10-24T06:08:41Z)
Evaluating Behavioral Alignment in Conflict Dialogue: A Multi-Dimensional Comparison of LLM Agents and Humans [3.0760465083020345]
Large Language Models (LLMs) are increasingly deployed in socially complex, interaction-driven tasks. This study assesses the behavioral alignment of personality-prompted LLMs in adversarial dispute resolution.
arXiv Detail & Related papers (2025-09-19T20:15:52Z)
SocialEval: Evaluating Social Intelligence of Large Language Models [70.90981021629021]
Social Intelligence (SI) equips humans with interpersonal abilities to behave wisely in navigating social interactions to achieve social goals. This presents an operational evaluation paradigm: outcome-oriented goal achievement evaluation and process-oriented interpersonal ability evaluation. We propose SocialEval, a script-based bilingual SI benchmark, integrating outcome- and process-oriented evaluation by manually crafting narrative scripts.
arXiv Detail & Related papers (2025-06-01T08:36:51Z)
Measurement of LLM's Philosophies of Human Nature [113.47929131143766]
We design a standardized psychological scale specifically targeting large language models (LLMs). We show that current LLMs exhibit a systemic lack of trust in humans. We propose a mental loop learning framework, which enables an LLM to continuously optimize its value system.
arXiv Detail & Related papers (2025-04-03T06:22:19Z)
Can Machines Think Like Humans? A Behavioral Evaluation of LLM-Agents in Dictator Games [7.504095239018173]
Large Language Model (LLM)-based agents increasingly undertake real-world tasks and engage with human society. We investigate how different personas and experimental framings affect these AI agents' altruistic behavior in dictator games. We show that assigning a human-like identity to LLMs does not produce human-like behaviors.
arXiv Detail & Related papers (2024-10-28T17:47:41Z)
PersLLM: A Personified Training Approach for Large Language Models [66.16513246245401]
We propose PersLLM, a framework for better data construction and model tuning. For insufficient data usage, we incorporate strategies such as Chain-of-Thought prompting and anti-induction. For rigid behavior patterns, we design the tuning process and introduce automated DPO to enhance the specificity and dynamism of the models' personalities.
arXiv Detail & Related papers (2024-07-17T08:13:22Z)
The Good, the Bad, and the Hulk-like GPT: Analyzing Emotional Decisions of Large Language Models in Cooperation and Bargaining Games [9.82711167146543]
We introduce a novel methodology to study the decision-making of Large Language Models (LLMs). We show that emotions profoundly impact the performance of LLMs, leading to the development of more optimal strategies. Surprisingly, emotional prompting, particularly with the 'anger' emotion, can disrupt the "superhuman" alignment of GPT-4.
arXiv Detail & Related papers (2024-06-05T14:08:54Z)
Human vs. Machine: Behavioral Differences Between Expert Humans and Language Models in Wargame Simulations [1.6108153271585284]
We show that large language models (LLMs) behave differently compared to humans in high-stakes military decision-making scenarios. Our results motivate policymakers to be cautious before granting autonomy or following AI-based strategy recommendations.
arXiv Detail & Related papers (2024-03-06T02:23:32Z)

This list is automatically generated from the titles and abstracts of the papers on this site.