Related papers: Evaluating Behavioral Alignment in Conflict Dialogue: A Multi-Dimensional Comparison of LLM Agents and Humans

Evaluating Behavioral Alignment in Conflict Dialogue: A Multi-Dimensional Comparison of LLM Agents and Humans

URL: http://arxiv.org/abs/2509.16394v1
Date: Fri, 19 Sep 2025 20:15:52 GMT
Title: Evaluating Behavioral Alignment in Conflict Dialogue: A Multi-Dimensional Comparison of LLM Agents and Humans
Authors: Deuksin Kwon, Kaleen Shrestha, Bin Han, Elena Hayoung Lee, Gale Lucas,
Abstract summary: Large Language Models (LLMs) are increasingly deployed in socially complex, interaction-driven tasks.<n>This study assesses the behavioral alignment of personality-prompted LLMs in adversarial dispute resolution.
Score: 3.0760465083020345
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Language Models (LLMs) are increasingly deployed in socially complex, interaction-driven tasks, yet their ability to mirror human behavior in emotionally and strategically complex contexts remains underexplored. This study assesses the behavioral alignment of personality-prompted LLMs in adversarial dispute resolution by simulating multi-turn conflict dialogues that incorporate negotiation. Each LLM is guided by a matched Five-Factor personality profile to control for individual variation and enhance realism. We evaluate alignment across three dimensions: linguistic style, emotional expression (e.g., anger dynamics), and strategic behavior. GPT-4.1 achieves the closest alignment with humans in linguistic style and emotional dynamics, while Claude-3.7-Sonnet best reflects strategic behavior. Nonetheless, substantial alignment gaps persist. Our findings establish a benchmark for alignment between LLMs and humans in socially complex interactions, underscoring both the promise and the limitations of personality conditioning in dialogue modeling.

Related papers

Can LLMs Truly Embody Human Personality? Analyzing AI and Human Behavior Alignment in Dispute Resolution [7.599497643290519]
Large language models (LLMs) are increasingly used to simulate human behavior in social settings.<n>It remains unclear whether these simulations reproduce the personality-behavior patterns observed in humans.
arXiv Detail & Related papers (2026-02-07T07:20:24Z)
Personality Expression Across Contexts: Linguistic and Behavioral Variation in LLM Agents [6.123697959900301]
This study examines how identical personality prompts lead to distinct linguistic, behavioral, and emotional outcomes across four conversational settings.<n>Findings suggest that the same traits are expressed differently depending on social and affective demands.
arXiv Detail & Related papers (2026-02-01T07:14:00Z)
GameTalk: Training LLMs for Strategic Conversation [51.29670609281524]
We introduce textbfGameTalk, a framework for training LLMs to make strategic decisions via multi-turn interactions.<n>Unlike prior work that focuses on single-turn objectives or static action prediction, we train LLMs to optimize a global objective across full conversations.<n>We evaluate this approach on a suite of increasingly complex games, designed to stress different aspects of reasoning, coordination, and opponent modeling.
arXiv Detail & Related papers (2026-01-22T19:18:39Z)
Consistently Simulating Human Personas with Multi-Turn Reinforcement Learning [52.07170679746533]
Large Language Models (LLMs) are increasingly used to simulate human users in interactive settings such as therapy, education, and social role-play.<n>We introduce a unified framework for evaluating and improving persona consistency in LLM-generated dialogue.<n>We define three automatic metrics: prompt-to-line consistency, line-to-line consistency, and Q&A consistency, that capture different types of persona drift and validate each against human annotations.
arXiv Detail & Related papers (2025-10-31T19:40:41Z)
TwinVoice: A Multi-dimensional Benchmark Towards Digital Twins via LLM Persona Simulation [55.55404595177229]
Large Language Models (LLMs) are exhibiting emergent human-like abilities.<n>TwinVoice is a benchmark for assessing persona simulation across diverse real-world contexts.
arXiv Detail & Related papers (2025-10-29T14:00:42Z)
Social Simulations with Large Language Model Risk Utopian Illusion [61.358959720048354]
We introduce a systematic framework for analyzing large language models' behavior in social simulation.<n>Our approach simulates multi-agent interactions through chatroom-style conversations and analyzes them across five linguistic dimensions.<n>Our findings reveal that LLMs do not faithfully reproduce genuine human behavior but instead reflect overly idealized versions of it.
arXiv Detail & Related papers (2025-10-24T06:08:41Z)
How large language models judge and influence human cooperation [82.07571393247476]
We assess how state-of-the-art language models judge cooperative actions.<n>We observe a remarkable agreement in evaluating cooperation against good opponents.<n>We show that the differences revealed between models can significantly impact the prevalence of cooperation.
arXiv Detail & Related papers (2025-06-30T09:14:42Z)
Can LLM Agents Maintain a Persona in Discourse? [3.286711575862228]
Large Language Models (LLMs) are widely used as conversational agents, exploiting their capabilities in various sectors such as education, law, medicine, and more.<n>LLMs are often subjected to context-shifting behaviour, resulting in a lack of consistent and interpretable personality-aligned interactions.<n>We show that while LLMs can be guided toward personality-driven dialogue, their ability to maintain personality traits varies significantly depending on the combination of models and discourse settings.
arXiv Detail & Related papers (2025-02-17T14:36:39Z)
PersLLM: A Personified Training Approach for Large Language Models [66.16513246245401]
We propose PersLLM, a framework for better data construction and model tuning.<n>For insufficient data usage, we incorporate strategies such as Chain-of-Thought prompting and anti-induction.<n>For rigid behavior patterns, we design the tuning process and introduce automated DPO to enhance the specificity and dynamism of the models' personalities.
arXiv Detail & Related papers (2024-07-17T08:13:22Z)
The Good, the Bad, and the Hulk-like GPT: Analyzing Emotional Decisions of Large Language Models in Cooperation and Bargaining Games [9.82711167146543]
We introduce a novel methodology to study the decision-making of Large Language Models (LLMs) We show that emotions profoundly impact the performance of LLMs, leading to the development of more optimal strategies. Surprisingly, emotional prompting, particularly with anger' emotion, can disrupt the "superhuman" alignment of GPT-4.
arXiv Detail & Related papers (2024-06-05T14:08:54Z)
Human vs. Machine: Behavioral Differences Between Expert Humans and Language Models in Wargame Simulations [1.6108153271585284]
We show that large language models (LLMs) behave differently compared to humans in high-stakes military decision-making scenarios. Our results motivate policymakers to be cautious before granting autonomy or following AI-based strategy recommendations.
arXiv Detail & Related papers (2024-03-06T02:23:32Z)
LLM Agents in Interaction: Measuring Personality Consistency and Linguistic Alignment in Interacting Populations of Large Language Models [4.706971067968811]
We create a two-group population of large language models (LLMs) agents using a simple variability-inducing sampling algorithm. We administer personality tests and submit the agents to a collaborative writing task, finding that different profiles exhibit different degrees of personality consistency and linguistic alignment to their conversational partners.
arXiv Detail & Related papers (2024-02-05T11:05:20Z)
Revisiting the Reliability of Psychological Scales on Large Language Models [62.57981196992073]
This study aims to determine the reliability of applying personality assessments to Large Language Models. Analysis of 2,500 settings per model, including GPT-3.5, GPT-4, Gemini-Pro, and LLaMA-3.1, reveals that various LLMs show consistency in responses to the Big Five Inventory.
arXiv Detail & Related papers (2023-05-31T15:03:28Z)

This list is automatically generated from the titles and abstracts of the papers in this site.