How Far Can LLMs Emulate Human Behavior?: A Strategic Analysis via the Buy-and-Sell Negotiation Game
- URL: http://arxiv.org/abs/2511.17990v1
- Date: Sat, 22 Nov 2025 09:07:29 GMT
- Title: How Far Can LLMs Emulate Human Behavior?: A Strategic Analysis via the Buy-and-Sell Negotiation Game
- Authors: Mingyu Jeon, Jaeyoung Suh, Suwan Cho, Dohyeon Kim
- Abstract summary: This work proposes a methodology to quantitatively evaluate the human emotional and behavioral imitation and strategic decision-making capabilities of Large Language Models (LLMs). Specifically, we assign different personas to multiple LLMs and conduct negotiations between a Buyer and a Seller, comprehensively analyzing outcomes such as win rates, transaction prices, and SHAP values. Our experimental results show that models with higher existing benchmark scores tend to achieve better negotiation performance overall.
- Score: 0.8353024005684598
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the rapid advancement of Large Language Models (LLMs), recent studies have drawn attention to their potential for handling not only simple question-answer tasks but also more complex conversational abilities and performing human-like behavioral imitations. In particular, there is considerable interest in how accurately LLMs can reproduce real human emotions and behaviors, as well as whether such reproductions can function effectively in real-world scenarios. However, existing benchmarks focus primarily on knowledge-based assessment and thus fall short of sufficiently reflecting social interactions and strategic dialogue capabilities. To address these limitations, this work proposes a methodology to quantitatively evaluate the human emotional and behavioral imitation and strategic decision-making capabilities of LLMs by employing a Buy and Sell negotiation simulation. Specifically, we assign different personas to multiple LLMs and conduct negotiations between a Buyer and a Seller, comprehensively analyzing outcomes such as win rates, transaction prices, and SHAP values. Our experimental results show that models with higher existing benchmark scores tend to achieve better negotiation performance overall, although some models exhibit diminished performance in scenarios emphasizing emotional or social contexts. Moreover, competitive and cunning traits prove more advantageous for negotiation outcomes than altruistic and cooperative traits, suggesting that the assigned persona can lead to significant variations in negotiation strategies and results. Consequently, this study introduces a new evaluation approach for LLMs' social behavior imitation and dialogue strategies, and demonstrates how negotiation simulations can serve as a meaningful complementary metric to measure real-world interaction capabilities, an aspect often overlooked in existing benchmarks.
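To make the setup concrete, below is a minimal sketch of how a persona-conditioned Buyer/Seller negotiation of this kind might be simulated. It is not the authors' implementation: the `chat()` helper, the persona strings, the turn budget, and the crude deal-detection logic are all illustrative assumptions.

```python
# Minimal sketch of a persona-conditioned Buyer/Seller negotiation loop.
# The chat() helper, personas, and deal-detection logic are illustrative
# assumptions, not the paper's actual implementation.
import re

def chat(system_prompt: str, transcript: list[str]) -> str:
    """Placeholder for an LLM call (e.g., any chat-completion API)."""
    raise NotImplementedError

BUYER_PERSONA = "You are a competitive buyer. Target price: $80. Never pay above $100."
SELLER_PERSONA = "You are a cooperative seller. Item cost: $60. List price: $120."

def negotiate(max_turns: int = 10):
    transcript = ["Seller: The item is $120."]
    for turn in range(max_turns):
        # Buyer and Seller alternate, each conditioned on its own persona.
        persona = BUYER_PERSONA if turn % 2 == 0 else SELLER_PERSONA
        role = "Buyer" if turn % 2 == 0 else "Seller"
        msg = chat(persona, transcript)
        transcript.append(f"{role}: {msg}")
        if "deal" in msg.lower():  # crude agreement check for the sketch
            price = re.search(r"\$(\d+)", msg)
            return (float(price.group(1)) if price else None), transcript
    return None, transcript  # no agreement within the turn budget
```

From logs like the returned transcript and final price, per-model win rates and transaction prices can be tabulated directly.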
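The abstract also reports SHAP values over negotiation outcomes. A sketch of how such an attribution analysis could be run with the `shap` library follows; the feature set (a competitiveness trait, a benchmark score, an opening offer) and the synthetic outcome are assumptions for illustration only.

```python
# Sketch: attribute negotiation outcomes to input features with SHAP.
# Features and synthetic data are illustrative assumptions; the paper's
# exact features may differ.
import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
# Synthetic log of 500 negotiations: [competitiveness, benchmark_score, opening_offer]
X = rng.random((500, 3))
# Toy outcome: competitiveness and benchmark score drive the final price.
y = 0.6 * X[:, 0] + 0.3 * X[:, 1] + 0.1 * rng.normal(size=500)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# Mean |SHAP| per feature approximates its global importance to the outcome.
for name, imp in zip(["competitiveness", "benchmark_score", "opening_offer"],
                     np.abs(shap_values).mean(axis=0)):
    print(f"{name}: {imp:.3f}")
```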
Related papers
- MERIT Feedback Elicits Better Bargaining in LLM Negotiators [38.1466669265123]
AgoraBench is a new benchmark spanning nine challenging settings. This is operationalized via agent utility, negotiation power, and acquisition ratio that implicitly measure how well the negotiation aligns with human preference. Our mechanism substantially improves negotiation performance, yielding deeper strategic behavior and stronger opponent awareness.
arXiv Detail & Related papers (2026-02-11T03:09:45Z) - Evaluating Generalization Capabilities of LLM-Based Agents in Mixed-Motive Scenarios Using Concordia [100.74015791021044]
Large Language Model (LLM) agents have demonstrated impressive capabilities for social interaction. Existing evaluation methods fail to measure how well these capabilities generalize to novel social situations. We present empirical results from the NeurIPS 2024 Concordia Contest, where agents were evaluated on their ability to achieve mutual gains.
arXiv Detail & Related papers (2025-12-03T00:11:05Z) - Strategic Tradeoffs Between Humans and AI in Multi-Agent Bargaining [6.455342700410145]
We compare outcomes and behavioral dynamics across humans, large language models, and Bayesian agents in a dynamic negotiation setting. We find that performance parity can conceal fundamental differences in process and alignment. This work provides a baseline for future studies in more applied, variable-rich environments.
arXiv Detail & Related papers (2025-09-11T00:25:07Z) - EvoEmo: Towards Evolved Emotional Policies for Adversarial LLM Agents in Multi-Turn Price Negotiation [61.627248012799704]
Existing Large Language Model (LLM) agents largely overlook the functional role of emotions in such negotiations. We present EvoEmo, an evolutionary reinforcement learning framework that optimizes dynamic emotional expression in negotiations.
arXiv Detail & Related papers (2025-09-04T15:23:58Z) - How large language models judge and influence human cooperation [82.07571393247476]
We assess how state-of-the-art language models judge cooperative actions. We observe remarkable agreement in evaluating cooperation against good opponents. We show that the differences revealed between models can significantly impact the prevalence of cooperation.
arXiv Detail & Related papers (2025-06-30T09:14:42Z) - Reproducibility Study of Cooperation, Competition, and Maliciousness: LLM-Stakeholders Interactive Negotiation [0.0]
We validate the original findings using a range of open-weight models. We propose a communication-free baseline to test whether successful negotiations are possible without agent interaction. This work also provides insights into the accessibility, fairness, environmental impact, and privacy considerations of LLM-based negotiation systems.
arXiv Detail & Related papers (2025-02-22T14:28:49Z) - Word Synchronization Challenge: A Benchmark for Word Association Responses for LLMs [4.352318127577628]
This paper introduces the Word Synchronization Challenge, a novel benchmark to evaluate large language models (LLMs) in Human-Computer Interaction (HCI). This benchmark uses a dynamic game-like framework to test LLMs' ability to mimic human cognitive processes through word associations.
arXiv Detail & Related papers (2025-02-12T11:30:28Z) - PersLLM: A Personified Training Approach for Large Language Models [66.16513246245401]
We propose PersLLM, a framework for better data construction and model tuning. For insufficient data usage, we incorporate strategies such as Chain-of-Thought prompting and anti-induction. For rigid behavior patterns, we design the tuning process and introduce automated DPO to enhance the specificity and dynamism of the models' personalities.
arXiv Detail & Related papers (2024-07-17T08:13:22Z) - Are LLMs Effective Negotiators? Systematic Evaluation of the Multifaceted Capabilities of LLMs in Negotiation Dialogues [4.738985706520995]
This work aims to systematically analyze the multifaceted capabilities of LLMs across diverse dialogue scenarios.
Our analysis highlights GPT-4's superior performance in many tasks while identifying specific challenges.
arXiv Detail & Related papers (2024-02-21T06:11:03Z) - How Well Can LLMs Negotiate? NegotiationArena Platform and Analysis [50.15061156253347]
Negotiation is the basis of social interactions; humans negotiate everything from the price of cars to how to share common resources.
With rapidly growing interest in using large language models (LLMs) to act as agents on behalf of human users, such LLM agents would also need to be able to negotiate.
We develop NegotiationArena: a flexible framework for evaluating and probing the negotiation abilities of LLM agents.
arXiv Detail & Related papers (2024-02-08T17:51:48Z) - AntEval: Evaluation of Social Interaction Competencies in LLM-Driven Agents [65.16893197330589]
Large Language Models (LLMs) have demonstrated their ability to replicate human behaviors across a wide range of scenarios.
However, their capability in handling complex, multi-character social interactions has yet to be fully explored.
We introduce the Multi-Agent Interaction Evaluation Framework (AntEval), encompassing a novel interaction framework and evaluation methods.
arXiv Detail & Related papers (2024-01-12T11:18:00Z) - ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate [57.71597869337909]
We build a multi-agent referee team called ChatEval to autonomously discuss and evaluate the quality of generated responses from different models.
Our analysis shows that ChatEval transcends mere textual scoring, offering a human-mimicking evaluation process for reliable assessments.
arXiv Detail & Related papers (2023-08-14T15:13:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.