PieArena: Frontier Language Agents Achieve MBA-Level Negotiation Performance and Reveal Novel Behavioral Differences
- URL: http://arxiv.org/abs/2602.05302v2
- Date: Wed, 11 Feb 2026 05:36:24 GMT
- Title: PieArena: Frontier Language Agents Achieve MBA-Level Negotiation Performance and Reveal Novel Behavioral Differences
- Authors: Chris Zhu, Sasha Cui, Will Sanok Dufallo, Runzhi Jin, Zhen Xu, Linjun Zhang, Daylian Cain,
- Abstract summary: We introduce PieArena, a large-scale negotiation benchmark grounded in multi-agent interactions. We find systematic evidence of human-expert-level performance in which a representative frontier language agent (GPT-5) matches or outperforms trained business-school students.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present an in-depth evaluation of LLMs' ability to negotiate, a central business task that requires strategic reasoning, theory of mind, and economic value creation. To do so, we introduce PieArena, a large-scale negotiation benchmark grounded in multi-agent interactions over realistic scenarios drawn from an MBA negotiation course at an elite business school. We develop a statistically grounded ranking model for continuous negotiation payoffs that produces leaderboards with principled confidence intervals and corrects for experimental asymmetries. We find systematic evidence of human-expert-level performance in which a representative frontier language agent (GPT-5) matches or outperforms trained business-school students, despite a semester of general negotiation instruction and targeted coaching immediately prior to the task. We further study the effects of joint-intentionality agentic scaffolding and observe asymmetric gains, with large improvements for mid- and lower-tier LMs and diminishing returns for frontier LMs. Beyond deal outcomes, PieArena provides a multi-dimensional negotiation behavioral profile, revealing novel cross-model heterogeneity, masked by deal-outcome-only benchmarks, in deception, computation accuracy, instruction compliance, and perceived reputation. Overall, our results suggest that frontier language agents are already intellectually and psychologically capable of deployment in high-stakes economic settings, but deficiencies in robustness and trustworthiness remain open challenges.
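The abstract mentions a "statistically grounded ranking model for continuous negotiation payoffs that produces leaderboards with principled confidence intervals" without detailing it. As a hedged illustration only (the paper's actual model is not specified here), one common way to build such a leaderboard is to rank agents by mean payoff and attach percentile-bootstrap confidence intervals; all names and data below are hypothetical:

```python
import random
import statistics

def bootstrap_ci(payoffs, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean payoff."""
    rng = random.Random(seed)
    n = len(payoffs)
    means = sorted(
        statistics.fmean(rng.choices(payoffs, k=n)) for _ in range(n_boot)
    )
    lo = means[int((alpha / 2) * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return statistics.fmean(payoffs), lo, hi

def leaderboard(results):
    """results: dict mapping agent name -> list of per-negotiation payoffs.
    Returns rows (name, mean, ci_lo, ci_hi) sorted by mean payoff."""
    rows = [(name, *bootstrap_ci(p)) for name, p in results.items()]
    return sorted(rows, key=lambda r: r[1], reverse=True)

# Toy payoffs, not real PieArena data.
toy = {
    "agent_a": [0.62, 0.70, 0.58, 0.66, 0.71],
    "agent_b": [0.48, 0.55, 0.51, 0.60, 0.45],
}
for name, mean, lo, hi in leaderboard(toy):
    print(f"{name}: mean={mean:.3f}  95% CI=({lo:.3f}, {hi:.3f})")
```

This sketch does not attempt the paper's correction for experimental asymmetries (e.g., role or scenario imbalances), which would require the paired structure of the actual negotiation data.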
Related papers
- MERIT Feedback Elicits Better Bargaining in LLM Negotiators
AgoraBench is a new benchmark spanning nine challenging settings. This is operationalized via agent utility, negotiation power, and acquisition ratio, which implicitly measure how well the negotiation aligns with human preference. Our mechanism substantially improves negotiation performance, yielding deeper strategic behavior and stronger opponent awareness.
arXiv Detail & Related papers (2026-02-11T03:09:45Z)
- How Far Can LLMs Emulate Human Behavior?: A Strategic Analysis via the Buy-and-Sell Negotiation Game
This work proposes a methodology to quantitatively evaluate the human emotional and behavioral imitation and strategic decision-making capabilities of Large Language Models (LLMs). Specifically, we assign different personas to multiple LLMs and conduct negotiations between a Buyer and a Seller, comprehensively analyzing outcomes such as win rates, transaction prices, and SHAP values. Our experimental results show that models with higher existing benchmark scores tend to achieve better negotiation performance overall.
arXiv Detail & Related papers (2025-11-22T09:07:29Z)
- EQ-Negotiator: Dynamic Emotional Personas Empower Small Language Models for Edge-Deployable Credit Negotiation
Small language models (SLMs) offer a practical alternative but suffer from a significant performance gap compared to large language models (LLMs). This paper introduces EQ-Negotiator, a novel framework that bridges this capability gap using emotional personas. We show that a 7B-parameter language model with EQ-Negotiator achieves better debt recovery and negotiation efficiency than baseline LLMs more than 10 times its size.
arXiv Detail & Related papers (2025-11-05T11:25:07Z)
- Strategic Tradeoffs Between Humans and AI in Multi-Agent Bargaining
We compare outcomes and behavioral dynamics across humans, large language models, and Bayesian agents in a dynamic negotiation setting. We find that performance parity can conceal fundamental differences in process and alignment. This work provides a baseline for future studies in more applied, variable-rich environments.
arXiv Detail & Related papers (2025-09-11T00:25:07Z)
- CRMArena-Pro: Holistic Assessment of LLM Agents Across Diverse Business Scenarios and Interactions
CRMArena-Pro is a novel benchmark for holistic, realistic assessment of LLM agents in diverse professional settings. It incorporates multi-turn interactions guided by diverse personas and robust confidentiality-awareness assessments. Experiments reveal that leading LLM agents achieve only around 58% single-turn success on CRMArena-Pro, with performance dropping significantly to approximately 35% in multi-turn settings.
arXiv Detail & Related papers (2025-05-24T21:33:22Z)
- EmoDebt: Bayesian-Optimized Emotional Intelligence for Strategic Agent-to-Agent Debt Recovery
Large Language Model (LLM) agents are vulnerable to exploitation in emotion-sensitive domains like debt collection. EmoDebt is an emotional-intelligence engine that reframes a model's ability to express emotion in negotiation as a sequential decision-making problem. EmoDebt achieves significant strategic robustness, substantially outperforming non-adaptive and emotion-agnostic baselines.
arXiv Detail & Related papers (2025-03-27T01:41:34Z)
- Reproducibility Study of Cooperation, Competition, and Maliciousness: LLM-Stakeholders Interactive Negotiation
We validate the original findings using a range of open-weight models. We propose a communication-free baseline to test whether successful negotiations are possible without agent interaction. This work also provides insights into the accessibility, fairness, environmental impact, and privacy considerations of LLM-based negotiation systems.
arXiv Detail & Related papers (2025-02-22T14:28:49Z)
- Multi-Agent Imitation Learning: Value is Easy, Regret is Hard
We study a multi-agent imitation learning (MAIL) problem where we take the perspective of a learner attempting to coordinate a group of agents.
Most prior work in MAIL essentially reduces the problem to matching the behavior of the expert within the support of the demonstrations.
While doing so is sufficient to drive the value gap between the learner and the expert to zero under the assumption that agents are non-strategic, it does not guarantee robustness to deviations by strategic agents.
arXiv Detail & Related papers (2024-06-06T16:18:20Z)
- LLMs with Personalities in Multi-issue Negotiation Games
We measure the ability of large language models (LLMs) to negotiate within a game-theoretical framework.
We find high openness, conscientiousness, and neuroticism are associated with fair tendencies.
Low agreeableness and low openness are associated with rational tendencies.
arXiv Detail & Related papers (2024-05-08T17:51:53Z)
- Assistive Large Language Model Agents for Socially-Aware Negotiation Dialogues
We develop assistive agents based on Large Language Models (LLMs) that aid interlocutors in business negotiations. A third LLM acts as a remediator agent that rewrites norm-violating utterances to improve negotiation outcomes. We provide rich empirical evidence demonstrating its effectiveness across three different negotiation topics.
arXiv Detail & Related papers (2024-01-29T09:07:40Z)
- AgentBoard: An Analytical Evaluation Board of Multi-turn LLM Agents
We introduce AgentBoard, a pioneering comprehensive benchmark and accompanying open-source evaluation framework tailored to the analytical evaluation of LLM agents. AgentBoard offers a fine-grained progress-rate metric that captures incremental advancements, as well as a comprehensive evaluation toolkit. This not only sheds light on the capabilities and limitations of LLM agents but also brings the interpretability of their performance to the forefront.
arXiv Detail & Related papers (2024-01-24T01:51:00Z)
- Be Selfish, But Wisely: Investigating the Impact of Agent Personality in Mixed-Motive Human-Agent Interactions
We find that self-play RL fails to learn the value of compromise in a negotiation.
We modify the training procedure in two novel ways to design agents with diverse personalities and analyze their performance with human partners.
We find that although both techniques show promise, a selfish agent that maximizes its own performance while also avoiding walkaways outperforms the other variants by implicitly learning to generate value for both itself and its negotiation partner.
arXiv Detail & Related papers (2023-10-22T20:31:35Z)
- MERMAIDE: Learning to Align Learners using Model-Based Meta-Learning
We study how a principal can efficiently and effectively intervene on the rewards of a previously unseen learning agent in order to induce desirable outcomes.
This is relevant to many real-world settings like auctions or taxation, where the principal may not know the learning behavior nor the rewards of real people.
We introduce MERMAIDE, a model-based meta-learning framework to train a principal that can quickly adapt to out-of-distribution agents.
arXiv Detail & Related papers (2023-04-10T15:44:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.