Reproducibility Study of Cooperation, Competition, and Maliciousness: LLM-Stakeholders Interactive Negotiation
- URL: http://arxiv.org/abs/2502.16242v1
- Date: Sat, 22 Feb 2025 14:28:49 GMT
- Title: Reproducibility Study of Cooperation, Competition, and Maliciousness: LLM-Stakeholders Interactive Negotiation
- Authors: Jose L. Garcia, Karolina Hajkova, Maria Marchenko, Carlos Miguel PatiƱo,
- Abstract summary: We validate the original findings using a range of open-weight models.<n>We propose a communication-free baseline to test whether successful negotiations are possible without agent interaction.<n>This work also provides insights into the accessibility, fairness, environmental impact, and privacy considerations of LLM-based negotiation systems.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents a reproducibility study and extension of "Cooperation, Competition, and Maliciousness: LLM-Stakeholders Interactive Negotiation." We validate the original findings using a range of open-weight models (1.5B-70B parameters) and GPT-4o Mini while introducing several novel contributions. We analyze the Pareto front of the games, propose a communication-free baseline to test whether successful negotiations are possible without agent interaction, evaluate recent small language models' performance, analyze structural information leakage in model responses, and implement an inequality metric to assess negotiation fairness. Our results demonstrate that smaller models (<10B parameters) struggle with format adherence and coherent responses, but larger open-weight models can approach proprietary model performance. Additionally, in many scenarios, single-agent approaches can achieve comparable results to multi-agent negotiations, challenging assumptions about the necessity of agent communication to perform well on the benchmark. This work also provides insights into the accessibility, fairness, environmental impact, and privacy considerations of LLM-based negotiation systems.
Related papers
- LlaMADRS: Prompting Large Language Models for Interview-Based Depression Assessment [75.44934940580112]
This study introduces LlaMADRS, a novel framework leveraging open-source Large Language Models (LLMs) to automate depression severity assessment.<n>We employ a zero-shot prompting strategy with carefully designed cues to guide the model in interpreting and scoring transcribed clinical interviews.<n>Our approach, tested on 236 real-world interviews, demonstrates strong correlations with clinician assessments.
arXiv Detail & Related papers (2025-01-07T08:49:04Z) - A Statistical Framework for Ranking LLM-Based Chatbots [57.59268154690763]
We propose a statistical framework that incorporates key advancements to address specific challenges in pairwise comparison analysis.<n>First, we introduce a factored tie model that enhances the ability to handle groupings of human-judged comparisons.<n>Second, we extend the framework to model covariance tiers between competitors, enabling deeper insights into performance relationships.<n>Third, we resolve optimization challenges arising from parameter non-uniqueness by introducing novel constraints.
arXiv Detail & Related papers (2024-12-24T12:54:19Z) - A Debate-Driven Experiment on LLM Hallucinations and Accuracy [7.821303946741665]
This study investigates the phenomenon of hallucination in large language models (LLMs)
Multiple instances of GPT-4o-Mini models engage in a debate-like interaction prompted with questions from the TruthfulQA dataset.
One model is deliberately instructed to generate plausible but false answers while the other models are asked to respond truthfully.
arXiv Detail & Related papers (2024-10-25T11:41:27Z) - MultiAgent Collaboration Attack: Investigating Adversarial Attacks in Large Language Model Collaborations via Debate [24.92465108034783]
Large Language Models (LLMs) have shown exceptional results on current benchmarks when working individually.
The advancement in their capabilities, along with a reduction in parameter size and inference times, has facilitated the use of these models as agents.
We evaluate the behavior of a network of models collaborating through debate under the influence of an adversary.
arXiv Detail & Related papers (2024-06-20T20:09:37Z) - Are LLMs Effective Negotiators? Systematic Evaluation of the Multifaceted Capabilities of LLMs in Negotiation Dialogues [4.738985706520995]
This work aims to systematically analyze the multifaceted capabilities of LLMs across diverse dialogue scenarios.
Our analysis highlights GPT-4's superior performance in many tasks while identifying specific challenges.
arXiv Detail & Related papers (2024-02-21T06:11:03Z) - Assistive Large Language Model Agents for Socially-Aware Negotiation Dialogues [47.977032883078664]
We develop assistive agents based on Large Language Models (LLMs) that aid interlocutors in business negotiations.<n>A third LLM acts as a remediator agent to rewrite utterances violating norms for improving negotiation outcomes.<n>We provide rich empirical evidence to demonstrate its effectiveness in negotiations across three different negotiation topics.
arXiv Detail & Related papers (2024-01-29T09:07:40Z) - AntEval: Evaluation of Social Interaction Competencies in LLM-Driven
Agents [65.16893197330589]
Large Language Models (LLMs) have demonstrated their ability to replicate human behaviors across a wide range of scenarios.
However, their capability in handling complex, multi-character social interactions has yet to be fully explored.
We introduce the Multi-Agent Interaction Evaluation Framework (AntEval), encompassing a novel interaction framework and evaluation methods.
arXiv Detail & Related papers (2024-01-12T11:18:00Z) - Evaluating Language Model Agency through Negotiations [39.87262815823634]
Negotiation games enable us to study multi-turn, and cross-model interactions, modulate complexity, and side-step accidental evaluation data leakage.
We use our approach to test six widely used and publicly accessible LMs, evaluating performance and alignment in both self-play and cross-play settings.
arXiv Detail & Related papers (2024-01-09T13:19:37Z) - Generative Judge for Evaluating Alignment [84.09815387884753]
We propose a generative judge with 13B parameters, Auto-J, designed to address these challenges.
Our model is trained on user queries and LLM-generated responses under massive real-world scenarios.
Experimentally, Auto-J outperforms a series of strong competitors, including both open-source and closed-source models.
arXiv Detail & Related papers (2023-10-09T07:27:15Z) - Cooperation, Competition, and Maliciousness: LLM-Stakeholders Interactive Negotiation [52.930183136111864]
We propose using scorable negotiation to evaluate Large Language Models (LLMs)
To reach an agreement, agents must have strong arithmetic, inference, exploration, and planning capabilities.
We provide procedures to create new games and increase games' difficulty to have an evolving benchmark.
arXiv Detail & Related papers (2023-09-29T13:33:06Z) - Model-based Multi-agent Policy Optimization with Adaptive Opponent-wise
Rollouts [52.844741540236285]
This paper investigates the model-based methods in multi-agent reinforcement learning (MARL)
We propose a novel decentralized model-based MARL method, named Adaptive Opponent-wise Rollout Policy (AORPO)
arXiv Detail & Related papers (2021-05-07T16:20:22Z) - Learnable Strategies for Bilateral Agent Negotiation over Multiple
Issues [6.12762193927784]
We present a novel bilateral negotiation model that allows a self-interested agent to learn how to negotiate over multiple issues.
The model relies upon interpretable strategy templates representing the tactics the agent should employ during the negotiation.
It learns template parameters to maximize the average utility received over multiple negotiations, thus resulting in optimal bid acceptance and generation.
arXiv Detail & Related papers (2020-09-17T13:52:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.