Cooperation, Competition, and Maliciousness: LLM-Stakeholders Interactive Negotiation
- URL: http://arxiv.org/abs/2309.17234v2
- Date: Mon, 10 Jun 2024 14:43:34 GMT
- Title: Cooperation, Competition, and Maliciousness: LLM-Stakeholders Interactive Negotiation
- Authors: Sahar Abdelnabi, Amr Gomaa, Sarath Sivaprasad, Lea Schönherr, Mario Fritz
- Abstract summary: We propose using scorable negotiation to evaluate Large Language Models (LLMs).
To reach an agreement, agents must have strong arithmetic, inference, exploration, and planning capabilities.
We provide procedures to create new games and increase their difficulty, yielding an evolving benchmark.
- Score: 52.930183136111864
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There is a growing interest in using Large Language Models (LLMs) in multi-agent systems to tackle interactive real-world tasks that require effective collaboration and assessment of complex situations. Yet, we still have a limited understanding of LLMs' communication and decision-making abilities in multi-agent setups. The fundamental task of negotiation spans many key features of communication, such as cooperation, competition, and the potential for manipulation. Thus, we propose using scorable negotiation to evaluate LLMs. We create a testbed of complex multi-agent, multi-issue, and semantically rich negotiation games. To reach an agreement, agents must have strong arithmetic, inference, exploration, and planning capabilities while integrating them in a dynamic, multi-turn setup. We propose multiple metrics to rigorously quantify agents' performance and alignment with the assigned role. We provide procedures to create new games and increase their difficulty, yielding an evolving benchmark. Importantly, we evaluate critical safety aspects such as the interaction dynamics between agents influenced by greedy and adversarial players. Our benchmark is highly challenging; GPT-3.5 and small models mostly fail, and GPT-4 and SoTA large models (e.g., Llama-3 70b) still underperform.
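To make the "scorable negotiation" setup concrete, below is a minimal sketch of how such a game can be represented and checked. It assumes each stakeholder holds a private score table over the options of each issue and accepts a deal only if its total clears a private threshold; the stakeholder names, issues, point values, and thresholds are hypothetical illustrations, not values from the paper's benchmark.

```python
from dataclasses import dataclass

@dataclass
class Stakeholder:
    """One negotiating party with a private score table and acceptance threshold."""
    name: str
    scores: dict[str, dict[str, int]]  # issue -> option -> points for this party
    min_threshold: int                 # private minimum total score to accept a deal

    def value_of(self, deal: dict[str, str]) -> int:
        # Total points this party earns from a proposed assignment of options to issues.
        return sum(self.scores[issue][option] for issue, option in deal.items())

    def accepts(self, deal: dict[str, str]) -> bool:
        return self.value_of(deal) >= self.min_threshold

def agreement_reached(parties: list[Stakeholder], deal: dict[str, str]) -> bool:
    """A deal counts as an agreement only if every party clears its own threshold."""
    return all(p.accepts(deal) for p in parties)

# Hypothetical two-issue game with partially opposed preferences.
lead = Stakeholder("project_lead",
                   {"budget": {"low": 10, "high": 40},
                    "timeline": {"fast": 30, "slow": 5}},
                   min_threshold=40)
regulator = Stakeholder("regulator",
                        {"budget": {"low": 35, "high": 5},
                         "timeline": {"fast": 10, "slow": 30}},
                        min_threshold=30)

deal = {"budget": "high", "timeline": "slow"}  # each side concedes one issue
print(agreement_reached([lead, regulator], deal))  # True (lead: 45 >= 40, regulator: 35 >= 30)
```

Under a representation like this, the "arithmetic, inference, and planning" demands amount to each agent privately evaluating candidate deals against its score table and searching the joint option space for an assignment that clears every party's threshold.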
Related papers
- BattleAgentBench: A Benchmark for Evaluating Cooperation and Competition Capabilities of Language Models in Multi-Agent Systems [15.159418172629701]
Large Language Models (LLMs) are becoming increasingly powerful and capable of handling complex tasks.
Compared to single agents, multi-agent systems place higher demands on the collaboration capabilities of language models.
We propose a benchmark, called BattleAgentBench, which defines seven sub-stages of three varying difficulty levels.
arXiv Detail & Related papers (2024-08-28T17:43:55Z)
- Richelieu: Self-Evolving LLM-Based Agents for AI Diplomacy [24.521882655442187]
Diplomacy involves staggeringly large decision spaces, especially once the required negotiation stage is considered.
Previous AI agents have demonstrated their ability to handle multi-step games and large action spaces in multi-agent tasks.
We aim to explore AI's potential to create a human-like agent capable of executing comprehensive multi-agent missions.
arXiv Detail & Related papers (2024-07-09T12:37:54Z)
- CoMM: Collaborative Multi-Agent, Multi-Reasoning-Path Prompting for Complex Problem Solving [9.446546965008249]
We propose a collaborative multi-agent, multi-reasoning-path (CoMM) prompting framework.
Specifically, we prompt LLMs to play different roles in a problem-solving team, and encourage different role-play agents to collaboratively solve the target task.
Empirical results demonstrate the effectiveness of the proposed methods on two college-level science problems.
arXiv Detail & Related papers (2024-04-26T23:29:12Z)
- AntEval: Evaluation of Social Interaction Competencies in LLM-Driven Agents [65.16893197330589]
Large Language Models (LLMs) have demonstrated their ability to replicate human behaviors across a wide range of scenarios.
However, their capability in handling complex, multi-character social interactions has yet to be fully explored.
We introduce the Multi-Agent Interaction Evaluation Framework (AntEval), encompassing a novel interaction framework and evaluation methods.
arXiv Detail & Related papers (2024-01-12T11:18:00Z)
- MAgIC: Investigation of Large Language Model Powered Multi-Agent in Cognition, Adaptability, Rationality and Collaboration [102.41118020705876]
Large Language Models (LLMs) have marked a significant advancement in the field of natural language processing.
As their applications extend into multi-agent environments, a need has arisen for a comprehensive evaluation framework.
This work introduces a novel benchmarking framework specifically tailored to assess LLMs within multi-agent settings.
arXiv Detail & Related papers (2023-11-14T21:46:27Z)
- Multi-Agent Consensus Seeking via Large Language Models [6.922356864800498]
Multi-agent systems driven by large language models (LLMs) have shown promising abilities for solving complex tasks in a collaborative manner.
This work considers a fundamental problem in multi-agent collaboration: consensus seeking.
arXiv Detail & Related papers (2023-10-31T03:37:11Z)
- MindAgent: Emergent Gaming Interaction [103.73707345211892]
Large Language Models (LLMs) have the capacity to perform complex scheduling in a multi-agent system.
We propose MindAgent to evaluate emergent planning and coordination capabilities in gaming interaction.
arXiv Detail & Related papers (2023-09-18T17:52:22Z)
- AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors [93.38830440346783]
We propose a multi-agent framework, AgentVerse, that can collaboratively adjust its composition as a greater-than-the-sum-of-its-parts system.
Our experiments demonstrate that AgentVerse can effectively deploy multi-agent groups that outperform a single agent.
In view of these behaviors, we discuss some possible strategies to leverage positive ones and mitigate negative ones for improving the collaborative potential of multi-agent groups.
arXiv Detail & Related papers (2023-08-21T16:47:11Z)
- Building Cooperative Embodied Agents Modularly with Large Language Models [104.57849816689559]
We address challenging multi-agent cooperation problems with decentralized control, raw sensory observations, costly communication, and multi-objective tasks instantiated in various embodied environments.
We harness the commonsense knowledge, reasoning ability, language comprehension, and text generation prowess of LLMs and seamlessly incorporate them into a cognitive-inspired modular framework.
Our experiments on C-WAH and TDW-MAT demonstrate that CoELA driven by GPT-4 can surpass strong planning-based methods and exhibit emergent effective communication.
arXiv Detail & Related papers (2023-07-05T17:59:27Z)