COMMA: A Communicative Multimodal Multi-Agent Benchmark
- URL: http://arxiv.org/abs/2410.07553v2
- Date: Thu, 06 Feb 2025 22:55:46 GMT
- Title: COMMA: A Communicative Multimodal Multi-Agent Benchmark
- Authors: Timothy Ossowski, Jixuan Chen, Danyal Maqbool, Zefan Cai, Tyler Bradshaw, Junjie Hu,
- Abstract summary: We introduce a novel benchmark designed to evaluate the collaborative performance of multimodal multi-agent systems through language communication.
Our findings reveal surprising weaknesses in state-of-the-art models, including proprietary models like GPT-4o.
- Score: 7.831385481814481
- License:
- Abstract: The rapid advances of multimodal agents built on large foundation models have largely overlooked their potential for language-based communication between agents in collaborative tasks. This oversight presents a critical gap in understanding their effectiveness in real-world deployments, particularly when communicating with humans. Existing agentic benchmarks fail to address key aspects of inter-agent communication and collaboration, particularly in scenarios where agents have unequal access to information and must work together to achieve tasks beyond the scope of individual capabilities. To fill this gap, we introduce a novel benchmark designed to evaluate the collaborative performance of multimodal multi-agent systems through language communication. Our benchmark features a variety of scenarios, providing a comprehensive evaluation across four key categories of agentic capability in a communicative collaboration setting. By testing both agent-agent and agent-human collaborations using open-source and closed-source models, our findings reveal surprising weaknesses in state-of-the-art models, including proprietary models like GPT-4o. Some of these models struggle to outperform even a simple random agent baseline in agent-agent collaboration and only surpass the random baseline when a human is involved.
Related papers
- Collaborative Gym: A Framework for Enabling and Evaluating Human-Agent Collaboration [51.452664740963066]
Collaborative Gym is a framework enabling asynchronous, tripartite interaction among agents, humans, and task environments.
We instantiate Co-Gym with three representative tasks in both simulated and real-world conditions.
Our findings reveal that collaborative agents consistently outperform their fully autonomous counterparts in task performance.
arXiv Detail & Related papers (2024-12-20T09:21:15Z) - Towards Effective GenAI Multi-Agent Collaboration: Design and Evaluation for Enterprise Applications [15.480315462362531]
This report presents a comprehensive evaluation of coordination and routing capabilities in a novel multi-agent collaboration framework.
For coordination capabilities, we demonstrate the effectiveness of inter-agent communication and payload referencing mechanisms, achieving end-to-end goal success rates of 90%.
Our analysis yields several key findings: multi-agent collaboration enhances goal success rates by up to 70% compared to single-agent approaches in our benchmarks.
arXiv Detail & Related papers (2024-12-06T22:14:17Z) - BattleAgentBench: A Benchmark for Evaluating Cooperation and Competition Capabilities of Language Models in Multi-Agent Systems [15.159418172629701]
Large Language Models (LLMs) are becoming increasingly powerful and capable of handling complex tasks.
Compared to single agents, multi-agent systems have higher requirements for the collaboration capabilities of language models.
We propose a benchmark, called BattleAgentBench, which defines seven sub-stages of three varying difficulty levels.
arXiv Detail & Related papers (2024-08-28T17:43:55Z) - Scaling Large-Language-Model-based Multi-Agent Collaboration [75.5241464256688]
Pioneering advancements in large language model-powered agents have underscored the design pattern of multi-agent collaboration.
Inspired by the neural scaling law, this study investigates whether a similar principle applies to increasing agents in multi-agent collaboration.
arXiv Detail & Related papers (2024-06-11T11:02:04Z) - Multi-Agent Consensus Seeking via Large Language Models [6.336670103502898]
Multi-agent systems driven by large language models (LLMs) have shown promising abilities for solving complex tasks in a collaborative manner.
This work considers a fundamental problem in multi-agent collaboration: consensus seeking.
arXiv Detail & Related papers (2023-10-31T03:37:11Z) - Cooperation, Competition, and Maliciousness: LLM-Stakeholders Interactive Negotiation [52.930183136111864]
We propose using scorable negotiation to evaluate Large Language Models (LLMs)
To reach an agreement, agents must have strong arithmetic, inference, exploration, and planning capabilities.
We provide procedures to create new games and increase games' difficulty to have an evolving benchmark.
arXiv Detail & Related papers (2023-09-29T13:33:06Z) - ProAgent: Building Proactive Cooperative Agents with Large Language
Models [89.53040828210945]
ProAgent is a novel framework that harnesses large language models to create proactive agents.
ProAgent can analyze the present state, and infer the intentions of teammates from observations.
ProAgent exhibits a high degree of modularity and interpretability, making it easily integrated into various coordination scenarios.
arXiv Detail & Related papers (2023-08-22T10:36:56Z) - AgentVerse: Facilitating Multi-Agent Collaboration and Exploring
Emergent Behaviors [93.38830440346783]
We propose a multi-agent framework framework that can collaboratively adjust its composition as a greater-than-the-sum-of-its-parts system.
Our experiments demonstrate that framework framework can effectively deploy multi-agent groups that outperform a single agent.
In view of these behaviors, we discuss some possible strategies to leverage positive ones and mitigate negative ones for improving the collaborative potential of multi-agent groups.
arXiv Detail & Related papers (2023-08-21T16:47:11Z) - CAMEL: Communicative Agents for "Mind" Exploration of Large Language
Model Society [58.04479313658851]
This paper explores the potential of building scalable techniques to facilitate autonomous cooperation among communicative agents.
We propose a novel communicative agent framework named role-playing.
Our contributions include introducing a novel communicative agent framework, offering a scalable approach for studying the cooperative behaviors and capabilities of multi-agent systems.
arXiv Detail & Related papers (2023-03-31T01:09:00Z) - The Emergence of Adversarial Communication in Multi-Agent Reinforcement
Learning [6.18778092044887]
Many real-world problems require the coordination of multiple autonomous agents.
Recent work has shown the promise of Graph Neural Networks (GNNs) to learn explicit communication strategies that enable complex multi-agent coordination.
We show how a single self-interested agent is capable of learning highly manipulative communication strategies that allows it to significantly outperform a cooperative team of agents.
arXiv Detail & Related papers (2020-08-06T12:48:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.