Multi-Agent Teams Hold Experts Back
- URL: http://arxiv.org/abs/2602.01011v3
- Date: Mon, 09 Feb 2026 05:20:40 GMT
- Title: Multi-Agent Teams Hold Experts Back
- Authors: Aneesh Pappu, Batu El, Hancheng Cao, Carmelo di Nolfo, Yanchao Sun, Meng Cao, James Zou
- Abstract summary: We study whether self-organizing LLM teams achieve strong synergy. We find that -- unlike human teams -- LLM teams consistently fail to match their expert agent's performance. We show that expert leveraging, rather than identification, is the primary bottleneck.
- Score: 37.015657067301355
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-agent LLM systems are increasingly deployed as autonomous collaborators, where agents interact freely rather than execute fixed, pre-specified workflows. In such settings, effective coordination cannot be fully designed in advance and must instead emerge through interaction. However, most prior work enforces coordination through fixed roles, workflows, or aggregation rules, leaving open the question of how well self-organizing teams perform when coordination is unconstrained. Drawing on organizational psychology, we study whether self-organizing LLM teams achieve strong synergy, where team performance matches or exceeds the best individual member. Across human-inspired and frontier ML benchmarks, we find that -- unlike human teams -- LLM teams consistently fail to match their expert agent's performance, even when explicitly told who the expert is, incurring performance losses of up to 37.6%. Decomposing this failure, we show that expert leveraging, rather than identification, is the primary bottleneck. Conversational analysis reveals a tendency toward integrative compromise -- averaging expert and non-expert views rather than appropriately weighting expertise -- which increases with team size and correlates negatively with performance. Interestingly, this consensus-seeking behavior improves robustness to adversarial agents, suggesting a trade-off between alignment and effective expertise utilization. Our findings reveal a significant gap in the ability of self-organizing multi-agent teams to harness the collective expertise of their members.
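The strong-synergy criterion in the abstract (team performance matches or exceeds the best individual member) can be illustrated with a minimal sketch. The function names are hypothetical, and the relative-loss formula is an assumption made for illustration: the abstract does not say how the reported 37.6% figure is computed. Scores are assumed to be higher-is-better.

```python
from typing import Sequence


def strong_synergy(team_score: float, member_scores: Sequence[float]) -> bool:
    """Return True if the team matches or exceeds its best individual member.

    Assumes higher scores are better; the best member is treated as the expert.
    """
    return team_score >= max(member_scores)


def relative_loss_vs_expert(team_score: float, member_scores: Sequence[float]) -> float:
    """Fraction of the expert's score that the team fails to reach.

    Illustrative assumption: loss = (expert - team) / expert.
    A negative value means the team outperformed the expert.
    """
    expert_score = max(member_scores)
    if expert_score == 0:
        raise ValueError("expert score must be non-zero to compute a relative loss")
    return (expert_score - team_score) / expert_score


if __name__ == "__main__":
    members = [40.0, 55.0, 80.0]  # hypothetical individual scores
    team = 60.0                   # hypothetical team score
    print(strong_synergy(team, members))
    print(relative_loss_vs_expert(team, members))
```

In this made-up example the team scores below its best member, so the strong-synergy check fails and the relative loss is positive.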
Related papers
- Collaborative Causal Sensemaking: Closing the Complementarity Gap in Human-AI Decision Support [0.0]
LLM-based agents are increasingly deployed for expert decision support. Yet human-AI teams in high-stakes settings do not yet reliably outperform the best individual. We argue this complementarity gap reflects a fundamental mismatch: current agents are trained as answer engines, not as partners in the collaborative sensemaking through which experts actually make decisions.
arXiv Detail & Related papers (2025-12-08T18:30:41Z)
- SelfAI: Building a Self-Training AI System with LLM Agents [79.10991818561907]
SelfAI is a general multi-agent platform that combines a User Agent for translating high-level research objectives into standardized experimental configurations. An Experiment Manager orchestrates parallel, fault-tolerant training across heterogeneous hardware while maintaining a structured knowledge base for continuous feedback. Across regression, computer vision, scientific computing, medical imaging, and drug discovery benchmarks, SelfAI consistently achieves strong performance and reduces redundant trials.
arXiv Detail & Related papers (2025-11-29T09:18:39Z)
- Completion $\neq$ Collaboration: Scaling Collaborative Effort with Agents [48.95020665909723]
We argue for a shift from building and assessing task completion agents to developing collaborative agents. We introduce collaborative effort scaling, a framework that captures how an agent's utility grows with increasing user involvement.
arXiv Detail & Related papers (2025-10-29T17:47:18Z)
- Learning "Partner-Aware" Collaborators in Multi-Party Collaboration [12.287537011305497]
Large Language Models (LLMs) are increasingly being deployed in agentic settings where they act as collaborators with humans. This paper builds on the AI alignment and safe interruptibility literature to offer novel theoretical insights on collaborative behavior. We propose Interruptible Collaborative Roleplayer (ICR), a novel partner-aware learning algorithm to train CG-optimal collaborators.
arXiv Detail & Related papers (2025-10-26T00:05:48Z)
- Emergent Coordination in Multi-Agent Language Models [2.504366738288215]
We introduce an information-theoretic framework to test whether multi-agent systems show signs of higher-order structure. This information decomposition lets us measure whether dynamical emergence is present in multi-agent LLM systems. We apply our framework to experiments using a simple guessing game without direct agent communication.
arXiv Detail & Related papers (2025-10-05T11:26:41Z)
- Identifying Cooperative Personalities in Multi-agent Contexts through Personality Steering with Representation Engineering [0.0]
As Large Language Models (LLMs) gain autonomous capabilities, their coordination in multi-agent settings becomes increasingly important. Inspired by Axelrod's Iterated Prisoner's Dilemma (IPD) tournaments, we explore how personality traits influence LLM cooperation. Using representation engineering, we steer Big Five traits (e.g., Agreeableness, Conscientiousness) in LLMs and analyze their impact on IPD decision-making.
arXiv Detail & Related papers (2025-03-17T01:21:54Z)
- Who is Helping Whom? Analyzing Inter-dependencies to Evaluate Cooperation in Human-AI Teaming [13.263258837438045]
We propose the concept of constructive interdependence as a key metric for evaluating cooperation in human-agent teams. Our results demonstrate that although trained agents attain high task rewards, they fail to induce cooperative behavior. Our analysis reveals that teaming performance is not necessarily correlated with task reward, highlighting that task reward alone cannot reliably measure cooperation.
arXiv Detail & Related papers (2025-02-10T19:16:20Z)
- Collaborative Gym: A Framework for Enabling and Evaluating Human-Agent Collaboration [50.657070334404835]
Collaborative Gym is a framework enabling asynchronous, tripartite interaction among agents, humans, and task environments. We instantiate Co-Gym with three representative tasks in both simulated and real-world conditions. Our findings reveal that collaborative agents consistently outperform their fully autonomous counterparts in task performance.
arXiv Detail & Related papers (2024-12-20T09:21:15Z)
- Collaboration Dynamics and Reliability Challenges of Multi-Agent LLM Systems in Finite Element Analysis [3.437656066916039]
How interagent dynamics influence reasoning quality and verification reliability remains unclear. We study these mechanisms using an AutoGen-based multi-agent framework for linear-elastic Finite Element Analysis (FEA). From 1,120 controlled trials, we find that collaboration effectiveness depends more on functional complementarity than team size.
arXiv Detail & Related papers (2024-08-23T23:11:08Z)
- TeamLoRA: Boosting Low-Rank Adaptation with Expert Collaboration and Competition [61.91764883512776]
We introduce an innovative PEFT method, TeamLoRA, consisting of a collaboration and competition module for experts.
By doing so, TeamLoRA connects the experts as a "Team" with internal collaboration and competition, enabling a faster and more accurate PEFT paradigm for multi-task learning.
arXiv Detail & Related papers (2024-08-19T09:58:53Z)
- Learning to Incentivize Other Learning Agents [73.03133692589532]
We show how to equip RL agents with the ability to give rewards directly to other agents, using a learned incentive function.
Such agents significantly outperform standard RL and opponent-shaping agents in challenging general-sum Markov games.
Our work points toward more opportunities and challenges along the path to ensure the common good in a multi-agent future.
arXiv Detail & Related papers (2020-06-10T20:12:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.