Towards a Science of Collective AI: LLM-based Multi-Agent Systems Need a Transition from Blind Trial-and-Error to Rigorous Science
- URL: http://arxiv.org/abs/2602.05289v1
- Date: Thu, 05 Feb 2026 04:19:52 GMT
- Title: Towards a Science of Collective AI: LLM-based Multi-Agent Systems Need a Transition from Blind Trial-and-Error to Rigorous Science
- Authors: Jingru Fan, Dewen Liu, Yufan Dang, Huatao Li, Yuheng Wang, Wei Liu, Feiyu Duan, Xuanwen Ding, Shu Yao, Lin Wu, Ruijie Shi, Wai-Shing Leung, Yuan Cheng, Zhongyu Wei, Cheng Yang, Chen Qian, Zhiyuan Liu, Maosong Sun
- Abstract summary: Large Language Models (LLMs) have greatly extended the capabilities of Multi-Agent Systems (MAS). Despite this rapid progress, the field still relies heavily on empirical trial-and-error. This bottleneck stems from the ambiguity of attribution. We propose a factor attribution paradigm to systematically identify collaboration-driving factors.
- Score: 70.3658845234978
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advancements in Large Language Models (LLMs) have greatly extended the capabilities of Multi-Agent Systems (MAS), demonstrating significant effectiveness across a wide range of complex and open-ended domains. However, despite this rapid progress, the field still relies heavily on empirical trial-and-error. It lacks the unified, principled scientific framework necessary for systematic optimization and improvement. This bottleneck stems from the ambiguity of attribution: first, the absence of a structured taxonomy of factors restricts researchers to unguided adjustments; second, without a unified metric, genuine collaboration gain cannot be distinguished from mere resource accumulation. In this paper, we advocate a transition to design science through an integrated framework. We propose establishing the collaboration gain metric ($\Gamma$) as the scientific standard for isolating intrinsic gains from increased budgets. Leveraging $\Gamma$, we propose a factor attribution paradigm to systematically identify collaboration-driving factors. To support this, we construct a systematic MAS factor library, structuring the design space into control-level presets and information-level dynamics. Ultimately, this framework facilitates the transition from blind experimentation to rigorous science, paving the way towards a true science of Collective AI.
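The abstract does not state a formal definition of $\Gamma$; a minimal budget-matched sketch, written here purely as an illustration of the stated idea (the symbols $P_{\mathrm{MAS}}$, $P_{\mathrm{single}}$, and $B$ are our assumptions, not the authors' notation), might look like:

```latex
% Hypothetical formulation of collaboration gain.
% P_MAS(B): task performance of the multi-agent system under total budget B
% P_single(B): performance of a single agent granted the SAME budget B
\Gamma(B) \;=\; P_{\mathrm{MAS}}(B) \;-\; \max_{a \in \mathcal{A}} P_{\mathrm{single}}^{(a)}(B)
```

Under such a formulation, $\Gamma(B) > 0$ would indicate an intrinsic gain from collaboration rather than from simply spending more tokens or compute, which matches the abstract's motivation for separating collaboration gain from resource accumulation.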
Related papers
- MACC: Multi-Agent Collaborative Competition for Scientific Exploration [16.92873230140957]
We introduce MACC (Multi-Agent Collaborative Competition), an institutional architecture that integrates a blackboard-style shared scientific workspace with incentive mechanisms. MACC provides a testbed for studying how institutional design influences scalable and reliable multi-agent scientific exploration.
arXiv Detail & Related papers (2026-03-04T06:38:04Z) - Guided Collaboration in Heterogeneous LLM-Based Multi-Agent Systems via Entropy-Based Understanding Assessment and Experience Retrieval [35.96356869281219]
We describe a counterintuitive phenomenon in strong-weak systems: a strong-weak collaboration may underperform weak-weak combinations. We propose an Entropy-Based Adaptive Guidance Framework that dynamically aligns the guidance with the cognitive state of each agent. Our approach consistently enhances the effectiveness and stability of heterogeneous collaboration.
arXiv Detail & Related papers (2026-02-14T07:10:04Z) - Sci-CoE: Co-evolving Scientific Reasoning LLMs via Geometric Consensus with Sparse Supervision [15.806243963561776]
Sci-CoE is a two-stage scientific co-evolving framework that enables models to self-evolve as both solver and verifier. In the first stage, the model uses a small set of annotated data to establish correctness judgment anchors for the Verifier. In the second stage, we introduce a geometric reward mechanism that jointly considers consensus, reliability, and diversity, driving large-scale self-iteration.
arXiv Detail & Related papers (2026-02-12T16:46:00Z) - ATHENA: Agentic Team for Hierarchical Evolutionary Numerical Algorithms [4.235429894371577]
ATHENA is an agentic framework designed as an Autonomous Lab to manage the end-to-end computational research lifecycle. Its core is the HENA loop, a knowledge-driven diagnostic process framed as a contextual problem. The framework achieves super-human performance, reaching validation errors of $10^{-14}$.
arXiv Detail & Related papers (2025-12-03T06:05:27Z) - SelfAI: Building a Self-Training AI System with LLM Agents [79.10991818561907]
SelfAI is a general multi-agent platform that combines a User Agent, which translates high-level research objectives into standardized experimental configurations, with an Experiment Manager that orchestrates parallel, fault-tolerant training across heterogeneous hardware while maintaining a structured knowledge base for continuous feedback. Across regression, computer vision, scientific computing, medical imaging, and drug discovery benchmarks, SelfAI consistently achieves strong performance and reduces redundant trials.
arXiv Detail & Related papers (2025-11-29T09:18:39Z) - Multi-Objective Bilevel Learning [12.198330173886587]
We investigate the theoretical and algorithmic foundations of multi-objective bilevel learning (MOBL). Our goal is to develop efficient MOBL optimization algorithms. We propose a unifying algorithmic framework called weighted-Chebyshev multi-hyper-gradient-descent (WC-MHGD).
arXiv Detail & Related papers (2025-11-11T04:42:14Z) - A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers [251.23085679210206]
Scientific Large Language Models (Sci-LLMs) are transforming how knowledge is represented, integrated, and applied in scientific research. This survey reframes the development of Sci-LLMs as a co-evolution between models and their underlying data substrate. We formulate a unified taxonomy of scientific data and a hierarchical model of scientific knowledge.
arXiv Detail & Related papers (2025-08-28T18:30:52Z) - Topological Structure Learning Should Be A Research Priority for LLM-Based Multi-Agent Systems [69.95482609893236]
Large Language Model-based Multi-Agent Systems (MASs) have emerged as a powerful paradigm for tackling complex tasks through collaborative intelligence. We call for a paradigm shift toward topology-aware MASs that explicitly model and dynamically optimize the structure of inter-agent interactions.
arXiv Detail & Related papers (2025-05-28T15:20:09Z) - Multi-Agent Collaboration via Evolving Orchestration [55.574417128944226]
Large language models (LLMs) have achieved remarkable results across diverse downstream tasks, but their monolithic nature restricts scalability and efficiency in complex problem-solving. We propose a puppeteer-style paradigm for LLM-based multi-agent collaboration, where a centralized orchestrator ("puppeteer") dynamically directs agents ("puppets") in response to evolving task states. Experiments on closed- and open-domain scenarios show that this method achieves superior performance with reduced computational costs.
arXiv Detail & Related papers (2025-05-26T07:02:17Z) - A Survey of Frontiers in LLM Reasoning: Inference Scaling, Learning to Reason, and Agentic Systems [93.8285345915925]
Reasoning is a fundamental cognitive process that enables logical inference, problem-solving, and decision-making. With the rapid advancement of large language models (LLMs), reasoning has emerged as a key capability that distinguishes advanced AI systems. We categorize existing methods along two dimensions: (1) Regimes, which define the stage at which reasoning is achieved; and (2) Architectures, which determine the components involved in the reasoning process.
arXiv Detail & Related papers (2025-04-12T01:27:49Z) - A Survey on Post-training of Large Language Models [185.51013463503946]
Large Language Models (LLMs) have fundamentally transformed natural language processing, making them indispensable across domains ranging from conversational systems to scientific exploration. These challenges necessitate advanced post-training language models (PoLMs) to address shortcomings such as restricted reasoning capacities, ethical uncertainties, and suboptimal domain-specific performance. This paper presents the first comprehensive survey of PoLMs, systematically tracing their evolution across five core paradigms: Fine-tuning, which enhances task-specific accuracy; Alignment, which ensures ethical coherence and alignment with human preferences; Reasoning, which advances multi-step inference despite challenges in reward design; and Integration and Adaptation.
arXiv Detail & Related papers (2025-03-08T05:41:42Z) - SoK: Software Compartmentalization [3.058923790501231]
Decomposing large systems into smaller components has long been recognized as an effective means to minimize the impact of exploits.
Despite historical roots, demonstrated benefits, and a plethora of research efforts in academia and industry, the compartmentalization of software is still not a mainstream practice.
We propose a unified model for the systematic analysis, comparison, and direction of compartmentalization approaches.
arXiv Detail & Related papers (2024-10-11T00:38:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.