Open-Source LLMs Collaboration Beats Closed-Source LLMs: A Scalable Multi-Agent System
- URL: http://arxiv.org/abs/2507.14200v1
- Date: Mon, 14 Jul 2025 16:17:11 GMT
- Title: Open-Source LLMs Collaboration Beats Closed-Source LLMs: A Scalable Multi-Agent System
- Authors: Shengji Tang, Jianjian Cao, Weihao Lin, Jiale Hong, Bo Zhang, Shuyue Hu, Lei Bai, Tao Chen, Wanli Ouyang, Peng Ye
- Abstract summary: This paper aims to demonstrate the potential and strengths of open-source collectives. We propose SMACS, a scalable multi-agent collaboration system (MACS) framework with high performance. Experiments on eight mainstream benchmarks validate the effectiveness of our SMACS.
- Score: 51.04535721779685
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper aims to demonstrate the potential and strengths of open-source collectives. This leads to a promising question: can we harness multiple open-source LLMs to match or even beat the closed-source LLMs? To answer this, we propose SMACS, a scalable multi-agent collaboration system (MACS) framework with high performance. Specifically, for continuous integration of new LLMs and generalization to diverse questions, we first propose Retrieval-based Prior Selection (RPS), which assigns a proxy performance score to each LLM to select the Top-k LLMs at the instance level for any given question. Then, we propose Exploration-Exploitation-Driven Posterior Enhancement (EPE), which encourages the generation of diverse responses through prior dropping and selects a high-quality response via a hybrid posterior score. Experiments on eight mainstream benchmarks validate the effectiveness of SMACS: by integrating fifteen open-source LLMs, SMACS outperforms leading closed-source LLMs in 2025, e.g., Claude-3.7-Sonnet (+12.73%), GPT-4.1 (+5.36%), and GPT-o3-mini (+5.28%), across multiple tasks. Remarkably, it even exceeds the average of the per-dataset best results from both open-source LLMs (+2.86%) and closed-source LLMs (+2.04%), pushing the upper bound of intelligence. Code will be released at https://github.com/magent4aci/SMACS.
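For intuition, the two-stage procedure described in the abstract can be outlined as a short sketch. This is a minimal illustrative version, not the released implementation: the callables `prior_score` and `posterior_score` are hypothetical stand-ins for the retrieval-based proxy scoring of RPS and the hybrid posterior scoring of EPE, and modeling "prior dropping" as randomly dropping selected LLMs is an assumption.

```python
import random
from typing import Callable, Dict

def smacs_answer(
    question: str,
    llms: Dict[str, Callable[[str], str]],         # name -> generation function
    prior_score: Callable[[str, str], float],      # (llm_name, question) -> RPS proxy score
    posterior_score: Callable[[str, str], float],  # (question, response) -> hybrid posterior score
    k: int = 5,
    drop_rate: float = 0.3,
) -> str:
    """Illustrative SMACS-style pipeline: RPS selects Top-k LLMs per question,
    EPE diversifies responses via prior dropping and keeps the best-scored one."""
    # Retrieval-based Prior Selection (RPS): rank LLMs by their proxy score for this question.
    ranked = sorted(llms, key=lambda name: prior_score(name, question), reverse=True)
    top_k = ranked[:k]

    # Exploration: "prior dropping", sketched here as randomly dropping some selected LLMs
    # so that different subsets answer and responses stay diverse.
    kept = [name for name in top_k if random.random() > drop_rate] or top_k[:1]
    responses = [llms[name](question) for name in kept]

    # Exploitation: return the response with the highest hybrid posterior score.
    return max(responses, key=lambda r: posterior_score(question, r))
```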
Related papers
- An Empirical Study of Many-to-Many Summarization with Large Language Models [82.10000188179168]
Large language models (LLMs) have shown strong multi-lingual abilities, giving them the potential to perform many-to-many summarization (M2MS) in real applications. This work presents a systematic empirical study on LLMs' M2MS ability.
arXiv Detail & Related papers (2025-05-19T11:18:54Z)
- Teamwork makes the dream work: LLMs-Based Agents for GitHub README.MD Summarization [7.330697128881243]
We propose Metagente as a novel approach to amplify the synergy of various Large Language Models (LLMs). Metagente is a multi-agent framework based on a series of LLMs that self-optimizes the system through evaluation, feedback, and cooperation among specialized agents. The performance gain compared to GitSum, the most relevant benchmark, ranges from 27.63% to 60.43%.
arXiv Detail & Related papers (2025-03-13T20:42:39Z)
- Symbiotic Cooperation for Web Agents: Harnessing Complementary Strengths of Large and Small LLMs [38.86873408585195]
Web browsing agents powered by large language models (LLMs) have shown tremendous potential in automating complex web-based tasks. Existing approaches typically rely on large LLMs to explore web environments and generate trajectory data. We propose AgentSymbiotic, an iterative framework that couples data synthesis with task performance.
arXiv Detail & Related papers (2025-02-11T20:41:49Z)
- LLM-TOPLA: Efficient LLM Ensemble by Maximising Diversity [7.945893812374361]
We introduce the focal diversity metric to capture the diversity-performance correlation among component LLMs of an ensemble.
We develop a diversity-optimized ensemble pruning algorithm to select the top-k sub-ensembles from a pool of $N$ base LLMs.
Our pruning method recommends top-performing LLM sub-ensembles of size $S$, often much smaller than $N$.
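As a rough illustration of diversity-optimized pruning, the sketch below scores every sub-ensemble of a fixed size by average pairwise disagreement on validation predictions and returns the top-k candidates. The disagreement measure is a simple stand-in assumption, not the paper's focal diversity metric.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

def disagreement(preds_a: Sequence[str], preds_b: Sequence[str]) -> float:
    """Toy diversity proxy: fraction of validation items on which two LLMs disagree."""
    return sum(a != b for a, b in zip(preds_a, preds_b)) / len(preds_a)

def prune_ensembles(preds: Dict[str, List[str]], size: int, top_k: int) -> List[Tuple[str, ...]]:
    """Rank all sub-ensembles of `size` models (size >= 2) by average pairwise
    diversity and keep the top_k most diverse candidates."""
    scored = []
    for subset in combinations(preds, size):
        pairs = list(combinations(subset, 2))
        div = sum(disagreement(preds[a], preds[b]) for a, b in pairs) / len(pairs)
        scored.append((div, subset))
    scored.sort(reverse=True)
    return [subset for _, subset in scored[:top_k]]
```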
arXiv Detail & Related papers (2024-10-04T22:31:15Z)
- MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series [86.31735321970481]
We open-source MAP-Neo, a bilingual language model with 7B parameters trained from scratch on 4.5T high-quality tokens.
Our MAP-Neo is the first fully open-sourced bilingual LLM with performance comparable to existing state-of-the-art LLMs.
arXiv Detail & Related papers (2024-05-29T17:57:16Z)
- Pack of LLMs: Model Fusion at Test-Time via Perplexity Optimization [18.73637736606997]
Pack of LLMs (PackLLM) is an effective method for test-time fusion that leverages each LLM's expertise, given an input prompt.
We conduct experiments with over 100 total Large Language Models (LLMs) on a diverse set of tasks.
PackLLM outperforms test-time fusion baselines by 1.89% accuracy points.
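A heavily simplified view of perplexity-driven test-time fusion is sketched below: each LLM is weighted by how well it models the input prompt (lower perplexity, larger weight), and the next-token distributions are mixed accordingly. The shared vocabulary and the softmax-over-negative-log-perplexity weighting are assumptions of this sketch, not necessarily the paper's exact optimization.

```python
import numpy as np
from typing import Callable, List

def fuse_next_token_probs(
    prompt: str,
    perplexity_fns: List[Callable[[str], float]],          # per-model prompt perplexity
    next_token_fns: List[Callable[[str], np.ndarray]],     # per-model next-token distribution
    temperature: float = 1.0,
) -> np.ndarray:
    """Weight each LLM by how well it fits the prompt (low perplexity)
    and fuse their next-token distributions (shared vocabulary assumed)."""
    ppl = np.array([p(prompt) for p in perplexity_fns])
    # Lower perplexity -> larger weight; softmax over negative log-perplexity.
    logits = -np.log(ppl) / temperature
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    dists = np.stack([f(prompt) for f in next_token_fns])  # shape (n_models, vocab)
    return weights @ dists                                  # fused distribution, shape (vocab,)
```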
arXiv Detail & Related papers (2024-04-17T16:24:07Z)
- PiCO: Peer Review in LLMs based on the Consistency Optimization [48.48819141999387]
We use peer-review mechanisms to measure large language models (LLMs) automatically. We formalize this as a constrained optimization problem, intending to maximize the consistency of each LLM's capabilities and scores. We propose three metrics, PEN, CIN, and LIS, to evaluate the gap in alignment with human rankings.
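The consistency idea can be illustrated with a toy fixed-point iteration: reviewer confidence weights and received peer scores reinforce each other until they agree. This replaces the paper's constrained optimization with a much simpler scheme, purely for intuition.

```python
import numpy as np

def consistency_weights(review: np.ndarray, iters: int = 50) -> np.ndarray:
    """Toy fixed point of the consistency idea.
    review[i, j] = average score reviewer model i gives to model j's answers
    (assumed non-negative). A model's weight is the confidence-weighted average
    of the scores it receives; stronger models review and score consistently."""
    n = review.shape[0]
    w = np.full(n, 1.0 / n)                 # start with uniform reviewer confidence
    for _ in range(iters):
        scores = w @ review                 # confidence-weighted received scores
        w_new = scores / scores.sum()       # renormalise into new confidence weights
        if np.allclose(w_new, w, atol=1e-8):
            break
        w = w_new
    return w                                # higher weight ~ higher estimated ability
```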
arXiv Detail & Related papers (2024-02-02T18:49:26Z)
- Knowledge Fusion of Large Language Models [73.28202188100646]
This paper introduces the notion of knowledge fusion for large language models (LLMs).
We externalize their collective knowledge and unique strengths, thereby elevating the capabilities of the target model beyond those of any individual source LLM.
Our findings confirm that the fusion of LLMs can improve the performance of the target model across a range of capabilities such as reasoning, commonsense, and code generation.
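One way to picture knowledge fusion is as distillation toward a mixture of the source models' token distributions. The sketch below assumes a shared vocabulary and a hypothetical per-source weighting; the actual method's cross-tokenizer alignment and weighting are not reproduced here.

```python
import torch
import torch.nn.functional as F

def fusion_loss(student_logits: torch.Tensor,
                source_logits_list: list,
                weights: list) -> torch.Tensor:
    """Illustrative knowledge-fusion objective (shared vocabulary assumed):
    distil the target model toward a weighted mixture of the source LLMs'
    token distributions. `weights` is a hypothetical per-source weighting."""
    with torch.no_grad():
        # Fuse the source distributions into a single teacher distribution.
        fused = torch.stack(
            [w * F.softmax(logits, dim=-1) for w, logits in zip(weights, source_logits_list)]
        ).sum(dim=0)
        fused = fused / fused.sum(dim=-1, keepdim=True)
    log_student = F.log_softmax(student_logits, dim=-1)
    # KL divergence between the fused teacher distribution and the student.
    return F.kl_div(log_student, fused, reduction="batchmean")
```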
arXiv Detail & Related papers (2024-01-19T05:02:46Z)
- LLM360: Towards Fully Transparent Open-Source LLMs [89.05970416013403]
The goal of LLM360 is to support open and collaborative AI research by making the end-to-end training process transparent and reproducible by everyone.
As a first step of LLM360, we release two 7B parameter LLMs pre-trained from scratch, Amber and CrystalCoder, including their training code, data, intermediate checkpoints, and analyses.
arXiv Detail & Related papers (2023-12-11T17:39:00Z)