When One LLM Drools, Multi-LLM Collaboration Rules
- URL: http://arxiv.org/abs/2502.04506v1
- Date: Thu, 06 Feb 2025 21:13:44 GMT
- Title: When One LLM Drools, Multi-LLM Collaboration Rules
- Authors: Shangbin Feng, Wenxuan Ding, Alisa Liu, Zifeng Wang, Weijia Shi, Yike Wang, Zejiang Shen, Xiaochuang Han, Hunter Lang, Chen-Yu Lee, Tomas Pfister, Yejin Choi, Yulia Tsvetkov
- Abstract summary: We argue for multi-LLM collaboration to better represent the extensive diversity of data, skills, and people. We organize existing multi-LLM collaboration methods into a hierarchy, based on the level of access and information exchange. We envision multi-LLM collaboration as an essential path toward compositional intelligence and collaborative AI development.
- Score: 98.71562711695991
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This position paper argues that in many realistic (i.e., complex, contextualized, subjective) scenarios, one LLM is not enough to produce a reliable output. We challenge the status quo of relying solely on a single general-purpose LLM and argue for multi-LLM collaboration to better represent the extensive diversity of data, skills, and people. We first posit that a single LLM underrepresents real-world data distributions, heterogeneous skills, and pluralistic populations, and that such representation gaps cannot be trivially patched by further training a single LLM. We then organize existing multi-LLM collaboration methods into a hierarchy, based on the level of access and information exchange, ranging from API-level, text-level, logit-level, to weight-level collaboration. Based on these methods, we highlight how multi-LLM collaboration addresses challenges that a single LLM struggles with, such as reliability, democratization, and pluralism. Finally, we identify the limitations of existing multi-LLM methods and motivate future work. We envision multi-LLM collaboration as an essential path toward compositional intelligence and collaborative AI development.
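The abstract's hierarchy of access levels (API-level, text-level, logit-level, weight-level) can be illustrated with a minimal sketch. Everything below is hypothetical: the function names and toy logits are stand-ins invented for illustration, not code from the paper, and real logit-level collaboration would operate on full vocabulary distributions from actual models.

```python
import math

def softmax(logits):
    """Turn raw logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def logit_level_ensemble(logits_a, logits_b, weight=0.5):
    """Logit-level collaboration: blend two models' next-token logits
    before sampling, so no single model decides the output alone."""
    return [weight * a + (1 - weight) * b for a, b in zip(logits_a, logits_b)]

def text_level_exchange(draft, critique_fn, revise_fn):
    """Text-level collaboration: only natural-language text crosses the
    model boundary -- one model critiques, the other revises."""
    return revise_fn(draft, critique_fn(draft))

# Toy usage over a 3-token vocabulary: two models that each prefer a
# different token end up with a merged, flatter distribution.
merged = logit_level_ensemble([2.0, 0.0, -1.0], [0.0, 2.0, -1.0])
probs = softmax(merged)
```

API-level collaboration would sit above this sketch (black-box calls only) and weight-level below it (merging parameters directly); the two functions here mark the middle of that access hierarchy.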
Related papers
- Collab-Overcooked: Benchmarking and Evaluating Large Language Models as Collaborative Agents [17.773801766612703]
Large language model (LLM)-based agent systems have made great strides in real-world applications beyond traditional NLP tasks.
This paper proposes a new benchmark, Collab-Overcooked, built on the popular Overcooked-AI game with more applicable and challenging tasks in interactive environments.
arXiv Detail & Related papers (2025-02-27T13:31:13Z)
- MAPoRL: Multi-Agent Post-Co-Training for Collaborative Large Language Models with Reinforcement Learning [26.736078756799635]
We introduce a new post-training paradigm MAPoRL (Multi-Agent Post-co-training for collaborative LLMs with Reinforcement Learning)
In MAPoRL, multiple LLMs first generate their own responses independently and engage in a multi-turn discussion to collaboratively improve the final answer.
A MAPoRL verifier evaluates both the answer and the discussion, by assigning a score that verifies the correctness of the answer.
The score serves as the co-training reward, and is then maximized through multi-agent RL.
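That loop might be sketched roughly as follows, with all agents and the verifier replaced by toy stubs; the paper's actual setup trains the agents with multi-agent RL, and nothing below is their code.

```python
def maporl_round(agents, question, verifier):
    """One hypothetical round: independent answers, one discussion turn,
    then verifier scores that would serve as co-training rewards."""
    # Step 1: each agent answers independently (no shared context).
    drafts = [agent(question, context=None) for agent in agents]
    # Step 2: one discussion turn -- each agent sees the others' drafts.
    revised = [agent(question, context=drafts) for agent in agents]
    # Step 3: the verifier scores each answer together with the discussion;
    # multi-agent RL would then maximize this score as the reward.
    rewards = [verifier(question, answer, drafts) for answer in revised]
    return revised, rewards

# Toy stubs standing in for LLM agents and the verifier.
def agent_a(q, context):
    return "4"

def agent_b(q, context):
    # Agent B corrects itself after seeing the discussion drafts.
    return "5" if context is None else "4"

def toy_verifier(q, answer, discussion):
    return 1.0 if answer == "4" else 0.0

answers, rewards = maporl_round([agent_a, agent_b], "2 + 2 = ?", toy_verifier)
```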
arXiv Detail & Related papers (2025-02-25T18:33:48Z)
- Multi-LLM Text Summarization [58.74987409988719]
We propose a Multi-LLM summarization framework and investigate two multi-LLM strategies: centralized and decentralized.
Our framework has two fundamentally important steps at each round of conversation: generation and evaluation.
We find that our multi-LLM summarization approaches significantly outperform the baselines that leverage only a single LLM by up to 3x.
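The generation-and-evaluation round could look roughly like the sketch below; the stub summarizers, the shortest-draft heuristic, and majority voting are invented for illustration, not the paper's models or selection criteria.

```python
from collections import Counter

def run_round(summarizers, text, mode="centralized"):
    """One hypothetical round: every model generates, then either a single
    judge (centralized) or a vote among all models (decentralized) selects."""
    # Generation step: every model drafts a candidate summary.
    candidates = [s(text) for s in summarizers]
    # Evaluation step:
    if mode == "centralized":
        # A single designated judge picks the winner; stubbed here as
        # preferring the shortest draft.
        return min(candidates, key=len)
    # Decentralized: all models "vote" and the majority draft wins.
    votes = Counter(candidates)
    return votes.most_common(1)[0][0]

# Toy stubs: three summarizers, two of which agree.
summarizers = [
    lambda t: "Concise.",
    lambda t: "A summary.",
    lambda t: "A summary.",
]
central = run_round(summarizers, "some document", mode="centralized")
decentral = run_round(summarizers, "some document", mode="decentralized")
```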
arXiv Detail & Related papers (2024-12-20T01:55:26Z)
- CollabStory: Multi-LLM Collaborative Story Generation and Authorship Analysis [5.929552001093879]
We present CollabStory, the first dataset of collaborative stories written exclusively by LLMs.
We generate over 32k stories using open-source instruction-tuned LLMs.
We extend their authorship-related tasks for multi-LLM settings and present baselines for LLM-LLM collaboration.
arXiv Detail & Related papers (2024-06-18T14:35:12Z)
- Knowledge Fusion of Large Language Models [73.28202188100646]
This paper introduces the notion of knowledge fusion for large language models (LLMs).
We externalize their collective knowledge and unique strengths, thereby elevating the capabilities of the target model beyond those of any individual source LLM.
Our findings confirm that the fusion of LLMs can improve the performance of the target model across a range of capabilities such as reasoning, commonsense, and code generation.
arXiv Detail & Related papers (2024-01-19T05:02:46Z)
- Small LLMs Are Weak Tool Learners: A Multi-LLM Agent [73.54562551341454]
Large Language Model (LLM) agents significantly extend the capabilities of standalone LLMs.
We propose a novel approach that decomposes the aforementioned capabilities into a planner, caller, and summarizer.
This modular framework facilitates individual updates and the potential use of smaller LLMs for building each capability.
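One way to picture the planner/caller/summarizer decomposition is as three swappable roles, each of which could be a different (possibly smaller) model; all names and the toy tool registry below are assumptions for illustration, not the paper's interface.

```python
def run_agent(task, planner, caller, summarizer, tools):
    """Hypothetical modular agent: three roles, each independently updatable."""
    plan = planner(task)              # planner: choose a tool and arguments
    result = caller(plan, tools)      # caller: invoke the chosen tool correctly
    return summarizer(task, result)   # summarizer: phrase the final answer

# Toy stubs: in the framework each role could be a distinct small LLM.
tools = {"add": lambda a, b: a + b}
planner = lambda task: ("add", (2, 3))
caller = lambda plan, tools: tools[plan[0]](*plan[1])
summarizer = lambda task, result: f"The answer is {result}."

answer = run_agent("What is 2 + 3?", planner, caller, summarizer, tools)
```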
arXiv Detail & Related papers (2024-01-14T16:17:07Z)
- u-LLaVA: Unifying Multi-Modal Tasks via Large Language Model [17.3535277338312]
u-LLaVA is an innovative unifying multi-task framework that integrates pixel, regional, and global features to refine the perceptual faculties of MLLMs.
This work contributes a novel mask-based multi-task dataset comprising 277K samples, crafted to challenge and assess the fine-grained perception capabilities of MLLMs.
arXiv Detail & Related papers (2023-11-09T13:18:27Z)
- Corex: Pushing the Boundaries of Complex Reasoning through Multi-Model Collaboration [83.4031923134958]
Corex is a suite of novel general-purpose strategies that transform Large Language Models into autonomous agents.
Inspired by human behaviors, Corex is constituted by diverse collaboration paradigms including Debate, Review, and Retrieve modes.
We demonstrate that orchestrating multiple LLMs to work in concert yields substantially better performance compared to existing methods.
arXiv Detail & Related papers (2023-09-30T07:11:39Z)
- A Survey on Multimodal Large Language Models [71.63375558033364]
Multimodal Large Language Models (MLLMs), represented by GPT-4V, are a rising research hotspot.
This paper aims to trace and summarize the recent progress of MLLMs.
arXiv Detail & Related papers (2023-06-23T15:21:52Z)
- Examining Inter-Consistency of Large Language Models Collaboration: An In-depth Analysis via Debate [41.949869545423375]
Large Language Models (LLMs) have shown impressive capabilities in various applications, but they still suffer from inconsistency issues.
To examine whether LLMs can collaborate effectively to achieve a consensus for a shared goal, we focus on commonsense reasoning.
Our work contributes to understanding the inter-consistency among LLMs and lays the foundation for developing future collaboration methods.
arXiv Detail & Related papers (2023-05-19T11:15:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.