Rethinking Mixture-of-Agents: Is Mixing Different Large Language Models Beneficial?
- URL: http://arxiv.org/abs/2502.00674v1
- Date: Sun, 02 Feb 2025 05:23:29 GMT
- Title: Rethinking Mixture-of-Agents: Is Mixing Different Large Language Models Beneficial?
- Authors: Wenzhe Li, Yong Lin, Mengzhou Xia, Chi Jin
- Abstract summary: Mixture-of-Agents (MoA) is an ensemble method that aggregates outputs from multiple different Large Language Models (LLMs).
We propose Self-MoA -- an ensemble method that aggregates outputs from only the single top-performing LLM.
Applying Self-MoA to one of the top-ranking models in AlpacaEval 2.0 directly achieves the new state-of-the-art performance on the leaderboard.
- Score: 34.09805047407917
- Abstract: Ensembling outputs from diverse sources is a straightforward yet effective approach to boost performance. Mixture-of-Agents (MoA) is one such popular ensemble method that aggregates outputs from multiple different Large Language Models (LLMs). This paper raises the question in the context of language models: is mixing different LLMs truly beneficial? We propose Self-MoA -- an ensemble method that aggregates outputs from only the single top-performing LLM. Our extensive experiments reveal that, surprisingly, Self-MoA outperforms standard MoA that mixes different LLMs in a large number of scenarios: Self-MoA achieves $6.6\%$ improvement over MoA on the AlpacaEval 2.0 benchmark, and an average of $3.8\%$ improvement across various benchmarks, including MMLU, CRUX, and MATH. Applying Self-MoA to one of the top-ranking models in AlpacaEval 2.0 directly achieves the new state-of-the-art performance on the leaderboard. To understand the effectiveness of Self-MoA, we systematically investigate the trade-off between diversity and quality of outputs under various MoA settings. We confirm that MoA performance is rather sensitive to quality, and that mixing different LLMs often lowers the average quality of the models. To complement the study, we identify the scenarios where mixing different LLMs could be helpful. This paper further introduces a sequential version of Self-MoA, which can aggregate a large number of LLM outputs on-the-fly over multiple rounds and is as effective as aggregating all outputs at once.
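A minimal sketch of the Self-MoA idea described in the abstract (repeated sampling from a single top-performing model, followed by an aggregation pass by that same model), plus the sequential variant that folds fresh samples into a running aggregate. The prompts, sampling parameters, and the `query_llm` helper are illustrative assumptions, not the paper's exact setup.

```python
# Self-MoA sketch: sample several outputs from ONE strong model, then ask the
# same model to aggregate them. `query_llm` is a hypothetical helper standing
# in for any chat-completion API.

def query_llm(prompt: str, temperature: float = 0.7) -> str:
    raise NotImplementedError("plug in your preferred LLM client here")

def self_moa(task: str, num_samples: int = 6) -> str:
    # Proposer stage: repeated sampling from the single top-performing model.
    proposals = [query_llm(task, temperature=0.7) for _ in range(num_samples)]
    # Aggregator stage: the same model synthesizes the proposals into one response.
    numbered = "\n\n".join(f"[Response {i+1}]\n{p}" for i, p in enumerate(proposals))
    return query_llm(
        "You are given several candidate responses to the task below. "
        "Synthesize them into a single, higher-quality response.\n\n"
        f"Task: {task}\n\n{numbered}",
        temperature=0.0,
    )

def sequential_self_moa(task: str, rounds: int = 3, samples_per_round: int = 3) -> str:
    # Sequential variant: each round aggregates a few fresh samples together
    # with the running aggregate, instead of aggregating all outputs at once.
    aggregate = query_llm(task, temperature=0.7)
    for _ in range(rounds):
        fresh = [query_llm(task, temperature=0.7) for _ in range(samples_per_round)]
        numbered = "\n\n".join(
            f"[Response {i+1}]\n{p}" for i, p in enumerate([aggregate] + fresh)
        )
        aggregate = query_llm(
            "Synthesize the candidate responses below into one improved response.\n\n"
            f"Task: {task}\n\n{numbered}",
            temperature=0.0,
        )
    return aggregate
```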
Related papers
- Optimizing Model Selection for Compound AI Systems [76.69936664916061]
We propose an efficient framework for model selection in compound systems.
It iteratively selects one module and allocates to it the model with the highest module-wise performance.
It confers 5%-70% accuracy gains compared to using the same LLM for all modules.
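A rough sketch of the iterative, module-wise selection idea summarized above: repeatedly pick one module and assign it the best-scoring model while holding the other assignments fixed. The module/model abstractions, the `evaluate` helper, and the number of sweeps are assumptions for illustration.

```python
# Greedy, coordinate-descent-style model selection for a compound AI system.
# `evaluate` is a hypothetical scoring function over a validation set.
from typing import Callable, Dict, List

def select_models(
    modules: List[str],
    candidate_llms: List[str],
    evaluate: Callable[[Dict[str, str]], float],
    num_sweeps: int = 3,
) -> Dict[str, str]:
    # Start with an arbitrary assignment (the first candidate everywhere).
    assignment = {m: candidate_llms[0] for m in modules}
    for _ in range(num_sweeps):
        for module in modules:
            # Try every candidate for this module, keeping the rest fixed.
            best_llm, best_score = assignment[module], float("-inf")
            for llm in candidate_llms:
                trial = dict(assignment)
                trial[module] = llm
                score = evaluate(trial)
                if score > best_score:
                    best_llm, best_score = llm, score
            assignment[module] = best_llm
    return assignment
```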
arXiv Detail & Related papers (2025-02-20T18:36:25Z)
- MALT: Improving Reasoning with Multi-Agent LLM Training [64.13803241218886]
We present a first step toward "Multi-agent LLM training" (MALT) on reasoning problems.
Our approach employs a sequential multi-agent setup with heterogeneous LLMs assigned specialized roles.
We evaluate our approach across MATH, GSM8k, and CQA, where MALT on Llama 3.1 8B models achieves relative improvements of 14.14%, 7.12%, and 9.40% respectively.
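A sketch of a sequential multi-agent pipeline with specialized roles, in the spirit of the MALT summary above. The specific roles (generator, verifier, refiner), the model names, and the prompt wording are illustrative assumptions rather than the paper's exact training setup.

```python
# Sequential multi-agent answering with heterogeneous, role-specialized LLMs.
# `query_llm(model, prompt)` is a hypothetical helper for any LLM client.

def query_llm(model: str, prompt: str) -> str:
    raise NotImplementedError("plug in an LLM client")

def multi_agent_answer(question: str,
                       generator: str = "generator-llm",
                       verifier: str = "verifier-llm",
                       refiner: str = "refiner-llm") -> str:
    draft = query_llm(generator, f"Solve step by step:\n{question}")
    critique = query_llm(verifier,
                         f"Check this solution for errors:\n{question}\n\n{draft}")
    return query_llm(
        refiner,
        f"Question:\n{question}\n\nDraft solution:\n{draft}\n\n"
        f"Critique:\n{critique}\n\nProduce a corrected final solution.",
    )
```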
arXiv Detail & Related papers (2024-12-02T19:30:36Z)
- $\gamma$-MoD: Exploring Mixture-of-Depth Adaptation for Multimodal Large Language Models [87.43596173378913]
We propose an innovative strategy for existing MLLMs called $\gamma$-MoD.
In $\gamma$-MoD, a novel metric, ARank, is proposed to guide the deployment of mixture-of-depth (MoD) layers in the MLLM.
Based on ARank, we propose two novel designs to maximize the computational sparsity of the MLLM.
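For context, here is a generic mixture-of-depths layer sketch: a lightweight router scores tokens and only the top-k tokens are processed by the expensive transformer block, while the rest pass through unchanged. This is a simplified, generic MoD layer for illustration only, not the specific $\gamma$-MoD design or its ARank-based placement.

```python
import torch
import torch.nn as nn

class MoDLayer(nn.Module):
    def __init__(self, block: nn.Module, d_model: int, capacity: float = 0.5):
        super().__init__()
        self.block = block        # computes the residual update, (B, k, D) -> (B, k, D)
        self.router = nn.Linear(d_model, 1)
        self.capacity = capacity  # fraction of tokens routed through the block

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, D = x.shape
        k = max(1, int(self.capacity * T))
        scores = self.router(x).squeeze(-1)                 # (B, T) routing scores
        topk = scores.topk(k, dim=1).indices                # tokens to process
        selected = torch.gather(x, 1, topk.unsqueeze(-1).expand(B, k, D))
        update = self.block(selected)                       # heavy compute on k tokens only
        # Gate by the routing score so the router receives gradient signal.
        gate = torch.sigmoid(torch.gather(scores, 1, topk)).unsqueeze(-1)
        out = x.clone()
        out.scatter_(1, topk.unsqueeze(-1).expand(B, k, D), selected + gate * update)
        return out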
arXiv Detail & Related papers (2024-10-17T17:59:53Z)
- Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free [21.59456761618456]
While large language models (LLMs) excel at generation tasks, their decoder-only architecture often limits their potential as embedding models unless further representation finetuning is applied.
Our study shows that the expert routers in MoE LLMs can serve as an off-the-shelf embedding model with promising performance on a diverse class of embedding-focused tasks.
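One plausible way to read the summary above is to treat each MoE layer's routing distribution as a feature vector for the input text. The sketch below is an assumption-laden illustration: the `get_router_probs` helper, pooling scheme, and concatenation are placeholders, not the paper's exact recipe.

```python
import numpy as np
from typing import List

def get_router_probs(text: str) -> List[np.ndarray]:
    """Hypothetical helper: one (num_tokens, num_experts) array per MoE layer."""
    raise NotImplementedError("extract router softmax outputs from your MoE LLM")

def routing_embedding(text: str) -> np.ndarray:
    per_layer = get_router_probs(text)
    # Mean-pool over tokens within each layer, then concatenate across layers.
    pooled = [layer.mean(axis=0) for layer in per_layer]    # (num_experts,) each
    return np.concatenate(pooled)                           # (num_layers * num_experts,)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
```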
arXiv Detail & Related papers (2024-10-14T17:59:44Z)
- MetaLLM: A High-performant and Cost-efficient Dynamic Framework for Wrapping LLMs [21.689490112983677]
We introduce MetaLLM, a framework that dynamically routes each query to the optimal large language model (LLM) for classification tasks.
By framing the selection problem as a multi-armed bandit, MetaLLM balances prediction accuracy and cost efficiency under uncertainty.
Our experiments, conducted on popular LLM platforms, showcase MetaLLM's efficacy in real-world scenarios.
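A toy illustration of framing routing as a multi-armed bandit, as described above: each arm is an LLM, and the reward trades off accuracy against query cost. The epsilon-greedy policy, cost weights, and reward definition are assumptions; MetaLLM's actual formulation may be contextual (conditioning on per-query features).

```python
import random
from typing import Dict

class BanditRouter:
    def __init__(self, llm_costs: Dict[str, float],
                 cost_weight: float = 0.5, epsilon: float = 0.1):
        self.costs = llm_costs                     # e.g., {"small-llm": 0.1, "big-llm": 1.0}
        self.cost_weight = cost_weight
        self.epsilon = epsilon
        self.value = {m: 0.0 for m in llm_costs}   # running mean reward per arm
        self.count = {m: 0 for m in llm_costs}

    def choose(self) -> str:
        if random.random() < self.epsilon:          # explore
            return random.choice(list(self.costs))
        return max(self.value, key=self.value.get)  # exploit best estimate

    def update(self, model: str, correct: bool) -> None:
        # Reward = accuracy minus a cost penalty for using an expensive model.
        reward = float(correct) - self.cost_weight * self.costs[model]
        self.count[model] += 1
        self.value[model] += (reward - self.value[model]) / self.count[model]
```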
arXiv Detail & Related papers (2024-07-15T15:45:07Z)
- Mixture-of-Agents Enhances Large Language Model Capabilities [34.68610100315386]
We propose a new approach to harness the collective expertise of multiple large language models (LLMs) through a Mixture-of-Agents (MoA) methodology.
In our approach, we construct a layered MoA architecture wherein each layer comprises multiple LLM agents.
MoA models achieve state-of-the-art performance on AlpacaEval 2.0, MT-Bench, and FLASK, surpassing GPT-4 Omni.
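A minimal sketch of the layered MoA architecture summarized above: agents in each layer see the original prompt plus all responses from the previous layer, and a final aggregator produces the answer. Model names, prompts, and the `query_llm` helper are placeholders, not the paper's exact configuration.

```python
from typing import List

def query_llm(model: str, prompt: str) -> str:
    raise NotImplementedError("plug in an LLM client")

def mixture_of_agents(task: str, layers: List[List[str]], aggregator: str) -> str:
    previous: List[str] = []
    for layer_models in layers:
        prompt = task
        if previous:
            refs = "\n\n".join(f"[Model {i+1}]\n{r}" for i, r in enumerate(previous))
            prompt = f"{task}\n\nReference responses from the previous layer:\n{refs}"
        # Every agent in this layer answers, conditioned on the previous layer's outputs.
        previous = [query_llm(m, prompt) for m in layer_models]
    refs = "\n\n".join(f"[Model {i+1}]\n{r}" for i, r in enumerate(previous))
    return query_llm(aggregator,
                     "Synthesize the reference responses into one final answer.\n\n"
                     f"Task: {task}\n\n{refs}")
```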
arXiv Detail & Related papers (2024-06-07T07:04:10Z)
- CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts [41.80218225636109]
CuMo improves model scalability during training while keeping inference costs similar to those of smaller models.
CuMo incorporates sparsely-gated Mixture-of-Experts blocks into both the vision encoder and the connector.
The code and model weights for CuMo are open-sourced at https://github.com/SHI-Labs/CuMo.
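For reference, here is a generic sparsely-gated MoE MLP block of the kind the summary says CuMo inserts into the vision encoder and connector. The sizes, top-k value, and the absence of a load-balancing loss are simplifications; this is not CuMo's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoEBlock(nn.Module):
    def __init__(self, d_model: int, d_hidden: int, num_experts: int = 4, top_k: int = 2):
        super().__init__()
        self.gate = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, d_model)
        logits = self.gate(x)                            # (B, T, num_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)   # route each token to top-k experts
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = (idx[..., slot] == e)             # tokens assigned to expert e
                if mask.any():
                    out[mask] += weights[..., slot][mask].unsqueeze(-1) * expert(x[mask])
        return x + out                                    # residual connection
```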
arXiv Detail & Related papers (2024-05-09T17:37:20Z)
- Knowledge Fusion of Large Language Models [73.28202188100646]
This paper introduces the notion of knowledge fusion for large language models (LLMs).
We externalize their collective knowledge and unique strengths, thereby elevating the capabilities of the target model beyond those of any individual source LLM.
Our findings confirm that the fusion of LLMs can improve the performance of the target model across a range of capabilities such as reasoning, commonsense, and code generation.
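An illustrative training objective for fusing several source LLMs into a target model: combine the sources' next-token distributions (here simply averaged, which may differ from the paper's fusion strategy) and add a KL term to the usual language-modeling loss. Cross-model tokenizer alignment is ignored for simplicity.

```python
import torch
import torch.nn.functional as F
from typing import List

def fusion_loss(target_logits: torch.Tensor,
                source_logits_list: List[torch.Tensor],
                labels: torch.Tensor,
                alpha: float = 0.5) -> torch.Tensor:
    # target_logits and each source's logits: (batch, seq, vocab); labels: (batch, seq)
    lm_loss = F.cross_entropy(target_logits.flatten(0, 1), labels.flatten())

    with torch.no_grad():
        # Fused teacher distribution: average of the source models' probabilities.
        source_probs = torch.stack(
            [F.softmax(s, dim=-1) for s in source_logits_list]
        ).mean(dim=0)

    log_q = F.log_softmax(target_logits, dim=-1)
    kl = F.kl_div(log_q, source_probs, reduction="batchmean")
    return (1 - alpha) * lm_loss + alpha * kl
```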
arXiv Detail & Related papers (2024-01-19T05:02:46Z)
- Routing to the Expert: Efficient Reward-guided Ensemble of Large Language Models [69.51130760097818]
We propose Zooter, a reward-guided routing method distilling rewards on training queries to train a routing function.
We evaluate Zooter on a comprehensive benchmark collection with 26 subsets on different domains and tasks.
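A sketch of reward-guided routing in the spirit of the Zooter summary above: offline, score each candidate LLM's response to training queries with a reward model and distill those scores into a lightweight router; online, the router picks one LLM per query without calling them all. The embedding function, router architecture, and temperature are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Router(nn.Module):
    def __init__(self, embed_dim: int, num_llms: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(embed_dim, 256), nn.ReLU(),
                                 nn.Linear(256, num_llms))

    def forward(self, query_embedding: torch.Tensor) -> torch.Tensor:
        return self.net(query_embedding)                 # one logit per candidate LLM

def distillation_step(router: Router, optimizer: torch.optim.Optimizer,
                      query_emb: torch.Tensor, rewards: torch.Tensor,
                      temperature: float = 1.0) -> float:
    # rewards: (batch, num_llms) reward-model scores of each LLM's response.
    target = F.softmax(rewards / temperature, dim=-1)    # soft routing labels
    log_probs = F.log_softmax(router(query_emb), dim=-1)
    loss = F.kl_div(log_probs, target, reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

At inference time, `router(query_emb).argmax(-1)` selects the single LLM to answer the query, so only one model is invoked per request.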
arXiv Detail & Related papers (2023-11-15T04:40:43Z)
- MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models [73.86954509967416]
Multimodal Large Language Models (MLLMs) rely on powerful LLMs to perform multimodal tasks.
This paper presents the first comprehensive MLLM Evaluation benchmark MME.
It measures both perception and cognition abilities on a total of 14 subtasks.
arXiv Detail & Related papers (2023-06-23T09:22:36Z)
- Generative Multimodal Entity Linking [24.322540112710918]
Multimodal Entity Linking (MEL) is the task of mapping mentions with multimodal contexts to referent entities from a knowledge base.
Existing MEL methods mainly focus on designing complex multimodal interaction mechanisms and require fine-tuning all model parameters.
We propose GEMEL, a Generative Multimodal Entity Linking framework based on Large Language Models (LLMs).
Our framework is compatible with any off-the-shelf language model, paving the way towards an efficient and general solution.
arXiv Detail & Related papers (2023-06-22T07:57:19Z)