CCoE: A Compact LLM with Collaboration of Experts
- URL: http://arxiv.org/abs/2407.11686v3
- Date: Thu, 25 Jul 2024 03:34:56 GMT
- Title: CCoE: A Compact LLM with Collaboration of Experts
- Authors: Shaomang Huang, Jianfeng Pan, Hanzhong Zheng
- Abstract summary: We propose a framework for easily coupling strong domain experts together and fusing them into a single large language model (LLM).
We start with 5 experts in the domains of Code, Math, Law, text-to-SQL, and Medical. The results indicate that our CCoE framework can easily and efficiently boost performance over the original base model by nearly 10%-20% across domains, while using fewer resources for both training and inference.
- Score: 0.6144680854063939
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In the domain of Large Language Models (LLMs), LLMs demonstrate significant capabilities in natural language understanding and generation. With the growing need to apply LLMs across various domains, it is an open research question how to efficiently train and build a model that has expertise in several domains at a low training cost. We propose CCoE, an architecture that easily couples multiple strong domain experts into a single large LLM and provides a collective way of utilizing the different domain-expert LLMs. Training a large collaboration of multiple expert LLMs ordinarily places high demands on training resources; CCoE sidesteps this by isolating the experts and training each one separately. The design of CCoE assembles multiple expert LLMs through CoE (Collaboration of Experts) layers. Each CoE layer can host one or more expert LLMs; the experts may have different numbers of layers and are well-trained for different domain tasks. Each expert is fine-tuned to achieve results comparable to SOTA domain LLMs. We start with 5 experts in the domains of Code, Math, Law, text-to-SQL, and Medical. The results indicate that our CCoE framework can easily and efficiently boost performance over the original base model by nearly 10%-20% across domains, while using fewer resources for both training and inference.
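The abstract does not include reference code; the sketch below is a minimal PyTorch illustration of the CoE idea as described there: a router selects one domain expert per query, and because experts are trained in isolation, only the active expert runs at inference. All names (CoELayer, make_expert, the rule-based router) are hypothetical.

```python
# Minimal sketch of a CoE (Collaboration of Experts) layer, assuming a
# rule-based router over separately trained domain experts. Names are
# hypothetical; the paper's actual layer wiring may differ.
import torch
import torch.nn as nn

class CoELayer(nn.Module):
    def __init__(self, experts: dict, router):
        super().__init__()
        # Experts are trained in isolation, so gradients for one domain
        # never touch the others (the training "bypass" the abstract describes).
        self.experts = nn.ModuleDict(experts)
        self.router = router  # maps a query to a domain key

    def forward(self, query_text: str, hidden: torch.Tensor) -> torch.Tensor:
        domain = self.router(query_text)      # e.g. "code", "law", ...
        return self.experts[domain](hidden)   # only one expert runs

# Toy usage: five domains, each expert a stand-in MLP of possibly
# different depth (the abstract notes experts may differ in layer count).
def make_expert(depth: int, dim: int = 16) -> nn.Module:
    return nn.Sequential(*[nn.Linear(dim, dim) for _ in range(depth)])

experts = {d: make_expert(n) for d, n in
           [("code", 2), ("math", 3), ("law", 2), ("sql", 1), ("medical", 2)]}
layer = CoELayer(experts, router=lambda q: "math" if "integral" in q else "code")
out = layer("solve this integral", torch.randn(1, 16))
```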
Related papers
- Combining Domain and Alignment Vectors to Achieve Better Knowledge-Safety Trade-offs in LLMs [64.83462841029089]
We introduce an efficient merging-based alignment method called MergeAlign that interpolates the domain and alignment vectors, creating safer domain-specific models.
We apply MergeAlign on Llama3 variants that are experts in medicine and finance, obtaining substantial alignment improvements with minimal to no degradation on domain-specific benchmarks.
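The summary describes MergeAlign only as interpolating a domain vector and an alignment vector. A plausible reading, assuming the standard task-vector formulation (expert weights minus base weights) and a single interpolation weight alpha, is sketched below; both assumptions are ours, not the paper's.

```python
# Hedged sketch of vector interpolation in the spirit of MergeAlign:
# task vectors are deltas from a shared base model, and the merged model
# adds an interpolation of the domain and alignment vectors back to base.
import torch

def merge_align(base: dict, domain: dict, aligned: dict, alpha: float = 0.5):
    """Interpolate domain and alignment task vectors over a base state_dict."""
    merged = {}
    for name, w_base in base.items():
        v_domain = domain[name] - w_base   # domain-expertise delta
        v_align = aligned[name] - w_base   # safety-alignment delta
        merged[name] = w_base + alpha * v_domain + (1 - alpha) * v_align
    return merged

# Toy check on random tensors standing in for state_dicts.
shape = (4, 4)
base = {"w": torch.zeros(shape)}
domain = {"w": torch.ones(shape)}
aligned = {"w": -torch.ones(shape)}
print(merge_align(base, domain, aligned, alpha=0.7)["w"][0, 0])  # 0.7 - 0.3 = 0.4
```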
arXiv Detail & Related papers (2024-11-11T09:32:20Z) - Scalable Multi-Domain Adaptation of Language Models using Modular Experts [10.393155077703653]
MoDE is a mixture-of-experts architecture that augments a general PLM with modular, domain-specialized experts.
MoDE achieves comparable target performances to full parameter fine-tuning while achieving 1.65% better retention performance.
arXiv Detail & Related papers (2024-10-14T06:02:56Z) - Retraining-Free Merging of Sparse MoE via Hierarchical Clustering [14.858134039539697]
This paper introduces Hierarchical Clustering for Sparsely activated Mixture of Experts (HC-SMoE)
HC-SMoE is a task-agnostic expert merging framework for parameter reduction without retraining.
We provide theoretical analysis and evaluations across multiple zero-shot language tasks to demonstrate HC-SMoE's effectiveness in state-of-the-art models including Qwen and Mixtral.
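As a rough illustration of retraining-free merging by hierarchical clustering, the sketch below clusters experts on their flattened weights and averages within each cluster. HC-SMoE may cluster on different statistics (e.g., expert outputs), so treat this as indicative only.

```python
# Hedged sketch: agglomeratively cluster experts, then merge each cluster
# by averaging, with no retraining. Clustering on raw weights is an
# assumption for illustration.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def merge_experts(expert_weights: list, n_merged: int):
    flat = np.stack([w.ravel() for w in expert_weights])
    tree = linkage(flat, method="average")            # hierarchical clustering
    labels = fcluster(tree, t=n_merged, criterion="maxclust")
    merged = []
    for c in range(1, n_merged + 1):
        members = flat[labels == c]
        merged.append(members.mean(axis=0).reshape(expert_weights[0].shape))
    return merged, labels  # labels re-route tokens from old to merged experts

# Toy usage: shrink 8 experts down to 3 without any retraining.
experts = [np.random.randn(16, 16) for _ in range(8)]
merged, labels = merge_experts(experts, n_merged=3)
print(len(merged), labels)
```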
arXiv Detail & Related papers (2024-10-11T07:36:14Z) - FactorLLM: Factorizing Knowledge via Mixture of Experts for Large Language Models [50.331708897857574]
We introduce FactorLLM, a novel approach that decomposes well-trained dense FFNs into sparse sub-networks without requiring any further modifications.
FactorLLM achieves comparable performance to the source model securing up to 85% model performance while obtaining over a 30% increase in inference speed.
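A minimal sketch of the factorization idea, assuming an even partition of a trained FFN's hidden neurons into experts and top-1 routing; the split, the router, and the class names are assumptions for illustration.

```python
# Hedged sketch: a dense FFN's hidden neurons are partitioned into
# sparse sub-networks that reuse the original weights, and a router
# activates only one partition per token.
import torch
import torch.nn as nn

class FactorizedFFN(nn.Module):
    def __init__(self, w_in: nn.Linear, w_out: nn.Linear, n_experts: int):
        super().__init__()
        chunk = w_in.out_features // n_experts
        ups, downs = [], []
        for i in range(n_experts):
            sl = slice(i * chunk, (i + 1) * chunk)
            up = nn.Linear(w_in.in_features, chunk)
            up.weight.data = w_in.weight.data[sl].clone()
            up.bias.data = w_in.bias.data[sl].clone()
            down = nn.Linear(chunk, w_out.out_features)
            down.weight.data = w_out.weight.data[:, sl].clone()
            down.bias.data = w_out.bias.data.clone()  # top-1: each expert keeps the full bias
            ups.append(up); downs.append(down)
        self.ups, self.downs = nn.ModuleList(ups), nn.ModuleList(downs)
        self.router = nn.Linear(w_in.in_features, n_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        idx = self.router(x).argmax(dim=-1)  # top-1 expert per token
        out = x.new_zeros(x.shape[:-1] + (self.downs[0].out_features,))
        for i, (up, down) in enumerate(zip(self.ups, self.downs)):
            mask = idx == i
            if mask.any():
                out[mask] = down(torch.relu(up(x[mask])))
        return out

dense_in, dense_out = nn.Linear(32, 128), nn.Linear(128, 32)
ffn = FactorizedFFN(dense_in, dense_out, n_experts=4)
y = ffn(torch.randn(6, 32))  # 6 tokens, each served by one sparse expert
```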
arXiv Detail & Related papers (2024-08-15T16:45:16Z) - SoupLM: Model Integration in Large Language and Multi-Modal Models [51.12227693121004]
Training large language models (LLMs) requires significant computing resources.
Existing publicly available LLMs are typically pre-trained on diverse, privately curated datasets spanning various tasks.
arXiv Detail & Related papers (2024-07-11T05:38:15Z) - Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts [49.950419707905944]
We present Self-MoE, an approach that transforms a monolithic LLM into a compositional, modular system of self-specialized experts.
Our approach leverages self-specialization, which constructs expert modules using self-generated synthetic data.
Our findings highlight the critical role of modularity, the applicability of Self-MoE to multiple base LLMs, and the potential of self-improvement in achieving efficient, scalable, and adaptable systems.
arXiv Detail & Related papers (2024-06-17T19:06:54Z) - Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts [54.529880848937104]
We develop a unified MLLM with the MoE architecture, named Uni-MoE, that can handle a wide array of modalities.
Specifically, it features modality-specific encoders with connectors for a unified multimodal representation.
We evaluate the instruction-tuned Uni-MoE on a comprehensive set of multimodal datasets.
arXiv Detail & Related papers (2024-05-18T12:16:01Z) - LLM-based Multi-Agent Reinforcement Learning: Current and Future Directions [8.55917897789612]
We focus on cooperative tasks in which multiple agents share a common goal and communicate with one another.
We also consider human-in/on-the-loop scenarios enabled by the language component in the framework.
arXiv Detail & Related papers (2024-05-17T22:10:23Z) - BLADE: Enhancing Black-box Large Language Models with Small Domain-Specific Models [56.89958793648104]
Large Language Models (LLMs) are versatile and capable of addressing a diverse range of tasks.
Previous approaches either conduct continuous pre-training with domain-specific data or employ retrieval augmentation to support general LLMs.
We present a novel framework named BLADE, which enhances Black-box LArge language models with small Domain-spEcific models.
arXiv Detail & Related papers (2024-03-27T08:57:21Z) - An Expert is Worth One Token: Synergizing Multiple Expert LLMs as Generalist via Expert Token Routing [55.25224913110965]
Expert-Token-Routing represents expert LLMs as special expert tokens within the vocabulary of a meta LLM.
It supports learning the implicit expertise of expert LLMs from existing instruction datasets.
It also conceals the detailed collaboration process from the user's perspective, facilitating interaction as though it were a singular LLM.
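A hedged sketch of the expert-token mechanism as summarized above: the meta LLM's vocabulary gains one special token per expert, and emitting that token hands the query to the corresponding expert while the user sees a single answer. The token strings and the .generate() interface are stand-ins, not the paper's API.

```python
# Hedged sketch of expert-token routing with stub models. Token names
# and the generate() interface are assumptions for illustration.
EXPERT_TOKENS = {"<EXPERT_CODE>": "code_llm", "<EXPERT_MED>": "medical_llm"}

def answer(meta_llm, experts: dict, prompt: str) -> str:
    draft = meta_llm.generate(prompt)          # meta model decides first
    for token, name in EXPERT_TOKENS.items():
        if token in draft:
            # Collaboration stays hidden: the expert's answer is
            # returned as though the meta LLM produced it.
            return experts[name].generate(prompt)
    return draft                               # meta model answers itself

class StubLLM:
    """Stand-in for a real model exposing .generate(prompt) -> str."""
    def __init__(self, reply): self.reply = reply
    def generate(self, prompt): return self.reply

meta = StubLLM("<EXPERT_CODE>")                # defers to the code expert
experts = {"code_llm": StubLLM("def add(a, b): return a + b"),
           "medical_llm": StubLLM("consult a clinician")}
print(answer(meta, experts, "write an add function"))
```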
arXiv Detail & Related papers (2024-03-25T15:17:05Z) - Learning to Decode Collaboratively with Multiple Language Models [37.31339648499042]
We propose a method to teach multiple large language models (LLMs) to collaborate by interleaving their generations at the token level.
Token-level collaboration during decoding allows for a fusion of each model's expertise in a manner tailored to the specific task at hand.
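A toy sketch of token-level interleaving: at each decoding step one model emits the next token into a shared prefix. The paper learns a latent variable to pick the generator; here a simple max-confidence heuristic stands in for it.

```python
# Hedged sketch of token-level collaborative decoding with toy "models"
# (functions from a token-id prefix to per-position logits).
import torch

def collaborative_decode(models, ids: torch.Tensor, steps: int) -> torch.Tensor:
    for _ in range(steps):
        dists = [m(ids)[-1].softmax(-1) for m in models]  # next-token dists
        # Max-confidence stands in for the paper's learned latent choice.
        chooser = max(range(len(models)), key=lambda i: dists[i].max().item())
        next_id = dists[chooser].argmax().view(1)
        ids = torch.cat([ids, next_id])                   # extend shared prefix
    return ids

torch.manual_seed(0)
vocab, dim = 10, 8
emb = torch.randn(vocab, dim)
models = [(lambda ids, h=torch.randn(dim, vocab): emb[ids] @ h) for _ in range(2)]
print(collaborative_decode(models, torch.tensor([1, 2]), steps=5))
```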
arXiv Detail & Related papers (2024-03-06T17:23:28Z) - SepRep-Net: Multi-source Free Domain Adaptation via Model Separation And Reparameterization [75.74369886582394]
We propose a novel framework called SepRep-Net to tackle multi-source free domain adaptation.
SepRep-Net reassembles multiple existing models into a unified network while maintaining separate pathways (Separation).
SepRep-Net is characterized by 1) effectiveness: competitive performance on the target domain, 2) efficiency: low computational costs, and 3) generalizability: maintaining more source knowledge than existing solutions.
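The Reparameterization step has an exact toy analogue for linear pathways: a weighted sum of linear maps over the same input collapses into a single linear layer at inference, which is where the efficiency comes from. The sketch below verifies that equivalence; the pathway weights alphas are assumptions.

```python
# Hedged sketch of reparameterization: N parallel linear pathways (the
# Separation design) fuse exactly into one linear layer, since
# sum_i a_i (W_i x + b_i) = (sum_i a_i W_i) x + sum_i a_i b_i.
import torch
import torch.nn as nn

def reparameterize(pathways, alphas) -> nn.Linear:
    fused = nn.Linear(pathways[0].in_features, pathways[0].out_features)
    fused.weight.data = sum(a * p.weight.detach() for a, p in zip(alphas, pathways))
    fused.bias.data = sum(a * p.bias.detach() for a, p in zip(alphas, pathways))
    return fused

# Equivalence check: fused output matches the weighted sum of pathways.
paths = [nn.Linear(8, 4) for _ in range(3)]
alphas = [0.5, 0.3, 0.2]
x = torch.randn(2, 8)
separate = sum(a * p(x) for a, p in zip(alphas, paths))
print(torch.allclose(separate, reparameterize(paths, alphas)(x), atol=1e-6))
```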
arXiv Detail & Related papers (2024-02-13T06:35:00Z) - Knowledge Fusion of Large Language Models [73.28202188100646]
This paper introduces the notion of knowledge fusion for large language models (LLMs).
We externalize their collective knowledge and unique strengths, thereby elevating the capabilities of the target model beyond those of any individual source LLM.
Our findings confirm that the fusion of LLMs can improve the performance of the target model across a range of capabilities such as reasoning, commonsense, and code generation.
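Viewed as distillation, knowledge fusion can be sketched as training the target model against a fused soft distribution from the source LLMs. The sketch below assumes a shared vocabulary and plain averaging; the paper's actual fusion and tokenizer alignment are more involved.

```python
# Hedged sketch of knowledge fusion as distillation: a ground-truth
# cross-entropy term plus a KL term toward the fused source distribution.
import torch
import torch.nn.functional as F

def fusion_loss(student_logits, source_logits_list, labels, lam: float = 0.5):
    fused = torch.stack([F.softmax(l, -1) for l in source_logits_list]).mean(0)
    ce = F.cross_entropy(student_logits, labels)          # ground-truth term
    kd = F.kl_div(F.log_softmax(student_logits, -1), fused,
                  reduction="batchmean")                  # fused-knowledge term
    return ce + lam * kd

# Toy usage: 4 tokens of a 10-word vocab, two source models.
student = torch.randn(4, 10, requires_grad=True)
sources = [torch.randn(4, 10), torch.randn(4, 10)]
labels = torch.tensor([1, 4, 2, 7])
print(fusion_loss(student, sources, labels))
```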
arXiv Detail & Related papers (2024-01-19T05:02:46Z) - Corex: Pushing the Boundaries of Complex Reasoning through Multi-Model Collaboration [83.4031923134958]
Corex is a suite of novel general-purpose strategies that transform Large Language Models into autonomous agents.
Inspired by human behaviors, Corex comprises diverse collaboration paradigms, including Debate, Review, and Retrieve modes.
We demonstrate that orchestrating multiple LLMs to work in concert yields substantially better performance compared to existing methods.
arXiv Detail & Related papers (2023-09-30T07:11:39Z) - Decoupled Training: Return of Frustratingly Easy Multi-Domain Learning [20.17925272562433]
Multi-domain learning aims to train a model with minimal average risk across multiple overlapping but non-identical domains.
We propose Decoupled Training (D-Train) as a frustratingly easy and hyperparameter-free multi-domain learning method.
D-Train is a tri-phase general-to-specific training strategy that first pre-trains on all domains to warm up a root model, then post-trains on each domain by splitting into multi-heads, and finally fine-tunes the heads by fixing the backbone.
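A compact sketch of the tri-phase schedule with a shared backbone and per-domain heads; optimizers, step counts, and shapes are placeholders, and the backbone is "fixed" in phase 3 simply by leaving it out of the optimizer.

```python
# Hedged sketch of D-Train's general-to-specific tri-phase schedule.
import torch
import torch.nn as nn

torch.manual_seed(0)
domains = ["news", "reviews", "forums"]
backbone = nn.Linear(16, 16)
heads = nn.ModuleDict({d: nn.Linear(16, 2) for d in domains})
data = {d: (torch.randn(32, 16), torch.randint(0, 2, (32,))) for d in domains}

def run(params, batches, forward, steps=5):  # stand-in training loop
    opt = torch.optim.SGD(params, lr=0.1)
    for _ in range(steps):
        for x, y in batches:
            opt.zero_grad()
            nn.functional.cross_entropy(forward(x), y).backward()
            opt.step()

# Phase 1: pre-train on all domains to warm up a root model.
shared_head = nn.Linear(16, 2)
run(list(backbone.parameters()) + list(shared_head.parameters()),
    list(data.values()), lambda x: shared_head(backbone(x)))

# Phase 2: split into multi-heads and post-train on each domain.
for d in domains:
    run(list(backbone.parameters()) + list(heads[d].parameters()),
        [data[d]], lambda x, d=d: heads[d](backbone(x)))

# Phase 3: fix the backbone (excluded from the optimizer), fine-tune heads.
for d in domains:
    run(list(heads[d].parameters()), [data[d]], lambda x, d=d: heads[d](backbone(x)))
```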
arXiv Detail & Related papers (2023-09-19T04:06:41Z) - Multi-Prompt Alignment for Multi-Source Unsupervised Domain Adaptation [86.02485817444216]
We introduce Multi-Prompt Alignment (MPA), a simple yet efficient framework for multi-source UDA.
MPA denoises the learned prompts through an auto-encoding process and aligns them by maximizing the agreement of all the reconstructed prompts.
Experiments show that MPA achieves state-of-the-art results on three popular datasets with an impressive average accuracy of 54.1% on DomainNet.
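A rough sketch of the denoising-and-agreement objective: per-source learned prompts pass through a shared auto-encoder, and a pairwise agreement term pulls the reconstructions together. The dimensions and the cosine agreement used here are assumptions, not MPA's exact formulation.

```python
# Hedged sketch of multi-prompt alignment: reconstruct each source's
# learned prompt with a shared auto-encoder, then maximize agreement
# among the reconstructions.
import torch
import torch.nn as nn

n_sources, tokens, dim = 3, 4, 32
prompts = nn.Parameter(torch.randn(n_sources, tokens, dim))  # one prompt per source
auto = nn.Sequential(nn.Linear(dim, 8), nn.ReLU(), nn.Linear(8, dim))

recon = auto(prompts)                           # denoised reconstructions
recon_loss = (recon - prompts).pow(2).mean()    # auto-encoding term
flat = recon.reshape(n_sources, -1)
sims = nn.functional.cosine_similarity(flat.unsqueeze(0), flat.unsqueeze(1), dim=-1)
agreement_loss = -sims.mean()                   # maximize pairwise agreement
loss = recon_loss + agreement_loss
loss.backward()
```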
arXiv Detail & Related papers (2022-09-30T03:40:10Z)