An Expert is Worth One Token: Synergizing Multiple Expert LLMs as Generalist via Expert Token Routing
- URL: http://arxiv.org/abs/2403.16854v3
- Date: Tue, 11 Jun 2024 15:12:09 GMT
- Title: An Expert is Worth One Token: Synergizing Multiple Expert LLMs as Generalist via Expert Token Routing
- Authors: Ziwei Chai, Guoyin Wang, Jing Su, Tianjie Zhang, Xuanwen Huang, Xuwu Wang, Jingjing Xu, Jianbo Yuan, Hongxia Yang, Fei Wu, Yang Yang
- Abstract summary: Expert-Token-Routing represents expert LLMs as special expert tokens within the vocabulary of a meta LLM.
It supports learning the implicit expertise of expert LLMs from existing instruction datasets.
It also conceals the detailed collaboration process from the user's perspective, facilitating interaction as though it were a singular LLM.
- Score: 55.25224913110965
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present Expert-Token-Routing, a unified generalist framework that facilitates seamless integration of multiple expert LLMs. Our framework represents expert LLMs as special expert tokens within the vocabulary of a meta LLM. The meta LLM can route to an expert LLM in the same way it generates an ordinary new token. Expert-Token-Routing not only supports learning the implicit expertise of expert LLMs from existing instruction datasets but also allows for dynamic extension of new expert LLMs in a plug-and-play manner. It also conceals the detailed collaboration process from the user's perspective, facilitating interaction as though it were a singular LLM. Our framework outperforms various existing multi-LLM collaboration paradigms across benchmarks that incorporate six diverse expert domains, demonstrating effectiveness and robustness in building a generalist LLM system via synergizing multiple expert LLMs.
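The routing mechanism can be pictured as ordinary next-token decoding over a vocabulary extended with one special token per expert. Below is a minimal, hypothetical Python sketch of that idea; the class name `ExpertTokenRouter`, the `<EXPERT:...>` token format, and the toy decoder are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch of expert-token routing: the meta LLM's vocabulary is
# extended with one special token per expert LLM, and emitting such a token
# hands the query off to that expert. All names here are hypothetical.

from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ExpertTokenRouter:
    """Wraps a meta LLM decoder plus a registry of expert LLM callables.

    `generate_next_token` stands in for the meta LLM's decoding step;
    `experts` maps each special expert token to the expert LLM it names.
    """

    generate_next_token: Callable[[str], str]
    experts: Dict[str, Callable[[str], str]]

    def answer(self, prompt: str, max_tokens: int = 64) -> str:
        output: List[str] = []
        for _ in range(max_tokens):
            token = self.generate_next_token(prompt + "".join(output))
            if token in self.experts:
                # Routing is just "generating" an expert token: forward the
                # original prompt to that expert and return its answer,
                # hiding the collaboration from the user.
                return self.experts[token](prompt)
            if token == "<eos>":
                break
            output.append(token)
        return "".join(output)


# Toy stand-ins so the sketch runs end to end.
def toy_meta_decoder(context: str) -> str:
    # Pretend the meta LLM recognizes a biology question and emits the
    # biology expert token instead of an ordinary word token.
    return "<EXPERT:biology>" if "enzyme" in context else "<eos>"


router = ExpertTokenRouter(
    generate_next_token=toy_meta_decoder,
    experts={"<EXPERT:biology>": lambda q: f"[biology expert answers: {q}]"},
)
print(router.answer("What does an enzyme do?"))
```

Under this view, the plug-and-play extension mentioned in the abstract amounts to registering one more expert token and its corresponding callable in the `experts` map.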
Related papers
- BTS: Harmonizing Specialized Experts into a Generalist LLM [52.026293450944635]
Branch-Train-Stitch (BTS) is an efficient training algorithm for combining independently trained large language model (LLM) experts into a single, capable generalist model.
Compared to alternative model merging approaches, BTS yields the best generalist performance on a variety of downstream tasks.
arXiv Detail & Related papers (2025-01-31T07:54:34Z) - Scoring with Large Language Models: A Study on Measuring Empathy of Responses in Dialogues [3.2162648244439684]
We develop a framework for investigating how effective Large Language Models are at measuring and scoring empathy of responses in dialogues.
Our strategy is to approximate the performance of state-of-the-art and fine-tuned LLMs with explicit and explainable features.
Our results show that when only using embeddings, it is possible to achieve performance close to that of generic LLMs.
arXiv Detail & Related papers (2024-12-28T20:37:57Z) - Dynamic Ensemble Reasoning for LLM Experts [35.774197263383996]
We propose a Dynamic Ensemble Reasoning paradigm, called DER, to integrate the strengths of multiple LLM experts conditioned on dynamic inputs.
Our method uses fewer computational resources to achieve better performance compared to state-of-the-art baselines.
arXiv Detail & Related papers (2024-12-10T12:05:56Z) - LUK: Empowering Log Understanding with Expert Knowledge from Large Language Models [32.65636568742875]
Smaller pre-trained language models (PLMs) and large language models (LLMs) have become the current mainstream approaches for log analysis.
This paper introduces a novel knowledge enhancement framework, called LUK, which acquires expert knowledge from LLMs automatically and then enhances the smaller PLM for log analysis with this expert knowledge.
LUK achieves state-of-the-art results on different log analysis tasks, and extensive experiments demonstrate that expert knowledge from LLMs can be utilized more effectively to understand logs.
arXiv Detail & Related papers (2024-09-03T13:58:34Z) - Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts [49.950419707905944]
We present Self-MoE, an approach that transforms a monolithic LLM into a compositional, modular system of self-specialized experts.
Our approach leverages self-specialization, which constructs expert modules using self-generated synthetic data.
Our findings highlight the critical role of modularity, the applicability of Self-MoE to multiple base LLMs, and the potential of self-improvement in achieving efficient, scalable, and adaptable systems.
arXiv Detail & Related papers (2024-06-17T19:06:54Z) - Small LLMs Are Weak Tool Learners: A Multi-LLM Agent [73.54562551341454]
Large Language Model (LLM) agents significantly extend the capabilities of standalone LLMs.
We propose a novel approach that decomposes the capabilities required for tool use into a planner, caller, and summarizer.
This modular framework facilitates individual updates and the potential use of smaller LLMs for building each capability.
arXiv Detail & Related papers (2024-01-14T16:17:07Z) - Video Understanding with Large Language Models: A Survey [97.29126722004949]
Given the remarkable capabilities of large language models (LLMs) in language and multimodal tasks, this survey provides a detailed overview of recent advancements in video understanding.
The emergent capabilities of Vid-LLMs are surprisingly advanced, particularly their ability for open-ended multi-granularity reasoning.
This survey presents a comprehensive study of the tasks, datasets, benchmarks, and evaluation methodologies for Vid-LLMs.
arXiv Detail & Related papers (2023-12-29T01:56:17Z) - Towards Vision Enhancing LLMs: Empowering Multimodal Knowledge Storage and Sharing in LLMs [72.49064988035126]
We propose an approach called MKS2, aimed at enhancing multimodal large language models (MLLMs).
Specifically, we introduce the Modular Visual Memory, a component integrated into the internal blocks of LLMs, designed to store open-world visual information efficiently.
Our experiments demonstrate that MKS2 substantially augments the reasoning capabilities of LLMs in contexts necessitating physical or commonsense knowledge.
arXiv Detail & Related papers (2023-11-27T12:29:20Z) - A Survey of Large Language Models for Code: Evolution, Benchmarking, and Future Trends [30.774685501251817]
General large language models (LLMs) have demonstrated significant potential in tasks such as code generation in software engineering.
A considerable portion of Code LLMs is derived from general LLMs through model fine-tuning.
There is currently a lack of systematic investigation into Code LLMs and their performance.
arXiv Detail & Related papers (2023-11-17T07:55:16Z)