An Expert is Worth One Token: Synergizing Multiple Expert LLMs as Generalist via Expert Token Routing
- URL: http://arxiv.org/abs/2403.16854v3
- Date: Tue, 11 Jun 2024 15:12:09 GMT
- Title: An Expert is Worth One Token: Synergizing Multiple Expert LLMs as Generalist via Expert Token Routing
- Authors: Ziwei Chai, Guoyin Wang, Jing Su, Tianjie Zhang, Xuanwen Huang, Xuwu Wang, Jingjing Xu, Jianbo Yuan, Hongxia Yang, Fei Wu, Yang Yang
- Abstract summary: Expert-Token-Routing represents expert LLMs as special expert tokens within the vocabulary of a meta LLM.
It supports learning the implicit expertise of expert LLMs from existing instruction datasets.
It also conceals the detailed collaboration process from the user's perspective, facilitating interaction as though it were a singular LLM.
- Score: 55.25224913110965
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present Expert-Token-Routing, a unified generalist framework that facilitates seamless integration of multiple expert LLMs. Our framework represents expert LLMs as special expert tokens within the vocabulary of a meta LLM. The meta LLM can route to an expert LLM in the same way it generates an ordinary new token. Expert-Token-Routing not only supports learning the implicit expertise of expert LLMs from existing instruction datasets but also allows for dynamic extension of new expert LLMs in a plug-and-play manner. It also conceals the detailed collaboration process from the user's perspective, facilitating interaction as though it were a singular LLM. Our framework outperforms various existing multi-LLM collaboration paradigms across benchmarks that incorporate six diverse expert domains, demonstrating effectiveness and robustness in building a generalist LLM system via synergizing multiple expert LLMs.
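The routing mechanism can be pictured as ordinary next-token decoding over a vocabulary extended with one special token per expert. Below is a minimal, hypothetical Python sketch of that idea; the class name `ExpertTokenRouter`, the `<EXPERT:...>` token format, and the toy decoder are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch of expert-token routing: the meta LLM's vocabulary is
# extended with one special token per expert LLM, and emitting such a token
# hands the query off to that expert. All names here are hypothetical.

from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ExpertTokenRouter:
    """Wraps a meta LLM decoder plus a registry of expert LLM callables.

    `generate_next_token` stands in for the meta LLM's decoding step;
    `experts` maps each special expert token to the expert LLM it names.
    """

    generate_next_token: Callable[[str], str]
    experts: Dict[str, Callable[[str], str]]

    def answer(self, prompt: str, max_tokens: int = 64) -> str:
        output: List[str] = []
        for _ in range(max_tokens):
            token = self.generate_next_token(prompt + "".join(output))
            if token in self.experts:
                # Routing is just "generating" an expert token: forward the
                # original prompt to that expert and return its answer,
                # hiding the collaboration from the user.
                return self.experts[token](prompt)
            if token == "<eos>":
                break
            output.append(token)
        return "".join(output)


# Toy stand-ins so the sketch runs end to end.
def toy_meta_decoder(context: str) -> str:
    # Pretend the meta LLM recognizes a biology question and emits the
    # biology expert token instead of an ordinary word token.
    return "<EXPERT:biology>" if "enzyme" in context else "<eos>"


router = ExpertTokenRouter(
    generate_next_token=toy_meta_decoder,
    experts={"<EXPERT:biology>": lambda q: f"[biology expert answers: {q}]"},
)
print(router.answer("What does an enzyme do?"))
```

Under this view, the plug-and-play extension mentioned in the abstract amounts to registering one more expert token and its corresponding callable in the `experts` map.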
Related papers
- BTS: Harmonizing Specialized Experts into a Generalist LLM [52.026293450944635]
Branch-Train-Stitch (BTS) is an efficient training algorithm for combining independently trained large language model (LLM) experts into a single, capable generalist model.
Compared to alternative model merging approaches, BTS yields the best generalist performance on a variety of downstream tasks.
arXiv Detail & Related papers (2025-01-31T07:54:34Z) - Scoring with Large Language Models: A Study on Measuring Empathy of Responses in Dialogues [3.2162648244439684]
We develop a framework for investigating how effective Large Language Models are at measuring and scoring empathy of responses in dialogues.
Our strategy is to approximate the performance of state-of-the-art and fine-tuned LLMs with explicit and explainable features.
Our results show that when only using embeddings, it is possible to achieve performance close to that of generic LLMs.
arXiv Detail & Related papers (2024-12-28T20:37:57Z) - Dynamic Ensemble Reasoning for LLM Experts [35.774197263383996]
We propose a Dynamic Ensemble Reasoning paradigm, called DER, to integrate the strengths of multiple LLM experts conditioned on dynamic inputs.
Our method uses fewer computational resources to achieve better performance compared to state-of-the-art baselines.
arXiv Detail & Related papers (2024-12-10T12:05:56Z) - LUK: Empowering Log Understanding with Expert Knowledge from Large Language Models [32.65636568742875]
Smaller pre-trained language models (PLMs) and large language models (LLMs) have become the current mainstream approaches for log analysis.
This paper introduces a novel knowledge enhancement framework, called LUK, which acquires expert knowledge from LLMs automatically and then enhances the smaller PLM for log analysis with this expert knowledge.
LUK achieves state-of-the-art results on different log analysis tasks, and extensive experiments demonstrate that expert knowledge from LLMs can be utilized more effectively to understand logs.
arXiv Detail & Related papers (2024-09-03T13:58:34Z) - Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts [49.950419707905944]
We present Self-MoE, an approach that transforms a monolithic LLM into a compositional, modular system of self-specialized experts.
Our approach leverages self-specialization, which constructs expert modules using self-generated synthetic data.
Our findings highlight the critical role of modularity, the applicability of Self-MoE to multiple base LLMs, and the potential of self-improvement in achieving efficient, scalable, and adaptable systems.
arXiv Detail & Related papers (2024-06-17T19:06:54Z) - Small LLMs Are Weak Tool Learners: A Multi-LLM Agent [73.54562551341454]
Large Language Model (LLM) agents significantly extend the capabilities of standalone LLMs.
We propose a novel approach that decomposes the capabilities required for tool use into a planner, caller, and summarizer.
This modular framework facilitates individual updates and the potential use of smaller LLMs for building each capability.
arXiv Detail & Related papers (2024-01-14T16:17:07Z) - Video Understanding with Large Language Models: A Survey [97.29126722004949]
Given the remarkable capabilities of large language models (LLMs) in language and multimodal tasks, this survey provides a detailed overview of recent advancements in video understanding.
The emergent capabilities of Vid-LLMs are surprisingly advanced, particularly their ability for open-ended multi-granularity reasoning.
This survey presents a comprehensive study of the tasks, datasets, benchmarks, and evaluation methodologies for Vid-LLMs.
arXiv Detail & Related papers (2023-12-29T01:56:17Z) - Towards Vision Enhancing LLMs: Empowering Multimodal Knowledge Storage and Sharing in LLMs [72.49064988035126]
We propose an approach called MKS2, aimed at enhancing multimodal large language models (MLLMs).
Specifically, we introduce the Modular Visual Memory, a component integrated into the internal blocks of LLMs, designed to store open-world visual information efficiently.
Our experiments demonstrate that MKS2 substantially augments the reasoning capabilities of LLMs in contexts necessitating physical or commonsense knowledge.
arXiv Detail & Related papers (2023-11-27T12:29:20Z) - A Survey of Large Language Models for Code: Evolution, Benchmarking, and Future Trends [30.774685501251817]
General large language models (LLMs) have demonstrated significant potential in tasks such as code generation in software engineering.
A considerable portion of Code LLMs is derived from general LLMs through model fine-tuning.
There is currently a lack of systematic investigation into Code LLMs and their performance.
arXiv Detail & Related papers (2023-11-17T07:55:16Z)