Learning to Decode Collaboratively with Multiple Language Models
- URL: http://arxiv.org/abs/2403.03870v2
- Date: Tue, 27 Aug 2024 08:31:04 GMT
- Title: Learning to Decode Collaboratively with Multiple Language Models
- Authors: Shannon Zejiang Shen, Hunter Lang, Bailin Wang, Yoon Kim, David Sontag
- Abstract summary: We propose a method to teach multiple large language models (LLMs) to collaborate by interleaving their generations at the token level.
Token-level collaboration during decoding allows for a fusion of each model's expertise in a manner tailored to the specific task at hand.
- Score: 37.31339648499042
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a method to teach multiple large language models (LLMs) to collaborate by interleaving their generations at the token level. We model the decision of which LLM generates the next token as a latent variable. By optimizing the marginal likelihood of a training set under our latent variable model, the base LLM automatically learns when to generate itself and when to call on one of the "assistant" language models to generate, all without direct supervision. Token-level collaboration during decoding allows for a fusion of each model's expertise in a manner tailored to the specific task at hand. Our collaborative decoding is especially useful in cross-domain settings where a generalist base LLM learns to invoke domain-expert models. On instruction-following, domain-specific QA, and reasoning tasks, we show that the performance of the joint system exceeds that of the individual models. Through qualitative analysis of the learned latent decisions, we show that models trained with our method exhibit several interesting collaboration patterns, e.g., template-filling. Our code is available at https://github.com/clinicalml/co-llm.
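To make the decoding loop concrete, here is a minimal sketch of token-level collaboration: at each step, a learned gate on the base model's hidden state plays the role of the latent variable and decides whether the base model or the assistant emits the next token. The model names, the linear gate, the 0.5 threshold, the shared tokenizer, and greedy decoding are all illustrative assumptions, not the released Co-LLM implementation (see the repository linked above for that).

```python
# Minimal sketch of token-level collaborative decoding with a latent
# "who generates next?" decision. Names and the gate head are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE = "meta-llama/Llama-2-7b-hf"     # assumed generalist base model
ASSISTANT = "epfl-llm/meditron-7b"    # assumed domain-expert assistant

tok = AutoTokenizer.from_pretrained(BASE)   # assumes both models share a tokenizer
base = AutoModelForCausalLM.from_pretrained(BASE)
assistant = AutoModelForCausalLM.from_pretrained(ASSISTANT)

# Hypothetical learned gate: maps the base model's last hidden state to
# P(assistant generates the next token). During training this decision is
# the latent variable whose marginal likelihood is optimized.
gate = torch.nn.Linear(base.config.hidden_size, 1)

@torch.no_grad()
def collaborative_decode(prompt: str, max_new_tokens: int = 64, threshold: float = 0.5) -> str:
    ids = tok(prompt, return_tensors="pt").input_ids
    for _ in range(max_new_tokens):
        out = base(ids, output_hidden_states=True)
        h_last = out.hidden_states[-1][:, -1, :]       # base hidden state at this step
        p_defer = torch.sigmoid(gate(h_last)).item()   # latent decision: defer or not?
        if p_defer > threshold:                        # assistant produces this token
            logits = assistant(ids).logits[:, -1, :]
        else:                                          # base produces this token
            logits = out.logits[:, -1, :]
        next_id = torch.argmax(logits, dim=-1, keepdim=True)  # greedy, for simplicity
        ids = torch.cat([ids, next_id], dim=-1)
        if next_id.item() == tok.eos_token_id:
            break
    return tok.decode(ids[0], skip_special_tokens=True)
```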
Related papers
- Cross-model Control: Improving Multiple Large Language Models in One-time Training [34.98931804630706]
Cross-model Control (CMC) is a method that improves multiple large language models in one-time training.
To achieve this, the method incorporates a tiny language model with a minimal number of parameters.
We propose a novel token mapping strategy named PM-MinED to make this tiny language model applicable to models with different vocabularies.
arXiv Detail & Related papers (2024-10-23T06:52:09Z)
- A Model Is Not Built By A Single Prompt: LLM-Based Domain Modeling With Question Decomposition [4.123601037699469]
In real-world domain modeling, engineers usually decompose complex tasks into easily solvable sub-tasks.
We propose an LLM-based domain modeling approach via question decomposition, similar to developer's modeling process.
Preliminary results show that our approach outperforms the single-prompt baseline.
arXiv Detail & Related papers (2024-10-13T14:28:04Z)
- EmbedLLM: Learning Compact Representations of Large Language Models [28.49433308281983]
We propose EmbedLLM, a framework designed to learn compact vector representations of Large Language Models.
We introduce an encoder-decoder approach for learning such embeddings, along with a systematic framework to evaluate their effectiveness.
Empirical results show that EmbedLLM outperforms prior methods in model routing both in accuracy and latency.
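The summary above does not spell out EmbedLLM's encoder-decoder details, so the following is only an illustration of how learned model embeddings could drive routing: embed the incoming query, score each candidate model's embedding against it, and dispatch to the best match. The hash-based query encoder, the cosine-style score, and the model names are assumptions, not the paper's architecture.

```python
# Illustrative-only sketch of routing queries with learned model embeddings.
import numpy as np

rng = np.random.default_rng(0)
D = 128
model_embeddings = {            # would come from a learned model encoder
    "general-7b": rng.normal(size=D),
    "code-7b": rng.normal(size=D),
    "biomed-7b": rng.normal(size=D),
}

def embed_query(text: str) -> np.ndarray:
    """Stand-in for a learned query encoder (hashed bag-of-words, sketch only)."""
    v = np.zeros(D)
    for token in text.lower().split():
        v[hash(token) % D] += 1.0
    return v / (np.linalg.norm(v) + 1e-8)

def route(text: str) -> str:
    q = embed_query(text)
    scores = {name: float(q @ e) / (np.linalg.norm(e) + 1e-8)  # cosine-style score
              for name, e in model_embeddings.items()}
    return max(scores, key=scores.get)  # dispatch to the best-scoring model

print(route("Summarize this clinical trial report"))
```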
arXiv Detail & Related papers (2024-10-03T05:43:24Z)
- VANER: Leveraging Large Language Model for Versatile and Adaptive Biomedical Named Entity Recognition [3.4923338594757674]
Large language models (LLMs) can be used to train a model capable of extracting various types of entities.
In this paper, we utilize the open-sourced LLM LLaMA2 as the backbone model, and design specific instructions to distinguish between different types of entities and datasets.
Our model, VANER, trained with only a small subset of its parameters, significantly outperforms previous LLM-based models and, for the first time among LLM-based models, surpasses the majority of conventional state-of-the-art BioNER systems.
arXiv Detail & Related papers (2024-04-27T09:00:39Z)
- Model Composition for Multimodal Large Language Models [71.5729418523411]
We propose a new paradigm through the model composition of existing MLLMs to create a new model that retains the modal understanding capabilities of each original model.
Our basic implementation, NaiveMC, demonstrates the effectiveness of this paradigm by reusing modality encoders and merging LLM parameters.
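The summary says NaiveMC reuses modality encoders and merges LLM parameters but not how the merge is computed; as an illustration under that gap, the sketch below merges state dicts by a simple weighted average and reattaches the original modality encoders afterwards. The averaging rule and the interface are assumptions, not the paper's procedure.

```python
# Illustrative sketch: merge two LLM backbones by weighted parameter averaging,
# keeping each source model's modality encoder intact.
import torch

def merge_llm_parameters(state_dicts, weights=None):
    """Weighted average of parameter tensors; state_dicts must share keys/shapes."""
    weights = weights or [1.0 / len(state_dicts)] * len(state_dicts)
    merged = {}
    for name in state_dicts[0]:
        merged[name] = sum(w * sd[name] for w, sd in zip(weights, state_dicts))
    return merged

# Usage idea: load the LLM backbones of two multimodal models, merge them, then
# attach each original modality encoder (vision, audio, ...) to the merged backbone.
# merged = merge_llm_parameters([vision_llm.state_dict(), audio_llm.state_dict()])
# new_llm.load_state_dict(merged)
```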
arXiv Detail & Related papers (2024-02-20T06:38:10Z)
- Knowledge Fusion of Large Language Models [73.28202188100646]
This paper introduces the notion of knowledge fusion for large language models (LLMs).
We externalize their collective knowledge and unique strengths, thereby elevating the capabilities of the target model beyond those of any individual source LLM.
Our findings confirm that the fusion of LLMs can improve the performance of the target model across a range of capabilities such as reasoning, commonsense, and code generation.
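One common way to "externalize collective knowledge" is distribution-level distillation: combine the source LLMs' next-token distributions and train the target model to match the combination. The sketch below assumes aligned vocabularies and uses simple averaging as the fusion rule; the paper's actual fusion strategy may differ.

```python
# Illustrative sketch: fuse source LLMs' next-token distributions into a
# distillation target for the model being trained.
import torch
import torch.nn.functional as F

def fusion_loss(target_logits, source_logits_list):
    """Batch-normalized KL divergence between the averaged (fused) source
    distribution and the target model's distribution.

    target_logits: [batch, seq, vocab] from the model being trained.
    source_logits_list: list of [batch, seq, vocab] tensors from frozen
        source LLMs, assumed to share the target's vocabulary.
    """
    with torch.no_grad():
        fused = torch.stack([F.softmax(l, dim=-1) for l in source_logits_list]).mean(0)
    log_p_target = F.log_softmax(target_logits, dim=-1)
    return F.kl_div(log_p_target, fused, reduction="batchmean")
```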
arXiv Detail & Related papers (2024-01-19T05:02:46Z)
- LLM Augmented LLMs: Expanding Capabilities through Composition [56.40953749310957]
CALM -- Composition to Augment Language Models -- introduces cross-attention between models to compose their representations and enable new capabilities.
We illustrate that augmenting PaLM2-S with a smaller model trained on low-resource languages results in an absolute improvement of up to 13% on tasks like translation into English.
When PaLM2-S is augmented with a code-specific model, we see a relative improvement of 40% over the base model for code generation and explanation tasks.
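The summary identifies cross-attention between models as CALM's composition mechanism; the block below shows the general shape of such a layer, letting the anchor model's hidden states attend to (projected) hidden states of the augmenting model. The dimensions, the residual combination, and where the block is inserted are assumptions, not CALM's published configuration.

```python
# Rough shape of a cross-attention composition layer between an anchor model
# and an augmenting model. Sizes and projection choices are illustrative.
import torch
import torch.nn as nn

class CrossModelAttention(nn.Module):
    def __init__(self, anchor_dim: int, aug_dim: int, num_heads: int = 8):
        super().__init__()
        self.proj = nn.Linear(aug_dim, anchor_dim)  # map augmenting states into anchor space
        self.attn = nn.MultiheadAttention(anchor_dim, num_heads, batch_first=True)

    def forward(self, anchor_h: torch.Tensor, aug_h: torch.Tensor) -> torch.Tensor:
        # anchor_h: [batch, seq_a, anchor_dim]; aug_h: [batch, seq_b, aug_dim]
        kv = self.proj(aug_h)
        attended, _ = self.attn(query=anchor_h, key=kv, value=kv)
        return anchor_h + attended                  # residual composition

# Usage: run both frozen models, then fuse their hidden states with such layers.
layer = CrossModelAttention(anchor_dim=4096, aug_dim=2048)
fused = layer(torch.randn(1, 16, 4096), torch.randn(1, 16, 2048))
```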
arXiv Detail & Related papers (2024-01-04T18:53:01Z)
- CodeGen2: Lessons for Training LLMs on Programming and Natural Languages [116.74407069443895]
We unify encoder and decoder-based models into a single prefix-LM.
For learning methods, we explore the claim of a "free lunch" hypothesis.
For data distributions, we explore the effect of a mixture distribution of programming and natural languages, and of multi-epoch training, on model performance.
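The prefix-LM that unifies encoder- and decoder-style models rests on a specific attention pattern: bidirectional attention within the prefix, causal attention over the continuation. A small sketch of that mask, independent of CodeGen2's actual code:

```python
# Attention mask for a prefix-LM: prefix tokens attend bidirectionally among
# themselves, later tokens attend causally (and to the full prefix).
import torch

def prefix_lm_mask(seq_len: int, prefix_len: int) -> torch.Tensor:
    """Return a [seq_len, seq_len] boolean mask; True = attention allowed."""
    mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))  # causal base
    mask[:, :prefix_len] = True  # every token may attend to the full prefix
    return mask

print(prefix_lm_mask(seq_len=6, prefix_len=3).int())
```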
arXiv Detail & Related papers (2023-05-03T17:55:25Z)
- mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality [95.76661165594884]
mPLUG-Owl is a training paradigm that equips large language models (LLMs) with multi-modal abilities.
The training paradigm involves a two-stage method for aligning image and text, which learns visual knowledge with the assistance of the LLM.
Experimental results show that our model outperforms existing multi-modal models.
arXiv Detail & Related papers (2023-04-27T13:27:01Z)