Cool-Fusion: Fuse Large Language Models without Training
- URL: http://arxiv.org/abs/2407.19807v1
- Date: Mon, 29 Jul 2024 09:02:19 GMT
- Title: Cool-Fusion: Fuse Large Language Models without Training
- Authors: Cong Liu, Xiaojun Quan, Yan Pan, Liang Lin, Weigang Wu, Xu Chen,
- Abstract summary: emphCool-Fusion is a method that does not require any type of training like the ensemble approaches.
emphCool-Fusion increases accuracy from three strong source LLMs by a significant 8%-17.8%.
- Score: 73.17551121242602
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We focus on the problem of fusing two or more heterogeneous large language models (LLMs) to facilitate their complementary strengths. One of the challenges on model fusion is high computational load, i.e. to fine-tune or to align vocabularies via combinatorial optimization. To this end, we propose \emph{Cool-Fusion}, a simple yet effective approach that fuses the knowledge of heterogeneous source LLMs to leverage their complementary strengths. \emph{Cool-Fusion} is the first method that does not require any type of training like the ensemble approaches. But unlike ensemble methods, it is applicable to any set of source LLMs that have different vocabularies. The basic idea is to have each source LLM individually generate tokens until the tokens can be decoded into a text segment that ends at word boundaries common to all source LLMs. Then, the source LLMs jointly rerank the generated text segment and select the best one, which is the fused text generation in one step. Extensive experiments are conducted across a variety of benchmark datasets. On \emph{GSM8K}, \emph{Cool-Fusion} increases accuracy from three strong source LLMs by a significant 8\%-17.8\%.
Related papers
- LLM-TOPLA: Efficient LLM Ensemble by Maximising Diversity [7.945893812374361]
We introduce the focal diversity metric to capture the diversity-performance correlation among component LLMs of an ensemble.
We develop a diversity-optimized ensemble pruning algorithm to select the top-k sub-ensembles from a pool of $N$ base LLMs.
Our pruning method recommends top-performing LLM subensembles of size $S$, often much smaller than $N$.
arXiv Detail & Related papers (2024-10-04T22:31:15Z) - Extend Model Merging from Fine-Tuned to Pre-Trained Large Language Models via Weight Disentanglement [72.97553348776425]
We make a pioneering effort to broaden the applicability of merging techniques from FT to PT LLMs.
We introduce an approach based on WeIght DisENtanglement (WIDEN) to effectively extend the merging scope.
We merge Qwen1.5-Chat (an FT LLM with instruction-following skills) with Sailor (a PT LLM with multilingual abilities) across 7B and 14B model scales.
arXiv Detail & Related papers (2024-08-06T10:46:46Z) - LLMEmbed: Rethinking Lightweight LLM's Genuine Function in Text Classification [13.319594321038926]
We propose a simple and effective transfer learning strategy, namely LLMEmbed, to address this classical but challenging task.
We perform extensive experiments on publicly available datasets, and the results show that LLMEmbed achieves strong performance while enjoys low training overhead.
arXiv Detail & Related papers (2024-06-06T03:46:59Z) - Text-like Encoding of Collaborative Information in Large Language Models for Recommendation [58.87865271693269]
We introduce BinLLM, a novel method to seamlessly integrate collaborative information with Large Language Models for Recommendation (LLMRec)
BinLLM converts collaborative embeddings from external models into binary sequences.
BinLLM provides options to compress the binary sequence using dot-decimal notation to avoid excessively long lengths.
arXiv Detail & Related papers (2024-06-05T12:45:25Z) - Generative Text Steganography with Large Language Model [10.572149957139736]
Black-box generative text steganographic method based on user interfaces of large language models, which is called LLM-Stega.
We first construct a keyword set and design a new encrypted steganographic mapping to embed secret messages.
Comprehensive experiments demonstrate that the proposed LLM-Stega outperforms current state-of-the-art methods.
arXiv Detail & Related papers (2024-04-16T02:19:28Z) - Bridging the Gap between Different Vocabularies for LLM Ensemble [10.669552498083709]
vocabulary discrepancies among various large language models (LLMs) have constrained previous studies.
We propose a novel method to Ensemble LLMs via Vocabulary Alignment (EVA)
EVA bridges the lexical gap among various LLMs, enabling meticulous ensemble at each generation step.
arXiv Detail & Related papers (2024-04-15T06:28:20Z) - Knowledge Fusion of Large Language Models [73.28202188100646]
This paper introduces the notion of knowledge fusion for large language models (LLMs)
We externalize their collective knowledge and unique strengths, thereby elevating the capabilities of the target model beyond those of any individual source LLM.
Our findings confirm that the fusion of LLMs can improve the performance of the target model across a range of capabilities such as reasoning, commonsense, and code generation.
arXiv Detail & Related papers (2024-01-19T05:02:46Z) - Boosting Large Language Model for Speech Synthesis: An Empirical Study [86.89548753080432]
Large language models (LLMs) have made significant advancements in natural language processing and are concurrently extending the language ability to other modalities, such as speech and vision.
We conduct a comprehensive empirical exploration of boosting LLMs with the ability to generate speech, by combining pre-trained LLM LLaMA/OPT and text-to-speech synthesis model VALL-E.
We compare three integration methods between LLMs and speech models, including directly fine-tuned LLMs, superposed layers of LLMs and VALL-E, and coupled LLMs and VALL-E using LLMs as a powerful text encoder
arXiv Detail & Related papers (2023-12-30T14:20:04Z) - LLM-Pruner: On the Structural Pruning of Large Language Models [65.02607075556742]
Large language models (LLMs) have shown remarkable capabilities in language understanding and generation.
We tackle the compression of LLMs within the bound of two constraints: being task-agnostic and minimizing the reliance on the original training dataset.
Our method, named LLM-Pruner, adopts structural pruning that selectively removes non-critical coupled structures.
arXiv Detail & Related papers (2023-05-19T12:10:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.