Cool-Fusion: Fuse Large Language Models without Training
- URL: http://arxiv.org/abs/2407.19807v2
- Date: Mon, 09 Jun 2025 09:19:19 GMT
- Title: Cool-Fusion: Fuse Large Language Models without Training
- Authors: Cong Liu, Xiaojun Quan, Yan Pan, Liang Lin, Weigang Wu, Xu Chen
- Abstract summary: Cool-Fusion fuses the knowledge of source LLMs without requiring any training. Experiments have been conducted across a variety of benchmark datasets. On GSM8K, Cool-Fusion improves accuracy over three strong source LLMs by a significant margin of 17.4%.
- Score: 73.17551121242602
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We focus on the problem of fusing two or more heterogeneous large language models (LLMs) to leverage their complementary strengths. One of the challenges of model fusion is its high computational load, specifically for fine-tuning or vocabulary alignment. To address this, we propose Cool-Fusion, a simple yet effective approach that fuses the knowledge of source LLMs without requiring any training. Unlike ensemble methods, Cool-Fusion is applicable to any set of source LLMs with different vocabularies. To overcome the vocabulary discrepancies among LLMs, we ensemble the LLMs at the text level, allowing them to rerank each other's generated texts at different granularities. Extensive experiments have been conducted across a variety of benchmark datasets. On GSM8K, Cool-Fusion improves accuracy over three strong source LLMs by a significant margin of 17.4%.
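A minimal sketch of the text-level fusion idea described in the abstract, assuming that each source LLM proposes a continuation segment and that all candidates are reranked by the sum of every model's average token log-likelihood. The model names, segment length, and scoring rule are placeholders for illustration, not the authors' exact procedure.

```python
# Illustrative sketch of text-level fusion via cross-model reranking.
# Model names, segment length, and the scoring rule (mean log-likelihood)
# are assumptions, not Cool-Fusion's exact algorithm.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAMES = ["gpt2", "distilgpt2"]  # placeholders for heterogeneous source LLMs
models = [AutoModelForCausalLM.from_pretrained(n) for n in MODEL_NAMES]
tokenizers = [AutoTokenizer.from_pretrained(n) for n in MODEL_NAMES]

@torch.no_grad()
def avg_log_likelihood(model, tokenizer, text):
    """Score a text by the model's average token log-likelihood."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    loss = model(ids, labels=ids).loss  # mean cross-entropy per token
    return -loss.item()

@torch.no_grad()
def generate_candidate(model, tokenizer, prompt, max_new_tokens=20):
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=max_new_tokens,
                         do_sample=False, pad_token_id=tokenizer.eos_token_id)
    return tokenizer.decode(out[0], skip_special_tokens=True)

def fuse_step(prompt):
    """Each source LLM proposes a continuation; all LLMs jointly rerank."""
    candidates = [generate_candidate(m, t, prompt) for m, t in zip(models, tokenizers)]
    scores = [sum(avg_log_likelihood(m, t, c) for m, t in zip(models, tokenizers))
              for c in candidates]
    return candidates[max(range(len(candidates)), key=scores.__getitem__)]

print(fuse_step("Question: What is 2 + 3? Answer:"))
```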
Related papers
- Improving LLM-based Document-level Machine Translation with Multi-Knowledge Fusion [21.533772761328656]
We propose an enhanced approach that incorporates multiple sources of knowledge, including both document summarization and entity translation.
Our approach achieves an average improvement of 0.8, 0.6, and 0.4 COMET scores over the baseline without extra knowledge.
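As a rough illustration of how such extra knowledge could be injected for document-level MT, the hypothetical prompt builder below combines a document summary and an entity glossary into a single translation prompt; the template wording, language pair, and field names are assumptions, not the paper's exact format.

```python
# Hypothetical prompt template showing how a document summary and an entity
# glossary could be supplied as extra knowledge for document-level MT.
def build_mt_prompt(source_sentence, doc_summary, entity_glossary):
    glossary_lines = "\n".join(f"- {src} -> {tgt}" for src, tgt in entity_glossary.items())
    return (
        "You are translating part of a longer document from English to German.\n"
        f"Document summary:\n{doc_summary}\n\n"
        f"Entity translations to use consistently:\n{glossary_lines}\n\n"
        f"Translate the following sentence:\n{source_sentence}\n"
        "Translation:"
    )

prompt = build_mt_prompt(
    source_sentence="The board approved Acme's merger with Orion Labs.",
    doc_summary="A news article about a corporate merger announced this week.",
    entity_glossary={"Acme": "Acme", "Orion Labs": "Orion Labs"},
)
print(prompt)
```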
arXiv Detail & Related papers (2025-03-15T14:18:45Z) - Ensemble Learning for Large Language Models in Text and Code Generation: A Survey [6.041894045506043]
We focus on four methods and models that show strong performance and potential for broader applications.
These benefits include better representation of diversity, improved output quality, and greater flexibility in applications.
arXiv Detail & Related papers (2025-03-13T18:50:57Z) - Weighted-Reward Preference Optimization for Implicit Model Fusion [35.57286356489511]
We propose an implicit fusion method, which leverages preference optimization between the source LLMs and the target LLM to transfer their capabilities effectively.
WRPO eliminates the need for vocabulary alignment and matrix fusion and can be efficiently scaled to accommodate various LLMs.
Experiments on the MT-Bench, AlpacaEval-2, and Arena-Hard benchmarks demonstrate that WRPO consistently outperforms existing knowledge fusion methods.
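A minimal DPO-style preference loss with per-example weights, shown only to illustrate the general idea of implicit fusion via preference optimization; the weighting scheme and the loss form are assumptions and differ from the actual WRPO objective.

```python
# Minimal DPO-style preference loss with per-example weights, as an
# illustrative stand-in for implicit fusion via preference optimization.
# The exact WRPO objective differs; this only sketches weighting preference
# pairs (e.g., pairs whose "chosen" response comes from a source LLM).
import torch
import torch.nn.functional as F

def weighted_preference_loss(policy_chosen_logps, policy_rejected_logps,
                             ref_chosen_logps, ref_rejected_logps,
                             weights, beta=0.1):
    """Weighted DPO-style loss over a batch of preference pairs."""
    policy_logratio = policy_chosen_logps - policy_rejected_logps
    ref_logratio = ref_chosen_logps - ref_rejected_logps
    losses = -F.logsigmoid(beta * (policy_logratio - ref_logratio))
    return (weights * losses).mean()

# Toy batch of summed log-probabilities (would come from the target and
# reference models scoring chosen/rejected responses).
b = 4
loss = weighted_preference_loss(
    torch.randn(b), torch.randn(b), torch.randn(b), torch.randn(b),
    weights=torch.tensor([1.0, 0.5, 1.0, 0.25]),
)
print(loss.item())
```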
arXiv Detail & Related papers (2024-12-04T10:15:12Z) - $H^3$Fusion: Helpful, Harmless, Honest Fusion of Aligned LLMs [7.498844064516196]
Alignment of pretrained LLMs using instruction-based datasets is critical for creating fine-tuned models that reflect human preference.
This paper develops an alignment fusion approach, coined as $H^3$Fusion, with three unique characteristics.
It outperforms each individually aligned model by 11.37%, and it provides stronger robustness compared to the state-of-the-art LLM ensemble approaches by 13.77%.
arXiv Detail & Related papers (2024-11-26T17:42:38Z) - LLM-TOPLA: Efficient LLM Ensemble by Maximising Diversity [7.945893812374361]
We introduce the focal diversity metric to capture the diversity-performance correlation among component LLMs of an ensemble.
We develop a diversity-optimized ensemble pruning algorithm to select the top-k sub-ensembles from a pool of $N$ base LLMs.
Our pruning method recommends top-performing LLM subensembles of size $S$, often much smaller than $N$.
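A toy sketch of diversity-optimized ensemble pruning: enumerate sub-ensembles of size S from a pool of N base LLMs and keep the top-k by a diversity score. Pairwise disagreement is used here as a simple stand-in for the paper's focal diversity metric, and the validation predictions are toy data.

```python
# Sketch of diversity-optimized ensemble pruning: enumerate sub-ensembles of
# size S from N base LLMs and keep the top-k by a diversity score.
# Pairwise disagreement is a proxy for the paper's focal diversity metric.
from itertools import combinations

def pairwise_disagreement(preds_a, preds_b):
    return sum(a != b for a, b in zip(preds_a, preds_b)) / len(preds_a)

def prune_ensemble(model_preds, S=2, k=3):
    """model_preds: dict name -> list of predictions on a validation set."""
    scored = []
    for subset in combinations(model_preds, S):
        pairs = list(combinations(subset, 2))
        diversity = sum(pairwise_disagreement(model_preds[a], model_preds[b])
                        for a, b in pairs) / len(pairs)
        scored.append((diversity, subset))
    return sorted(scored, reverse=True)[:k]

# Toy predictions from four hypothetical base LLMs on five validation items.
preds = {
    "llm_a": [1, 0, 1, 1, 0],
    "llm_b": [1, 1, 1, 0, 0],
    "llm_c": [0, 0, 1, 1, 1],
    "llm_d": [1, 0, 0, 1, 0],
}
for diversity, subset in prune_ensemble(preds, S=2, k=3):
    print(subset, round(diversity, 2))
```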
arXiv Detail & Related papers (2024-10-04T22:31:15Z) - Extend Model Merging from Fine-Tuned to Pre-Trained Large Language Models via Weight Disentanglement [72.97553348776425]
We make a pioneering effort to broaden the applicability of merging techniques from fine-tuned (FT) to pre-trained (PT) LLMs.
We introduce an approach based on WeIght DisENtanglement (WIDEN) to effectively extend the merging scope.
We merge Qwen1.5-Chat (an FT LLM with instruction-following skills) with Sailor (a PT LLM with multilingual abilities) across 7B and 14B model scales.
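A toy sketch of the weight-disentanglement idea: split each weight matrix into a magnitude and a direction component before merging two checkpoints. The fixed 50/50 blend below is an assumption purely for illustration; WIDEN itself derives the combination adaptively.

```python
# Toy sketch of merging two checkpoints after disentangling each weight
# matrix into a magnitude (row norms) and a direction. The fixed alpha blend
# is illustrative only; the real method scores components adaptively.
import torch

def disentangle(w, eps=1e-8):
    magnitude = w.norm(dim=-1, keepdim=True)
    direction = w / (magnitude + eps)
    return magnitude, direction

def merge_state_dicts(sd_a, sd_b, alpha=0.5, eps=1e-8):
    merged = {}
    for name in sd_a:
        mag_a, dir_a = disentangle(sd_a[name], eps)
        mag_b, dir_b = disentangle(sd_b[name], eps)
        mag = alpha * mag_a + (1 - alpha) * mag_b
        direction = alpha * dir_a + (1 - alpha) * dir_b
        direction = direction / (direction.norm(dim=-1, keepdim=True) + eps)
        merged[name] = mag * direction
    return merged

# Dummy state dicts standing in for two LLM checkpoints of identical shape.
sd_ft = {"layer.weight": torch.randn(4, 8)}
sd_pt = {"layer.weight": torch.randn(4, 8)}
print(merge_state_dicts(sd_ft, sd_pt)["layer.weight"].shape)
```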
arXiv Detail & Related papers (2024-08-06T10:46:46Z) - LLMEmbed: Rethinking Lightweight LLM's Genuine Function in Text Classification [13.319594321038926]
We propose a simple and effective transfer learning strategy, namely LLMEmbed, to address this classical but challenging task.
We perform extensive experiments on publicly available datasets, and the results show that LLMEmbed achieves strong performance while enjoys low training overhead.
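A minimal sketch of the LLM-as-embedder strategy, assuming a small frozen causal LM, mean pooling over the last hidden states, and a logistic-regression head; the model name and classifier choice are placeholders, not the paper's exact setup.

```python
# Minimal sketch of LLM-as-embedder text classification: extract a pooled
# hidden state from a small, frozen causal LM and fit a lightweight
# classifier on top. Model name, pooling, and head are assumptions.
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression

tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
tokenizer.pad_token = tokenizer.eos_token
encoder = AutoModel.from_pretrained("distilgpt2").eval()

@torch.no_grad()
def embed(texts):
    batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    hidden = encoder(**batch).last_hidden_state        # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1)       # (B, T, 1)
    return ((hidden * mask).sum(1) / mask.sum(1)).numpy()  # mean pooling

texts = ["great movie, loved it", "terrible plot and acting",
         "a delightful film", "boring and too long"]
labels = [1, 0, 1, 0]
clf = LogisticRegression(max_iter=1000).fit(embed(texts), labels)
print(clf.predict(embed(["what a wonderful story"])))
```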
arXiv Detail & Related papers (2024-06-06T03:46:59Z) - Text-like Encoding of Collaborative Information in Large Language Models for Recommendation [58.87865271693269]
We introduce BinLLM, a novel method to seamlessly integrate collaborative information with Large Language Models for Recommendation (LLMRec).
BinLLM converts collaborative embeddings from external models into binary sequences.
BinLLM provides options to compress the binary sequence using dot-decimal notation to avoid excessively long lengths.
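A toy sketch of the encoding described above: binarize a collaborative embedding (sign-based binarization is an assumption) and optionally compress the bit string into IP-like dot-decimal notation to shorten the prompt.

```python
# Toy sketch of text-like encoding of collaborative information: binarize an
# embedding and compress the bit string into dot-decimal notation.
import numpy as np

def binarize(embedding):
    """Map a real-valued embedding to a 0/1 string (sign-based, an assumption)."""
    return "".join("1" if v > 0 else "0" for v in embedding)

def to_dot_decimal(bits):
    """Compress a bit string into IP-like dot-decimal notation, 8 bits per field."""
    padded = bits.ljust((len(bits) + 7) // 8 * 8, "0")
    return ".".join(str(int(padded[i:i + 8], 2)) for i in range(0, len(padded), 8))

user_embedding = np.random.randn(32)   # stands in for an external CF model's output
bits = binarize(user_embedding)
print(bits)                  # e.g. 10110100...
print(to_dot_decimal(bits))  # e.g. 180.23.197.66 -- a shorter sequence for the prompt
```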
arXiv Detail & Related papers (2024-06-05T12:45:25Z) - Generative Text Steganography with Large Language Model [10.572149957139736]
LLM-Stega is a black-box generative text steganography method that operates through the user interfaces of large language models.
We first construct a keyword set and design a new encrypted steganographic mapping to embed secret messages.
Comprehensive experiments demonstrate that the proposed LLM-Stega outperforms current state-of-the-art methods.
arXiv Detail & Related papers (2024-04-16T02:19:28Z) - Bridging the Gap between Different Vocabularies for LLM Ensemble [10.669552498083709]
Vocabulary discrepancies among various large language models (LLMs) have constrained previous studies.
We propose a novel method to Ensemble LLMs via Vocabulary Alignment (EVA).
EVA bridges the lexical gap among various LLMs, enabling meticulous ensemble at each generation step.
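A toy sketch of bridging two vocabularies: align tokens by exact string match and average the next-token probabilities over the shared tokens at each generation step. EVA's actual alignment and its handling of unshared tokens are more sophisticated; this only illustrates the bridging idea on made-up vocabularies.

```python
# Toy sketch of ensembling two LLMs with different vocabularies: align tokens
# by exact string match and average next-token probabilities over the shared
# tokens. The vocabularies and distributions below are illustrative.
import numpy as np

def ensemble_step(vocab_a, probs_a, vocab_b, probs_b):
    """vocab_*: token string -> id; probs_*: next-token distributions."""
    shared = sorted(set(vocab_a) & set(vocab_b))
    fused = {tok: 0.5 * probs_a[vocab_a[tok]] + 0.5 * probs_b[vocab_b[tok]]
             for tok in shared}
    return max(fused, key=fused.get)

vocab_a = {"the": 0, "cat": 1, "dog": 2, "##og": 3}
vocab_b = {"the": 0, "dog": 1, "bird": 2, "cat": 3}
probs_a = np.array([0.1, 0.5, 0.3, 0.1])
probs_b = np.array([0.1, 0.6, 0.1, 0.2])
print(ensemble_step(vocab_a, probs_a, vocab_b, probs_b))  # "dog"
```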
arXiv Detail & Related papers (2024-04-15T06:28:20Z) - Knowledge Fusion of Chat LLMs: A Preliminary Technical Report [51.0178356903925]
We extend the FuseLLM framework to realize the fusion of chat LLMs, resulting in FusionChat.
We undertake knowledge fusion for structurally and scale-varied source LLMs to derive multiple target LLMs of identical structure and size via lightweight fine-tuning.
We validate our approach using three prominent chat LLMs with diverse architectures and scales, namely NH2-Mixtral-8x7B, NH2-Solar-10.7B, and OpenChat-3.5-7B.
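A minimal sketch of distilling fused teacher knowledge into a target model: average the teachers' token distributions (assumed to be on a shared or aligned vocabulary) and minimize a KL loss against them. The averaging rule and plain KL objective are simplifications of the paper's pairwise fusion and lightweight fine-tuning recipe.

```python
# Minimal sketch of fusing teacher token distributions and distilling them
# into a target model with a KL loss. Probability averaging and plain KL are
# simplifications for illustration.
import torch
import torch.nn.functional as F

def fused_distillation_loss(student_logits, teacher_logits_list, temperature=1.0):
    """student_logits: (B, T, V); each teacher_logits: (B, T, V) on an aligned vocab."""
    teacher_probs = torch.stack(
        [F.softmax(t / temperature, dim=-1) for t in teacher_logits_list]
    ).mean(dim=0)
    student_logprobs = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(student_logprobs, teacher_probs, reduction="batchmean")

# Toy tensors standing in for aligned logits from a student and two teachers.
B, T, V = 2, 5, 100
loss = fused_distillation_loss(torch.randn(B, T, V),
                               [torch.randn(B, T, V), torch.randn(B, T, V)])
print(loss.item())
```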
arXiv Detail & Related papers (2024-02-25T15:11:58Z) - Knowledge Fusion of Large Language Models [73.28202188100646]
This paper introduces the notion of knowledge fusion for large language models (LLMs).
We externalize their collective knowledge and unique strengths, thereby elevating the capabilities of the target model beyond those of any individual source LLM.
Our findings confirm that the fusion of LLMs can improve the performance of the target model across a range of capabilities such as reasoning, commonsense, and code generation.
arXiv Detail & Related papers (2024-01-19T05:02:46Z) - Boosting Large Language Model for Speech Synthesis: An Empirical Study [86.89548753080432]
Large language models (LLMs) have made significant advancements in natural language processing and are concurrently extending the language ability to other modalities, such as speech and vision.
We conduct a comprehensive empirical exploration of boosting LLMs with the ability to generate speech by combining the pre-trained LLMs LLaMA/OPT with the text-to-speech synthesis model VALL-E.
We compare three integration methods between LLMs and speech models: directly fine-tuned LLMs, superposed layers of LLMs and VALL-E, and coupled LLMs and VALL-E using the LLM as a powerful text encoder.
arXiv Detail & Related papers (2023-12-30T14:20:04Z) - The Ups and Downs of Large Language Model Inference with Vocabulary Trimming by Language Heuristics [74.99898531299148]
This research examines vocabulary trimming (VT), which restricts embedding entries to the language of interest to bolster time and memory efficiency.
We apply two language heuristics to trim the full vocabulary - Unicode-based script filtering and corpus-based selection - to different language families and sizes.
It is found that VT reduces the memory usage of small models by nearly 50% and has an upper bound of 25% improvement in generation speed.
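A toy sketch of Unicode-based script filtering: keep only tokens matching the target script, remap token ids, and slice the embedding matrix down to the surviving rows. The regex and the tiny vocabulary/embedding below are illustrative assumptions.

```python
# Toy sketch of vocabulary trimming by Unicode script filtering: keep only
# tokens written in the target script, then slice the embedding matrix down
# to the surviving rows. The regex and toy vocabulary are assumptions.
import re
import numpy as np

LATIN_ONLY = re.compile(r"^[\u0000-\u024F\s]*$")  # Basic Latin + Latin Extended-A/B

def trim_vocab(vocab, embeddings, pattern=LATIN_ONLY):
    """vocab: token -> id; embeddings: (|V|, d). Returns trimmed copies."""
    kept = [(tok, idx) for tok, idx in sorted(vocab.items(), key=lambda kv: kv[1])
            if pattern.match(tok)]
    new_vocab = {tok: new_id for new_id, (tok, _) in enumerate(kept)}
    new_embeddings = embeddings[[old_id for _, old_id in kept]]
    return new_vocab, new_embeddings

vocab = {"hello": 0, "world": 1, "привет": 2, "你好": 3, "!": 4}
embeddings = np.random.randn(len(vocab), 8)
new_vocab, new_emb = trim_vocab(vocab, embeddings)
print(new_vocab)        # {'hello': 0, 'world': 1, '!': 2}
print(new_emb.shape)    # (3, 8)
```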
arXiv Detail & Related papers (2023-11-16T09:35:50Z) - LLM-Pruner: On the Structural Pruning of Large Language Models [65.02607075556742]
Large language models (LLMs) have shown remarkable capabilities in language understanding and generation.
We tackle the compression of LLMs within the bound of two constraints: being task-agnostic and minimizing the reliance on the original training dataset.
Our method, named LLM-Pruner, adopts structural pruning that selectively removes non-critical coupled structures.
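A toy sketch of gradient-based structural importance on a small MLP: score each hidden unit by the summed |weight x gradient| of its coupled row/column pair and keep the highest-scoring units. LLM-Pruner's dependency-graph detection and recovery fine-tuning stage are not shown; the calibration loss and layer sizes are assumptions.

```python
# Toy sketch of gradient-based structural importance scoring on a small MLP:
# score each hidden unit by the summed |weight * grad| of its coupled rows and
# columns, then drop the lowest-scoring units.
import torch
import torch.nn as nn

torch.manual_seed(0)
mlp = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 16))
x = torch.randn(8, 16)
loss = mlp(x).pow(2).mean()          # stands in for a small calibration loss
loss.backward()

up, down = mlp[0], mlp[2]
# Coupled structure for hidden unit j: row j of the up projection and
# column j of the down projection.
importance = ((up.weight * up.weight.grad).abs().sum(dim=1)
              + (down.weight * down.weight.grad).abs().sum(dim=0))

keep = importance.topk(24).indices.sort().values   # prune 8 of 32 hidden units
pruned_up = nn.Linear(16, 24)
pruned_down = nn.Linear(24, 16)
with torch.no_grad():
    pruned_up.weight.copy_(up.weight[keep])
    pruned_up.bias.copy_(up.bias[keep])
    pruned_down.weight.copy_(down.weight[:, keep])
    pruned_down.bias.copy_(down.bias)
print(pruned_up.weight.shape, pruned_down.weight.shape)  # (24, 16), (16, 24)
```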
arXiv Detail & Related papers (2023-05-19T12:10:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.