Knowledge Fusion of Chat LLMs: A Preliminary Technical Report
- URL: http://arxiv.org/abs/2402.16107v5
- Date: Tue, 28 May 2024 09:59:16 GMT
- Title: Knowledge Fusion of Chat LLMs: A Preliminary Technical Report
- Authors: Fanqi Wan, Ziyi Yang, Longguang Zhong, Xiaojun Quan, Xinting Huang, Wei Bi
- Abstract summary: We extend the FuseLLM framework to realize the fusion of chat LLMs, resulting in FusionChat.
We undertake knowledge fusion for structurally and scale-varied source LLMs to derive multiple target LLMs of identical structure and size via lightweight fine-tuning.
We validate our approach using three prominent chat LLMs with diverse architectures and scales, namely NH2-Mixtral-8x7B, NH2-Solar-10.7B, and OpenChat-3.5-7B.
- Score: 51.0178356903925
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, FuseLLM introduced the concept of knowledge fusion to transfer the collective knowledge of multiple structurally varied LLMs into a target LLM through lightweight continual training. In this report, we extend the scalability and flexibility of the FuseLLM framework to realize the fusion of chat LLMs, resulting in FusionChat. FusionChat comprises two main stages. Firstly, we undertake knowledge fusion for structurally and scale-varied source LLMs to derive multiple target LLMs of identical structure and size via lightweight fine-tuning. Then, these target LLMs are merged within the parameter space, wherein we propose a novel method for determining the merging weights based on the variation ratio of parameter matrices before and after fine-tuning. We validate our approach using three prominent chat LLMs with diverse architectures and scales, namely NH2-Mixtral-8x7B, NH2-Solar-10.7B, and OpenChat-3.5-7B. Experimental results spanning various chat domains demonstrate the superiority of FusionChat-7B across a broad spectrum of chat LLMs at 7B and 34B scales, even surpassing GPT-3.5 (March) and approaching Mixtral-8x7B-Instruct.
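As an illustration of the second stage, the sketch below shows one way the variation-ratio idea could be realized in PyTorch: each parameter matrix of every fine-tuned target LLM is weighted in proportion to how much it changed during fine-tuning, normalized across the target LLMs, and the matrices are then averaged with those weights. This is a minimal sketch based only on the abstract; the function name `variation_ratio_merge`, the squared-difference variation measure, and the per-matrix normalization are assumptions rather than the paper's exact formulation.

```python
import torch

def variation_ratio_merge(base_state, target_states, eps=1e-8):
    """Merge target LLMs of identical structure into a single set of parameters.

    base_state:    state_dict of the shared backbone before fine-tuning
    target_states: list of state_dicts of the target LLMs after lightweight fine-tuning
    Each parameter matrix is merged with per-model weights proportional to its
    variation (sum of squared element-wise differences) before vs. after fine-tuning.
    """
    merged = {}
    for name, base_w in base_state.items():
        variations = torch.stack([
            ((t[name].float() - base_w.float()) ** 2).sum() for t in target_states
        ])
        total = variations.sum()
        if total < eps:  # matrix barely changed in any target model: keep the base
            merged[name] = base_w.clone()
            continue
        weights = variations / total  # normalized per-matrix merging weights
        merged[name] = sum(w * t[name].float()
                           for w, t in zip(weights, target_states)).to(base_w.dtype)
    return merged
```

The intuition is that matrices which moved more during fine-tuning carry more of the knowledge absorbed from their source LLM, so they dominate the merge for that particular matrix.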
Related papers
- LLaVA-KD: A Framework of Distilling Multimodal Large Language Models [70.19607283302712]
We propose a novel framework to transfer knowledge from a large MLLM (l-MLLM) to a small MLLM (s-MLLM).
Specifically, we introduce Multimodal Distillation (MDist) to minimize the divergence between the visual-textual output distributions of l-MLLM and s-MLLM.
We also propose a three-stage training scheme to fully exploit the potential of s-MLLM.
arXiv Detail & Related papers (2024-10-21T17:41:28Z)
- FuseChat: Knowledge Fusion of Chat Models [35.90957231731829]
We propose a new framework for the knowledge fusion of chat LLMs through two main stages, resulting in FuseChat.
We implement and validate FuseChat using six prominent chat LLMs with diverse architectures and scales, including OpenChat-3.5-7B, Starling-LM-7B-alpha, NH2-SOLAR-10.7B, InternLM2-Chat-20B, Mixtral-8x7B-Instruct, and Qwen-1.5-Chat-72B.
arXiv Detail & Related papers (2024-08-15T07:37:24Z)
- Extend Model Merging from Fine-Tuned to Pre-Trained Large Language Models via Weight Disentanglement [72.97553348776425]
We make a pioneering effort to broaden the applicability of merging techniques from fine-tuned (FT) to pre-trained (PT) LLMs.
We introduce an approach based on WeIght DisENtanglement (WIDEN) to effectively extend the merging scope.
We merge Qwen1.5-Chat (an FT LLM with instruction-following skills) with Sailor (a PT LLM with multilingual abilities) across 7B and 14B model scales.
arXiv Detail & Related papers (2024-08-06T10:46:46Z)
- Delta-CoMe: Training-Free Delta-Compression with Mixed-Precision for Large Language Models [79.46938238953916]
Fine-tuning large language models (LLMs) for diverse applications is crucial to meet complex demands.
Recent studies suggest decomposing a fine-tuned LLM into a base model and corresponding delta weights, which are then compressed using low-rank or low-bit approaches to reduce costs.
In this work, we observe that existing low-rank and low-bit compression methods can significantly harm the model performance for task-specific fine-tuned LLMs.
arXiv Detail & Related papers (2024-06-13T07:57:27Z)
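The base-plus-delta decomposition described in the Delta-CoMe entry above can be made concrete with a plain truncated-SVD variant, shown below as a minimal PyTorch sketch. This is only the generic low-rank baseline that the paper argues can harm task-specific performance under aggressive compression; Delta-CoMe's actual mixed-precision scheme is not reproduced here, and the function names are illustrative.

```python
import torch

def compress_delta_low_rank(base_w, finetuned_w, rank=64):
    """Split a fine-tuned weight matrix into base + low-rank delta.
    Returns factors (A, B) such that base_w + A @ B approximates finetuned_w."""
    delta = (finetuned_w - base_w).float()
    U, S, Vh = torch.linalg.svd(delta, full_matrices=False)
    A = U[:, :rank] * S[:rank]   # [out_dim, rank], singular values folded in
    B = Vh[:rank, :]             # [rank, in_dim]
    return A, B

def reconstruct(base_w, A, B):
    """Approximately recover the fine-tuned matrix from the base and the factors."""
    return base_w + (A @ B).to(base_w.dtype)
```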
- FedMKT: Federated Mutual Knowledge Transfer for Large and Small Language Models [28.284346666217207]
FedMKT is a parameter-efficient mutual knowledge transfer framework for large and small language models.
We show that FedMKT simultaneously boosts the performance of both LLMs and SLMs.
arXiv Detail & Related papers (2024-06-04T11:36:09Z)
- Pack of LLMs: Model Fusion at Test-Time via Perplexity Optimization [18.73637736606997]
Pack of LLMs (PackLLM) is an effective method for test-time fusion that leverages each LLM's expertise, given an input prompt.
We conduct experiments with over 100 Large Language Models (LLMs) in total on a diverse set of tasks.
PackLLM outperforms test-time fusion baselines by 1.89% accuracy points.
arXiv Detail & Related papers (2024-04-17T16:24:07Z)
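The PackLLM entry above describes test-time fusion driven by prompt perplexity. The sketch below illustrates the general idea under simplifying assumptions: all models share one tokenizer and expose a HuggingFace-style causal-LM interface, and the importance weights come from a softmax over negative prompt log-perplexities with a hypothetical temperature parameter, rather than PackLLM's actual optimization.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def prompt_log_perplexity(model, input_ids):
    """Mean negative log-likelihood of the prompt under one causal LM."""
    return model(input_ids, labels=input_ids).loss

@torch.no_grad()
def fused_next_token_probs(models, input_ids, temperature=0.1):
    """Mix next-token distributions, giving more weight to models that
    find the prompt less perplexing."""
    log_ppls = torch.stack([prompt_log_perplexity(m, input_ids) for m in models])
    weights = F.softmax(-log_ppls / temperature, dim=0)
    dists = [F.softmax(m(input_ids).logits[:, -1, :], dim=-1) for m in models]
    return sum(w * d for w, d in zip(weights, dists))
```

Decoding would call `fused_next_token_probs` at every step, appending the sampled token to `input_ids`.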
- Knowledge Fusion of Large Language Models [73.28202188100646]
This paper introduces the notion of knowledge fusion for large language models (LLMs).
We externalize the collective knowledge and unique strengths of the source LLMs, thereby elevating the capabilities of the target model beyond those of any individual source LLM.
Our findings confirm that the fusion of LLMs can improve the performance of the target model across a range of capabilities such as reasoning, commonsense, and code generation.
arXiv Detail & Related papers (2024-01-19T05:02:46Z)
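For the original knowledge-fusion recipe referenced above (and in the FusionChat abstract), the core training signal combines the usual causal-LM loss with a divergence term toward a fused teacher distribution built from the source LLMs. The sketch below is a hedged approximation: it assumes the source distributions are already aligned to the target vocabulary, weights them by how well each source predicts the gold tokens, and uses an illustrative mixing coefficient `lam`; these exact choices are not guaranteed to match the paper.

```python
import torch
import torch.nn.functional as F

def knowledge_fusion_loss(target_logits, source_logits_list, labels, lam=0.9):
    """Causal-LM loss plus a KL term pulling the target LLM toward a fused
    distribution of the source LLMs.

    target_logits:      [batch, seq, vocab] from the target LLM
    source_logits_list: list of [batch, seq, vocab], already token-aligned
    labels:             [batch, seq] gold token ids (assumed pre-shifted)
    """
    lm_loss = F.cross_entropy(target_logits.transpose(1, 2), labels)

    # Weight each source by its own fit to the gold tokens (lower CE -> higher weight).
    ce = torch.stack([F.cross_entropy(s.transpose(1, 2), labels)
                      for s in source_logits_list])
    w = F.softmax(-ce, dim=0)
    fused = sum(wi * F.softmax(s, dim=-1) for wi, s in zip(w, source_logits_list))

    kd_loss = F.kl_div(F.log_softmax(target_logits, dim=-1), fused,
                       reduction="batchmean")
    return lam * lm_loss + (1.0 - lam) * kd_loss
```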
- Towards Vision Enhancing LLMs: Empowering Multimodal Knowledge Storage and Sharing in LLMs [72.49064988035126]
We propose an approach called MKS2, aimed at enhancing multimodal large language models (MLLMs).
Specifically, we introduce the Modular Visual Memory, a component integrated into the internal blocks of LLMs, designed to store open-world visual information efficiently.
Our experiments demonstrate that MKS2 substantially augments the reasoning capabilities of LLMs in contexts necessitating physical or commonsense knowledge.
arXiv Detail & Related papers (2023-11-27T12:29:20Z)