Mutual Enhancement of Large and Small Language Models with Cross-Silo
Knowledge Transfer
- URL: http://arxiv.org/abs/2312.05842v1
- Date: Sun, 10 Dec 2023 09:52:32 GMT
- Title: Mutual Enhancement of Large and Small Language Models with Cross-Silo
Knowledge Transfer
- Authors: Yongheng Deng, Ziqing Qiao, Ju Ren, Yang Liu, Yaoxue Zhang
- Abstract summary: Large language models (LLMs) are empowered with broad knowledge, but their task-specific performance is often suboptimal.
This necessitates fine-tuning LLMs with task-specific data, but such data may be inaccessible due to privacy concerns.
We propose a novel approach to enhance LLMs with smaller language models (SLMs) that are trained on clients using their private task-specific data.
- Score: 27.63746419563747
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While large language models (LLMs) are empowered with broad knowledge, their
task-specific performance is often suboptimal. This necessitates fine-tuning LLMs
with task-specific data, but such data may be inaccessible due to privacy
concerns. In this paper, we propose a novel approach to enhance LLMs with
smaller language models (SLMs) that are trained on clients using their private
task-specific data. To enable mutual enhancement between LLMs and SLMs, we
propose CrossLM, in which the SLMs guide the LLM to generate high-quality
task-specific data, and both the LLM and the SLMs are enhanced with the generated
data. We evaluate CrossLM using publicly accessible language models across a
range of benchmark tasks. The results demonstrate that CrossLM significantly
enhances the task-specific performance of SLMs on clients and the LLM on the
cloud server simultaneously while preserving the LLM's generalization
capability.
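The abstract describes a mutual-enhancement loop: the cloud LLM synthesizes task-specific data, the client SLMs (trained on private data) judge how useful that data is, and both sides are then fine-tuned on the retained samples. The Python sketch below illustrates only that data flow; the CloudLLM and ClientSLM classes, their methods, and the average-score filter are hypothetical placeholders and are not the authors' implementation.

```python
# Minimal conceptual sketch of the CrossLM-style data flow described in the abstract.
# All names (CloudLLM, ClientSLM, quality_score, the 0.5 threshold) are hypothetical
# placeholders standing in for real models and scoring functions.

from dataclasses import dataclass
from typing import List


@dataclass
class Example:
    prompt: str
    completion: str


class CloudLLM:
    """Stands in for the large model hosted on the cloud server."""

    def generate(self, task_description: str, n: int) -> List[Example]:
        # In a real system this would sample task-specific data from the LLM.
        return [Example(f"{task_description} #{i}", "<synthetic answer>") for i in range(n)]

    def finetune(self, data: List[Example]) -> None:
        print(f"[LLM] fine-tuning on {len(data)} filtered synthetic examples")


class ClientSLM:
    """Stands in for a small model trained on one client's private data."""

    def __init__(self, client_id: int):
        self.client_id = client_id

    def quality_score(self, example: Example) -> float:
        # The SLM judges how useful a generated example is for its task;
        # a constant stands in for a real likelihood/confidence score.
        return 0.8

    def finetune(self, data: List[Example]) -> None:
        print(f"[SLM {self.client_id}] fine-tuning on {len(data)} synthetic examples")


def crosslm_round(llm: CloudLLM, slms: List[ClientSLM], task: str, threshold: float = 0.5):
    candidates = llm.generate(task, n=4)              # LLM synthesizes task-specific data
    kept = [ex for ex in candidates
            if sum(s.quality_score(ex) for s in slms) / len(slms) >= threshold]
    llm.finetune(kept)                                # LLM improves on data the SLMs endorsed
    for slm in slms:
        slm.finetune(kept)                            # each client SLM improves on the same data


if __name__ == "__main__":
    crosslm_round(CloudLLM(), [ClientSLM(0), ClientSLM(1)], task="sentiment classification")
```

The point of the sketch is the two-way dependency: the SLMs' scores decide which synthetic samples survive, and the surviving samples are used to update both the LLM and the SLMs.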
Related papers
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning [70.21358720599821]
Large language models (LLMs) hold the promise of solving diverse tasks when provided with appropriate natural language prompts.
We propose SELF-GUIDE, a multi-stage mechanism in which we synthesize task-specific input-output pairs from the student LLM and then finetune the student on its own synthesized data.
We report an absolute improvement of approximately 15% for classification tasks and 18% for generation tasks in the benchmark's metrics.
arXiv Detail & Related papers (2024-07-16T04:41:58Z)
- The Synergy between Data and Multi-Modal Large Language Models: A Survey from Co-Development Perspective [53.48484062444108]
We find that the development of models and the development of data are not two separate paths but are interconnected.
On the one hand, larger and higher-quality data contribute to better performance of MLLMs; on the other hand, MLLMs can facilitate the development of data.
To promote the data-model co-development for MLLM community, we systematically review existing works related to MLLMs from the data-model co-development perspective.
arXiv Detail & Related papers (2024-07-11T15:08:11Z)
- FedMKT: Federated Mutual Knowledge Transfer for Large and Small Language Models [28.284346666217207]
FedMKT is a parameter-efficient mutual knowledge transfer framework for large and small language models.
We show that FedMKT simultaneously boosts the performance of both LLMs and SLMs.
arXiv Detail & Related papers (2024-06-04T11:36:09Z)
- Parrot: Efficient Serving of LLM-based Applications with Semantic Variable [11.894203842968745]
Parrot is a service system that focuses on the end-to-end experience of LLM-based applications.
A Semantic Variable annotates an input/output variable in the prompt of a request and creates a data pipeline when multiple LLM requests are connected.
arXiv Detail & Related papers (2024-05-30T09:46:36Z)
- Federated Domain-Specific Knowledge Transfer on Large Language Models Using Synthetic Data [53.70870879858533]
We introduce a Federated Domain-specific Knowledge Transfer (FDKT) framework.
It enables domain-specific knowledge transfer from LLMs to SLMs while preserving clients' data privacy.
The proposed FDKT framework consistently and greatly improves SLMs' task performance by around 5% with a privacy budget of less than 10.
arXiv Detail & Related papers (2024-05-23T06:14:35Z)
- Knowledge Fusion of Large Language Models [73.28202188100646]
This paper introduces the notion of knowledge fusion for large language models (LLMs).
We externalize their collective knowledge and unique strengths, thereby elevating the capabilities of the target model beyond those of any individual source LLM.
Our findings confirm that the fusion of LLMs can improve the performance of the target model across a range of capabilities such as reasoning, commonsense, and code generation.
arXiv Detail & Related papers (2024-01-19T05:02:46Z)
- LLM-Pruner: On the Structural Pruning of Large Language Models [65.02607075556742]
Large language models (LLMs) have shown remarkable capabilities in language understanding and generation.
We tackle the compression of LLMs under two constraints: being task-agnostic and minimizing reliance on the original training dataset.
Our method, named LLM-Pruner, adopts structural pruning that selectively removes non-critical coupled structures.
arXiv Detail & Related papers (2023-05-19T12:10:53Z)
- Augmented Large Language Models with Parametric Knowledge Guiding [72.71468058502228]
Large Language Models (LLMs) have significantly advanced natural language processing (NLP) with their impressive language understanding and generation capabilities.
However, their performance may be suboptimal on domain-specific tasks that require specialized knowledge, due to limited exposure to the relevant data.
We propose the novel Parametric Knowledge Guiding (PKG) framework, which equips LLMs with a knowledge-guiding module to access relevant knowledge.
arXiv Detail & Related papers (2023-05-08T15:05:16Z)