CRaSh: Clustering, Removing, and Sharing Enhance Fine-tuning without Full Large Language Model
- URL: http://arxiv.org/abs/2310.15477v1
- Date: Tue, 24 Oct 2023 03:08:58 GMT
- Title: CRaSh: Clustering, Removing, and Sharing Enhance Fine-tuning without Full Large Language Model
- Authors: Kaiyan Zhang, Ning Ding, Biqing Qi, Xuekai Zhu, Xinwei Long, Bowen Zhou
- Abstract summary: This paper focuses on Offsite-Tuning (OFT), a representative technique that transfers transformer blocks between centralized LLMs and downstream emulators.
Inspired by these observations, we propose CRaSh, involving Clustering, Removing, and Sharing, a training-free strategy to derive improved emulators from LLMs.
Our findings demonstrate linear connectivity among these optima, which fall within the same basin, thereby highlighting the effectiveness of CRaSh and OFT.
- Score: 22.870512676002463
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Instruction tuning has recently been recognized as an effective way of
aligning Large Language Models (LLMs) to enhance their generalization ability
across various tasks. However, when tuning publicly accessible, centralized
LLMs with private instruction data, privacy concerns are inevitable. While
direct transfer of parameterized modules between models is a plausible approach
to address this, its implications and effectiveness need further exploration.
This paper focuses on Offsite-Tuning (OFT), a representative technique that
transfers transformer blocks between centralized LLMs and downstream emulators.
Given the limited understanding of the underlying mechanism of OFT, we perform
an empirical analysis on LLMs from the perspectives of representation and
functional similarity. Interestingly, our findings reveal a unique modular
structure within the layers of LLMs that appears to emerge as the model size
expands. Simultaneously, we note subtle but potentially significant changes in
representation and intermediate predictions across the layers. Inspired by
these observations, we propose CRaSh, involving Clustering, Removing, and
Sharing, a training-free strategy to derive improved emulators from LLMs. CRaSh
significantly boosts the performance of OFT on models with billions of parameters.
Furthermore, we investigate the optimal solutions yielded by fine-tuning with
and without the full model through the lens of the loss landscape. Our findings
demonstrate linear connectivity among these optima, which fall within the same
basin, thereby highlighting the effectiveness of CRaSh and OFT. The source code
is publicly available at https://github.com/TsinghuaC3I/CRaSh.
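
The abstract names the three operations but not the mechanics, so the following is a minimal, schematic sketch of how an emulator could be derived by clustering adjacent layers with similar representations, removing the redundant ones, and sharing one block per cluster. All names here (cluster_layers, build_emulator, layer_reps) are illustrative assumptions, not the authors' API; the actual implementation is in the repository linked above.

```python
# Schematic sketch of the CRaSh idea (Clustering, Removing, Sharing) for deriving
# a smaller emulator from an LLM's stack of transformer blocks. Toy data only.
import numpy as np

def cluster_layers(layer_reps, n_clusters):
    """Greedily merge adjacent layers whose (normalized) mean hidden states are
    most similar, until only n_clusters groups of consecutive layers remain."""
    reps = [r / (np.linalg.norm(r) + 1e-8) for r in layer_reps]
    groups = [[i] for i in range(len(reps))]
    while len(groups) > n_clusters:
        # cosine similarity across each boundary between adjacent groups
        sims = [float(reps[g[-1]] @ reps[h[0]]) for g, h in zip(groups, groups[1:])]
        j = int(np.argmax(sims))                  # merge the most similar neighbours
        groups[j:j + 2] = [groups[j] + groups[j + 1]]
    return groups

def build_emulator(blocks, layer_reps, n_clusters):
    """Keep one representative block per cluster (Removing) and reuse it for every
    position in that cluster (Sharing), preserving depth with fewer distinct weights."""
    emulator = []
    for group in cluster_layers(layer_reps, n_clusters):
        shared = blocks[group[0]]
        emulator.extend([shared] * len(group))
    return emulator

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_layers, hidden = 24, 64
    toy_blocks = [f"block_{i}" for i in range(n_layers)]   # stand-ins for transformer blocks
    toy_reps = rng.normal(size=(n_layers, hidden))         # mean hidden state per layer
    emulator = build_emulator(toy_blocks, toy_reps, n_clusters=8)
    print(len(emulator), len(set(emulator)))               # depth 24, only 8 distinct blocks
```

In the Offsite-Tuning setting described above, such a compressed emulator is, roughly, what the downstream party would fine-tune on private data, with the learned updates later transferred back to the full centralized model; the exact transfer protocol is the subject of the paper, not this sketch.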
Related papers
- R-SFLLM: Jamming Resilient Framework for Split Federated Learning with Large Language Models [83.77114091471822]
Split federated learning (SFL) is a compute-efficient paradigm in distributed machine learning (ML).
A challenge in SFL, particularly when deployed over wireless channels, is the susceptibility of transmitted model parameters to adversarial jamming.
This is particularly pronounced for word embedding parameters in large language models (LLMs), which are crucial for language understanding.
A physical layer framework is developed for resilient SFL with LLMs (R-SFLLM) over wireless networks.
arXiv Detail & Related papers (2024-07-16T12:21:29Z)
- PAFT: A Parallel Training Paradigm for Effective LLM Fine-Tuning [17.73193523921637]
Large language models (LLMs) have shown remarkable abilities in diverse natural language processing (NLP) tasks.
LLMs generally undergo supervised fine-tuning (SFT) followed by preference alignment to be usable in downstream applications.
This paper introduces PAFT, a new PArallel training paradigm for effective LLM Fine-Tuning.
arXiv Detail & Related papers (2024-06-25T20:11:37Z)
- Exploring the Role of Large Language Models in Prompt Encoding for Diffusion Models [42.891427362223176]
Large language models (LLMs) based on decoder-only transformers have demonstrated superior text understanding capabilities.
We propose a novel framework to fully harness the capabilities of LLMs for prompt encoding in diffusion models.
We further design an LLM-Infused Diffusion Transformer (LI-DiT) based on the framework.
arXiv Detail & Related papers (2024-06-17T17:59:43Z)
- Surgical Feature-Space Decomposition of LLMs: Why, When and How? [8.826164604720738]
We empirically study the efficacy of weight and feature space decomposition in transformer-based language models.
We show that surgical decomposition provides critical insights into the trade-off between compression and language modelling performance.
We extend our investigation to the implications of low-rank approximations on model bias.
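
As a generic illustration of the kind of weight-space decomposition this summary alludes to (not the paper's specific procedure or layer-selection criteria), a truncated SVD of a single weight matrix trades parameters for reconstruction error:

```python
# Low-rank (SVD) decomposition of one weight matrix; a hedged, generic sketch.
import numpy as np

def low_rank_approx(W, rank):
    """Return factors (A, B) with W ~= A @ B, keeping the top `rank` singular values."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, :rank] * S[:rank], Vt[:rank, :]

rng = np.random.default_rng(0)
W = rng.normal(size=(512, 512))                      # stand-in for one projection matrix
A, B = low_rank_approx(W, rank=64)
rel_err = np.linalg.norm(W - A @ B) / np.linalg.norm(W)
print(f"params kept: {(A.size + B.size) / W.size:.2f}, relative error: {rel_err:.3f}")
```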
arXiv Detail & Related papers (2024-05-17T07:34:03Z)
- Causal Prompting: Debiasing Large Language Model Prompting based on Front-Door Adjustment [32.12998469814097]
A novel causal prompting method based on front-door adjustment is proposed to effectively mitigate Large Language Models (LLMs) biases.
Experimental results show that the proposed causal prompting approach achieves excellent performance across seven natural language processing datasets.
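
For reference, the front-door adjustment named in the title is the standard causal-inference identity below, with mediator M between treatment X and outcome Y; how the paper estimates these terms via prompting is not described in this summary.

```latex
% Standard front-door adjustment (Pearl):
P(y \mid \mathrm{do}(x)) \;=\; \sum_{m} P(m \mid x) \sum_{x'} P(y \mid x', m)\, P(x')
```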
arXiv Detail & Related papers (2024-03-05T07:47:34Z)
- Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models [52.98743860365194]
We propose a new fine-tuning method called Self-Play fIne-tuNing (SPIN).
At the heart of SPIN lies a self-play mechanism, where the LLM refines its capability by playing against instances of itself.
This sheds light on the promise of self-play, enabling the achievement of human-level performance in LLMs without the need for expert opponents.
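
A toy sketch of the self-play idea described above, assuming a DPO-style logistic loss on the margin between the human response and the model's own previous generation; the paper's exact objective, sampling scheme, and hyperparameters may differ, and all names here are illustrative.

```python
# Self-play style loss on a single example: the previous model iterate is the
# "opponent" whose generation should become less likely than the human response.
import numpy as np

def spin_style_loss(cur_logp_human, cur_logp_self, prev_logp_human, prev_logp_self, beta=0.1):
    """Logistic loss on the log-likelihood-ratio margin between the human response and
    the self-generated one, measured against the previous iterate (cf. DPO-style losses)."""
    margin = beta * ((cur_logp_human - prev_logp_human) - (cur_logp_self - prev_logp_self))
    return float(-np.log(1.0 / (1.0 + np.exp(-margin))))   # -log(sigmoid(margin))

# The current model prefers the human response more strongly than its previous self
# did, so the loss falls below the ln(2) chance level; training pushes it further.
print(spin_style_loss(cur_logp_human=-10.0, cur_logp_self=-14.0,
                      prev_logp_human=-12.0, prev_logp_self=-12.0))
```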
arXiv Detail & Related papers (2024-01-02T18:53:13Z)
- Federated Full-Parameter Tuning of Billion-Sized Language Models with Communication Cost under 18 Kilobytes [53.4856038354195]
Pre-trained large language models (LLMs) need fine-tuning to improve their responsiveness to natural language instructions.
FedKSeed employs zeroth-order optimization with a finite set of random seeds.
It significantly reduces transmission requirements between the server and clients to just a few random seeds.
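
A minimal sketch of what seed-based zeroth-order communication could look like, assuming a MeZO-style symmetric finite-difference estimate; the function names, seed pool, and constants are illustrative, not the paper's actual protocol.

```python
# Seed-based zeroth-order round: the client returns only (seed index, scalar g),
# and the server replays the same pseudo-random perturbation to apply the update.
import numpy as np

SEED_POOL = list(range(16))   # finite set of candidate seeds known to server and clients
EPS, LR = 1e-3, 1e-3

def client_step(theta, loss_fn, seed):
    """Estimate the directional derivative along a perturbation regenerated from `seed`."""
    z = np.random.default_rng(seed).standard_normal(theta.shape)
    return (loss_fn(theta + EPS * z) - loss_fn(theta - EPS * z)) / (2 * EPS)

def server_apply(theta, seed, g):
    """Replay the same perturbation from `seed` and take the zeroth-order step."""
    z = np.random.default_rng(seed).standard_normal(theta.shape)
    return theta - LR * g * z

# Toy round on a quadratic "loss": the only traffic is the seed index and the scalar g.
theta = np.ones(100)
loss = lambda p: float(np.sum(p ** 2))
seed = SEED_POOL[3]
g = client_step(theta, loss, seed)
print(loss(theta), "->", round(loss(server_apply(theta, seed, g)), 3))
```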
arXiv Detail & Related papers (2023-12-11T13:03:21Z)
- Amortizing intractable inference in large language models [56.92471123778389]
We use amortized Bayesian inference to sample from intractable posterior distributions.
We empirically demonstrate that this distribution-matching paradigm of LLM fine-tuning can serve as an effective alternative to maximum-likelihood training.
As an important application, we interpret chain-of-thought reasoning as a latent variable modeling problem.
arXiv Detail & Related papers (2023-10-06T16:36:08Z)
- FederatedScope-LLM: A Comprehensive Package for Fine-tuning Large Language Models in Federated Learning [70.38817963253034]
This paper first discusses the challenges of federated fine-tuning of LLMs and introduces our package FS-LLM as a main contribution.
We provide comprehensive federated parameter-efficient fine-tuning algorithm implementations and versatile programming interfaces for future extension in FL scenarios.
We conduct extensive experiments to validate the effectiveness of FS-LLM and benchmark advanced LLMs with state-of-the-art parameter-efficient fine-tuning algorithms in FL settings.
arXiv Detail & Related papers (2023-09-01T09:40:36Z)
- Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline [22.08897444328099]
Large language models (LLMs) have revolutionized the field of AI, demonstrating unprecedented capacity across various tasks.
In this paper, we propose an efficient LLM inference pipeline that exploits the LLM's own response-length perception to schedule sequences during inference.
arXiv Detail & Related papers (2023-05-22T15:36:06Z)
- LLM-Pruner: On the Structural Pruning of Large Language Models [65.02607075556742]
Large language models (LLMs) have shown remarkable capabilities in language understanding and generation.
We tackle the compression of LLMs within the bound of two constraints: being task-agnostic and minimizing the reliance on the original training dataset.
Our method, named LLM-Pruner, adopts structural pruning that selectively removes non-critical coupled structures.
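
As a hedged illustration of pruning coupled structures (not LLM-Pruner's actual grouping or importance criterion), the toy sketch below removes MLP hidden units, deleting the coupled row of the first projection and column of the second together:

```python
# Generic structured pruning of a 2-layer MLP block: hidden unit j couples row j
# of W1 with column j of W2, so both are removed together. Importance is a simple
# first-order |weight * gradient| score; the gradients here are random stand-ins.
import numpy as np

def prune_mlp(W1, b1, W2, grad_W1, grad_W2, keep_ratio=0.75):
    """Remove the least important hidden units of the block."""
    score = np.abs(W1 * grad_W1).sum(axis=1) + np.abs(W2 * grad_W2).sum(axis=0)
    keep = np.sort(np.argsort(score)[-int(len(score) * keep_ratio):])
    return W1[keep, :], b1[keep], W2[:, keep]

rng = np.random.default_rng(0)
d_model, d_hidden = 64, 256
W1, b1 = rng.normal(size=(d_hidden, d_model)), rng.normal(size=d_hidden)
W2 = rng.normal(size=(d_model, d_hidden))
gW1, gW2 = rng.normal(size=W1.shape), rng.normal(size=W2.shape)
W1p, b1p, W2p = prune_mlp(W1, b1, W2, gW1, gW2)
print(W1.shape, "->", W1p.shape)   # (256, 64) -> (192, 64): 25% of hidden units removed
```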
arXiv Detail & Related papers (2023-05-19T12:10:53Z)