Fine-tuning Happens in Tiny Subspaces: Exploring Intrinsic Task-specific Subspaces of Pre-trained Language Models
- URL: http://arxiv.org/abs/2305.17446v2
- Date: Tue, 1 Aug 2023 08:54:06 GMT
- Title: Fine-tuning Happens in Tiny Subspaces: Exploring Intrinsic Task-specific Subspaces of Pre-trained Language Models
- Authors: Zhong Zhang, Bang Liu, Junming Shao
- Abstract summary: Pre-trained language models (PLMs) are known to be overly parameterized and have significant redundancy.
We study the problem of re-parameterizing and fine-tuning PLMs from a new perspective: the discovery of an intrinsic task-specific subspace.
A key finding is that PLMs can be effectively fine-tuned in the subspace with a small number of free parameters.
- Score: 16.28794184086409
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pre-trained language models (PLMs) are known to be overly parameterized and
have significant redundancy, indicating few effective degrees of freedom in PLMs.
Motivated by this observation, in this paper we study the problem of re-parameterizing
and fine-tuning PLMs from a new perspective: the discovery of an intrinsic task-specific
subspace. Specifically, by exploiting the dynamics of the fine-tuning process for a given
task, the parameter optimization trajectory is learned to uncover its intrinsic
task-specific subspace. A key finding is that PLMs can be effectively fine-tuned in the
subspace with a small number of free parameters. Beyond this, we observe some outlier
dimensions emerging in the subspace during fine-tuning. Disabling these dimensions
degrades model performance significantly, suggesting that they are crucial for inducing
task-specific knowledge for downstream tasks.
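As a rough illustration of the recipe described in the abstract, the sketch below (an illustrative assumption, not the authors' released code) estimates a low-dimensional subspace by running PCA over parameter deltas saved along an ordinary fine-tuning trajectory, then re-parameterizes fine-tuning as theta = theta0 + B z so that only the low-dimensional z is trained. The helper names, the toy model, and the choice of plain SVD-based PCA are assumptions made for the example.

```python
import torch
from torch.func import functional_call

def flat_params(model):
    # Flatten all model parameters into a single detached vector.
    return torch.cat([p.detach().reshape(-1) for p in model.parameters()])

def intrinsic_subspace(checkpoints, theta0, dim=4):
    # PCA (via SVD) over parameter deltas collected along a fine-tuning trajectory:
    # the top right singular vectors span the candidate task-specific subspace.
    deltas = torch.stack([c - theta0 for c in checkpoints])       # (T, P)
    _, _, Vh = torch.linalg.svd(deltas, full_matrices=False)
    return Vh[:dim].T                                             # (P, dim) basis

class SubspaceFineTuner(torch.nn.Module):
    """Re-parameterization theta = theta0 + basis @ z; only z is trained."""
    def __init__(self, model, theta0, basis):
        super().__init__()
        self.model = model
        for p in model.parameters():
            p.requires_grad_(False)                               # backbone stays frozen
        self.register_buffer("theta0", theta0)
        self.register_buffer("basis", basis)
        self.z = torch.nn.Parameter(torch.zeros(basis.shape[1]))  # the few free parameters
        self.meta = [(n, p.shape, p.numel()) for n, p in model.named_parameters()]

    def forward(self, x):
        theta = self.theta0 + self.basis @ self.z
        params, offset = {}, 0
        for name, shape, numel in self.meta:                      # scatter flat vector back
            params[name] = theta[offset:offset + numel].view(shape)
            offset += numel
        return functional_call(self.model, params, (x,))

# Toy usage: pretend these checkpoints came from an ordinary fine-tuning run.
model = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.ReLU(), torch.nn.Linear(8, 2))
theta0 = flat_params(model)
checkpoints = [theta0 + 0.01 * torch.randn_like(theta0) for _ in range(10)]
tuner = SubspaceFineTuner(model, theta0, intrinsic_subspace(checkpoints, theta0, dim=4))
tuner(torch.randn(3, 8)).sum().backward()                         # gradients reach only tuner.z
```

Only `tuner.z` needs an optimizer; the frozen backbone and the fixed basis are never updated, which is what keeps the number of free parameters tiny.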
Related papers
- Beyond Task Vectors: Selective Task Arithmetic Based on Importance Metrics [0.0]
This paper introduces Selective Task Arithmetic (STA), a training-free framework designed to enhance multi-task performance through task-specific parameter fusion.
Experimental results demonstrate that STA achieves superior multi-task performance across benchmarks and excellent performance in task forgetting.
arXiv Detail & Related papers (2024-11-25T06:59:16Z)
- Propulsion: Steering LLM with Tiny Fine-Tuning [0.0]
We propose Propulsion, a novel parameter efficient fine-tuning (PEFT) method to optimize task-specific performance.
Inspired by the concept of controlled adjustments in physical motion, Propulsion selectively re-scales specific dimensions of a pre-trained model.
Our theoretical analysis, supported by Neural Tangent Kernel (NTK) theory, shows that Propulsion approximates the performance of full fine-tuning with far fewer trainable parameters.
arXiv Detail & Related papers (2024-09-17T06:51:59Z)
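The Propulsion entry above describes selectively re-scaling dimensions of a frozen pre-trained model. As a generic, hedged sketch of that idea (not the paper's exact formulation), the wrapper below trains only a per-output-dimension scale on top of a frozen linear layer:

```python
import torch

class ScaledLinear(torch.nn.Module):
    """Learnable per-output-dimension scale on top of a frozen linear layer."""
    def __init__(self, base: torch.nn.Linear):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                   # pre-trained weights stay fixed
        self.scale = torch.nn.Parameter(torch.ones(base.out_features))

    def forward(self, x):
        return self.base(x) * self.scale              # re-scale each output dimension

# Usage: wrap a pre-trained projection; only `scale` (out_features values) is trained.
layer = ScaledLinear(torch.nn.Linear(768, 768))
out = layer(torch.randn(4, 768))
```

Trainable parameters per layer drop from out_features x in_features to out_features, which is the spirit of the "far fewer trainable parameters" claim above.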
- Unleashing the Power of Task-Specific Directions in Parameter Efficient Fine-tuning [65.31677646659895]
This paper focuses on the concept of task-specific directions (TSDs), which are critical for transitioning large models from pretrained states to task-specific enhancements in PEFT.
We introduce a novel approach, LoRA-Dash, which aims to maximize the impact of TSDs during the fine-tuning process, thereby enhancing model performance on targeted tasks.
arXiv Detail & Related papers (2024-09-02T08:10:51Z)
- SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models [85.67096251281191]
We present an innovative approach to model fusion called zero-shot Sparse MIxture of Low-rank Experts (SMILE) construction.
SMILE allows for the upscaling of source models into an MoE model without extra data or further training.
We conduct extensive experiments across diverse scenarios, such as image classification and text generation tasks, using full fine-tuning and LoRA fine-tuning.
arXiv Detail & Related papers (2024-08-19T17:32:15Z)
- Parameter-Efficient and Memory-Efficient Tuning for Vision Transformer: A Disentangled Approach [87.8330887605381]
We show how to adapt a pre-trained Vision Transformer to downstream recognition tasks with only a few learnable parameters.
We synthesize a task-specific query with a learnable and lightweight module, which is independent of the pre-trained model.
Our method achieves state-of-the-art performance under memory constraints, showcasing its applicability in real-world situations.
arXiv Detail & Related papers (2024-07-09T15:45:04Z)
- Task Arithmetic in the Tangent Space: Improved Editing of Pre-Trained Models [96.9373147383119]
We show that weight disentanglement is the crucial factor that makes task arithmetic effective.
We show that fine-tuning models in their tangent space by linearizing them amplifies weight disentanglement.
This leads to substantial performance improvements across task arithmetic benchmarks and diverse models.
arXiv Detail & Related papers (2023-05-22T08:39:25Z)
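For background on the entry above: task arithmetic builds a task vector as the difference between fine-tuned and pre-trained weights and edits a model by adding scaled combinations of such vectors. The snippet below sketches only that basic operation on state-dict-like dictionaries; the paper's actual contribution, fine-tuning a linearized (tangent-space) model to improve weight disentanglement, is not reproduced here.

```python
import torch

def task_vector(pretrained, finetuned):
    # Task vector: per-tensor difference between fine-tuned and pre-trained weights.
    return {k: finetuned[k] - pretrained[k] for k in pretrained}

def apply_task_arithmetic(pretrained, task_vectors, alphas):
    # Edit the pre-trained weights by adding a scaled sum of task vectors.
    edited = {k: v.clone() for k, v in pretrained.items()}
    for tv, alpha in zip(task_vectors, alphas):
        for k in edited:
            edited[k] += alpha * tv[k]
    return edited

# Toy state dicts standing in for real checkpoints.
theta0 = {"w": torch.zeros(3)}
theta_a = {"w": torch.tensor([1.0, 0.0, 0.0])}   # fine-tuned on task A
theta_b = {"w": torch.tensor([0.0, 1.0, 0.0])}   # fine-tuned on task B
merged = apply_task_arithmetic(
    theta0,
    [task_vector(theta0, theta_a), task_vector(theta0, theta_b)],
    alphas=[0.5, 0.5],
)
```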
- PATS: Sensitivity-aware Noisy Learning for Pretrained Language Models [29.140036130469042]
We present PATS (Perturbation According To Sensitivity), a noisy training mechanism which considers each parameter's importance in the downstream task.
Experiments conducted on different tasks of the GLUE benchmark show PATS can consistently empower the fine-tuning of different sizes of PLMs.
arXiv Detail & Related papers (2022-10-22T10:05:14Z)
- Exploring Low-dimensional Intrinsic Task Subspace via Prompt Tuning [70.76016793057283]
In this work, we study how pre-trained language models (PLMs) learn universal representations and effectively adapt to a broad range of NLP tasks that differ greatly.
In experiments on diverse few-shot NLP tasks, we surprisingly find that in a 5-dimensional subspace found with 100 random tasks, tuning only 5 free parameters recovers 87% and 65% of the full prompt tuning performance.
arXiv Detail & Related papers (2021-10-15T05:43:59Z)
- A local approach to parameter space reduction for regression and classification tasks [0.0]
We propose a new method called local active subspaces (LAS), which explores the synergies of active subspaces with supervised clustering techniques.
LAS is particularly useful for the community working on surrogate modelling.
arXiv Detail & Related papers (2021-07-22T18:06:04Z)
- Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning [52.624194343095304]
We argue that analyzing fine-tuning through the lens of intrinsic dimension provides us with empirical and theoretical intuitions.
We empirically show that common pre-trained models have a very low intrinsic dimension.
arXiv Detail & Related papers (2020-12-22T07:42:30Z)
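The intrinsic-dimension measurements cited above re-parameterize fine-tuning as theta = theta0 + P d, where P is a fixed random projection and only the low-dimensional vector d is trained; the intrinsic dimension is then the smallest d at which most of full fine-tuning performance is recovered. A minimal sketch, assuming a dense random projection on a single toy layer (practical implementations use structured projections such as Fastfood to keep P tractable):

```python
import torch

d = 8
layer = torch.nn.Linear(16, 4)                        # stands in for a pre-trained module
theta0 = torch.cat([p.detach().reshape(-1) for p in layer.parameters()])
P = torch.randn(theta0.numel(), d) / d ** 0.5         # fixed random projection, never trained
z = torch.nn.Parameter(torch.zeros(d))                # the only trainable parameters

def reparameterized_forward(x):
    # theta = theta0 + P @ z, scattered back into the layer's weight and bias.
    theta = theta0 + P @ z
    W, b = theta[:64].view(4, 16), theta[64:]         # 4*16 weight entries, then 4 bias entries
    return torch.nn.functional.linear(x, W, b)

out = reparameterized_forward(torch.randn(2, 16))
out.sum().backward()                                  # gradients flow only into z
```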