Related papers: Parameter-Efficient Transfer Learning with Diff Pruning

Parameter-Efficient Transfer Learning with Diff Pruning

URL: http://arxiv.org/abs/2012.07463v1
Date: Mon, 14 Dec 2020 12:34:01 GMT
Title: Parameter-Efficient Transfer Learning with Diff Pruning
Authors: Demi Guo, Alexander M. Rush, Yoon Kim
Abstract summary: diff pruning is a simple approach to enable parameter-efficient transfer learning within the pretrain-finetune framework. We find that models finetuned with diff pruning can match the performance of fully finetuned baselines on the GLUE benchmark.
Score: 108.03864629388404
License: http://creativecommons.org/licenses/by/4.0/
Abstract: While task-specific finetuning of pretrained networks has led to significant empirical advances in NLP, the large size of networks makes finetuning difficult to deploy in multi-task, memory-constrained settings. We propose diff pruning as a simple approach to enable parameter-efficient transfer learning within the pretrain-finetune framework. This approach views finetuning as learning a task-specific diff vector that is applied on top of the pretrained parameter vector, which remains fixed and is shared across different tasks. The diff vector is adaptively pruned during training with a differentiable approximation to the L0-norm penalty to encourage sparsity. Diff pruning becomes parameter-efficient as the number of tasks increases, as it requires storing only the nonzero positions and weights of the diff vector for each task, while the cost of storing the shared pretrained model remains constant. It further does not require access to all tasks during training, which makes it attractive in settings where tasks arrive in stream or the set of tasks is unknown. We find that models finetuned with diff pruning can match the performance of fully finetuned baselines on the GLUE benchmark while only modifying 0.5% of the pretrained model's parameters per task.

Related papers

Cross-Model Transfer of Task Vectors via Few-Shot Orthogonal Alignment [5.2980803808373516]
Task arithmetic enables efficient model editing by representing task-specific changes as vectors in parameter space.<n>This assumption limits its applicability in cross-model transfer settings, where models are independently pre-trained on different datasets.<n>We propose a method based on few-shot alignment, which aligns task vectors to the parameter space of a differently pre-trained target model.
arXiv Detail & Related papers (2025-05-17T14:24:06Z)
Parameter-Efficient and Memory-Efficient Tuning for Vision Transformer: A Disentangled Approach [87.8330887605381]
We show how to adapt a pre-trained Vision Transformer to downstream recognition tasks with only a few learnable parameters. We synthesize a task-specific query with a learnable and lightweight module, which is independent of the pre-trained model. Our method achieves state-of-the-art performance under memory constraints, showcasing its applicability in real-world situations.
arXiv Detail & Related papers (2024-07-09T15:45:04Z)
Dynamic Adapter Meets Prompt Tuning: Parameter-Efficient Transfer Learning for Point Cloud Analysis [51.14136878142034]
Point cloud analysis has achieved outstanding performance by transferring point cloud pre-trained models. Existing methods for model adaptation usually update all model parameters, which is inefficient as it relies on high computational costs. In this paper, we aim to study parameter-efficient transfer learning for point cloud analysis with an ideal trade-off between task performance and parameter efficiency.
arXiv Detail & Related papers (2024-03-03T08:25:04Z)
Task Difficulty Aware Parameter Allocation & Regularization for Lifelong Learning [20.177260510548535]
We propose the Allocation & Regularization (PAR), which adaptively select an appropriate strategy for each task from parameter allocation and regularization based on its learning difficulty. Our method is scalable and significantly reduces the model's redundancy while improving the model's performance.
arXiv Detail & Related papers (2023-04-11T15:38:21Z)
Polyhistor: Parameter-Efficient Multi-Task Adaptation for Dense Vision Tasks [36.34331439747556]
We propose Polyhistor and Polyhistor-Lite to share information across different tasks with a few trainable parameters. Specifically, Polyhistor achieves competitive accuracy compared to the state-of-the-art while only using 10% of their trainable parameters.
arXiv Detail & Related papers (2022-10-07T00:25:02Z)
Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning [81.3514358542452]
Few-shot in-context learning (ICL) incurs substantial computational, memory, and storage costs because it involves processing all of the training examples every time a prediction is made. parameter-efficient fine-tuning offers an alternative paradigm where a small set of parameters are trained to enable a model to perform the new task. In this paper, we rigorously compare few-shot ICL and parameter-efficient fine-tuning and demonstrate that the latter offers better accuracy as well as dramatically lower computational costs.
arXiv Detail & Related papers (2022-05-11T17:10:41Z)
Task Adaptive Parameter Sharing for Multi-Task Learning [114.80350786535952]
Adaptive Task Adapting Sharing (TAPS) is a method for tuning a base model to a new task by adaptively modifying a small, task-specific subset of layers. Compared to other methods, TAPS retains high accuracy on downstream tasks while introducing few task-specific parameters. We evaluate our method on a suite of fine-tuning tasks and architectures (ResNet, DenseNet, ViT) and show that it achieves state-of-the-art performance while being simple to implement.
arXiv Detail & Related papers (2022-03-30T23:16:07Z)
Instance-Level Task Parameters: A Robust Multi-task Weighting Framework [17.639472693362926]
Recent works have shown that deep neural networks benefit from multi-task learning by learning a shared representation across several related tasks. We let the training process dictate the optimal weighting of tasks for every instance in the dataset. We conduct extensive experiments on SURREAL and CityScapes datasets, for human shape and pose estimation, depth estimation and semantic segmentation tasks.
arXiv Detail & Related papers (2021-06-11T02:35:42Z)
Parameter-Efficient Transfer from Sequential Behaviors for User Modeling and Recommendation [111.44445634272235]
In this paper, we develop a parameter efficient transfer learning architecture, termed as PeterRec. PeterRec allows the pre-trained parameters to remain unaltered during fine-tuning by injecting a series of re-learned neural networks. We perform extensive experimental ablation to show the effectiveness of the learned user representation in five downstream tasks.
arXiv Detail & Related papers (2020-01-13T14:09:54Z)

This list is automatically generated from the titles and abstracts of the papers in this site.