UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory
- URL: http://arxiv.org/abs/2308.14316v2
- Date: Mon, 11 Mar 2024 10:28:41 GMT
- Title: UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory
- Authors: Haiwen Diao, Bo Wan, Ying Zhang, Xu Jia, Huchuan Lu, Long Chen
- Abstract summary: Parameter-efficient transfer learning (PETL) is an effective strategy for adapting pre-trained models to downstream domains.
Recent PETL works additionally target memory efficiency, an even more valuable property in practice.
We propose a new memory-efficient PETL strategy, Universal Parallel Tuning (UniPT).
- Score: 69.33445217944029
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Parameter-efficient transfer learning (PETL), i.e., fine-tuning a small
portion of parameters, is an effective strategy for adapting pre-trained models
to downstream domains. To further reduce the memory demand, recent PETL works have
shifted focus to the even more valuable property of memory efficiency. In this paper, we
argue that the scalability, adaptability, and generalizability of
state-of-the-art methods are hindered by their structural dependence on, and
tailoring to, specific pre-trained backbones. To this end, we propose a new
memory-efficient PETL strategy, Universal Parallel Tuning (UniPT), to mitigate
these weaknesses. Specifically, we facilitate the transfer process via a
lightweight and learnable parallel network, which consists of: 1) A parallel
interaction module that decouples the sequential connections and processes the
intermediate activations detached from the pre-trained network, so that no
gradients flow back into it. 2) A
confidence aggregation module that adaptively learns the optimal strategy for
integrating cross-layer features. We evaluate UniPT with different backbones
(e.g., T5, VSE$\infty$, CLIP4Clip, Clip-ViL, and MDETR) on various
vision-and-language and pure NLP tasks. Extensive ablations on 18 datasets have
validated that UniPT can not only dramatically reduce memory consumption and
outperform the best competitor, but also achieve performance competitive with
other plain PETL methods at lower training memory overhead. Our code is
publicly available at: https://github.com/Paranioar/UniPT.
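
To make the two components above concrete, here is a minimal PyTorch sketch of the idea: a lightweight parallel network consumes detached intermediate activations from a frozen backbone, a parallel interaction module lets the final-layer feature guide each shallower layer, and a confidence aggregation module learns softmax weights for mixing the per-layer outputs. The class names, the attention-style interaction, and all shapes are illustrative assumptions rather than the authors' implementation; the repository linked above contains the real code.

```python
# Illustrative sketch only: the module names, the attention-style interaction,
# and all dimensions are assumptions, not the authors' implementation
# (see https://github.com/Paranioar/UniPT for the official code).
import torch
import torch.nn as nn


class ParallelInteraction(nn.Module):
    """Projects one detached backbone layer into a small side dimension,
    letting the (equally detached) final-layer feature guide the projection."""

    def __init__(self, backbone_dim: int, side_dim: int):
        super().__init__()
        self.proj = nn.Linear(backbone_dim, side_dim)
        self.guide = nn.Linear(backbone_dim, side_dim)

    def forward(self, layer_feat: torch.Tensor, final_feat: torch.Tensor) -> torch.Tensor:
        q = self.guide(final_feat)                    # (B, T, D) query from the top layer
        k = v = self.proj(layer_feat)                 # (B, T, D) key/value from this layer
        attn = torch.softmax(q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5, dim=-1)
        return attn @ v                               # (B, T, D) guided layer feature


class ConfidenceAggregation(nn.Module):
    """Learns a per-sample confidence score for each layer and mixes them."""

    def __init__(self, side_dim: int):
        super().__init__()
        self.score = nn.Linear(side_dim, 1)

    def forward(self, layer_outputs: list) -> torch.Tensor:
        stacked = torch.stack(layer_outputs, dim=1)                   # (B, L, T, D)
        conf = torch.softmax(self.score(stacked.mean(dim=2)), dim=1)  # (B, L, 1)
        return (conf.unsqueeze(-1) * stacked).sum(dim=1)              # (B, T, D)


class UniPTSide(nn.Module):
    """Lightweight parallel network fed by a frozen backbone's hidden states."""

    def __init__(self, backbone_dim: int, side_dim: int, num_layers: int):
        super().__init__()
        self.interactions = nn.ModuleList(
            [ParallelInteraction(backbone_dim, side_dim) for _ in range(num_layers)]
        )
        self.aggregate = ConfidenceAggregation(side_dim)

    def forward(self, hidden_states: list) -> torch.Tensor:
        # Detach: no gradient (hence no activation memory) is needed for the backbone.
        hidden_states = [h.detach() for h in hidden_states]
        final = hidden_states[-1]
        outs = [m(h, final) for m, h in zip(self.interactions, hidden_states)]
        return self.aggregate(outs)
```

Because every backbone activation is detached before it enters the side network, backpropagation only traverses the small parallel network, which is where the memory savings come from.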
Related papers
- FPT+: A Parameter and Memory Efficient Transfer Learning Method for High-resolution Medical Image Classification [1.5791081894226173]
Fine-grained Prompt Tuning plus (FPT+) is a PETL method designed for high-resolution medical image classification.
FPT+ performs transfer learning by training a lightweight side network and accessing pre-trained knowledge from a large pre-trained model.
Experimental results demonstrate that FPT+ outperforms other PETL methods, using only 1.03% of the learnable parameters and 3.18% of the memory required for fine-tuning an entire ViT-B model.
arXiv Detail & Related papers (2024-08-05T12:33:07Z)
- Pareto Low-Rank Adapters: Efficient Multi-Task Learning with Preferences [49.14535254003683]
PaLoRA is a novel parameter-efficient method that augments the original model with task-specific low-rank adapters.
Our experimental results show that PaLoRA outperforms MTL and PFL baselines across various datasets.
arXiv Detail & Related papers (2024-07-10T21:25:51Z)
- SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning [63.93193829913252]
We propose SHERL, an innovative memory-efficient transfer learning (METL) strategy for resource-limited scenarios.
In the early route, intermediate outputs are consolidated via an anti-redundancy operation.
In the late route, using only a minimal number of late pre-trained layers alleviates the peak memory demand.
arXiv Detail & Related papers (2024-07-10T10:22:35Z)
- SpaFL: Communication-Efficient Federated Learning with Sparse Models and Low Computational Overhead [75.87007729801304]
SpaFL, a communication-efficient federated learning (FL) framework, is proposed to optimize sparse model structures with low computational overhead.
Experiments show that SpaFL improves accuracy while requiring much less communication and computing resources compared to sparse baselines.
arXiv Detail & Related papers (2024-06-01T13:10:35Z)
- Time-, Memory- and Parameter-Efficient Visual Adaptation [75.28557015773217]
We propose an adaptation method which does not backpropagate gradients through the backbone.
We achieve this by designing a lightweight network in parallel that operates on features from the frozen, pretrained backbone (a sketch of this pattern appears after this list).
arXiv Detail & Related papers (2024-02-05T10:55:47Z)
- Make Pre-trained Model Reversible: From Parameter to Memory Efficient Fine-Tuning [6.451743797015637]
We propose memory-efficient fine-tuning (MEFT) for pre-trained language models (PLMs).
MEFT inserts adapters into a PLM, preserving the PLM's starting point and making it reversible without additional pre-training.
MEFT reduces activation memory by up to 84% compared with full fine-tuning while adding only a negligible number of trainable parameters.
arXiv Detail & Related papers (2023-06-01T09:26:17Z)
- LST: Ladder Side-Tuning for Parameter and Memory Efficient Transfer Learning [82.93130407930762]
It is costly to update the entire parameter set of large pre-trained models.
PETL techniques allow updating a small subset of parameters inside a pre-trained backbone network for a new task.
We propose Ladder Side-Tuning (LST), a new PETL technique that reduces training memory requirements by a substantially larger amount than prior PETL techniques.
arXiv Detail & Related papers (2022-06-13T23:51:56Z)
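
Several of the papers above (FPT+, LST, the visual adaptation method, and UniPT itself) share the same memory-saving mechanism: the trainable module reads features from a frozen backbone that is run without a computation graph, so backpropagation never traverses the backbone and its activations need not be stored. The sketch below illustrates that pattern in PyTorch; frozen_encoder, side_network, and head are hypothetical placeholders, not code from any of these papers.

```python
# Sketch of the shared memory-saving training step; frozen_encoder,
# side_network, and head are hypothetical placeholders, not code from
# any of the papers listed above.
import torch
import torch.nn as nn
import torch.nn.functional as F


def train_step(frozen_encoder: nn.Module,
               side_network: nn.Module,
               head: nn.Module,
               optimizer: torch.optim.Optimizer,
               batch: torch.Tensor,
               labels: torch.Tensor) -> float:
    # 1) Run the frozen backbone without building a computation graph:
    #    its activations are not stored and it receives no gradients.
    with torch.no_grad():
        hidden_states = frozen_encoder(batch)   # assumed: list of per-layer features

    # 2) Only the lightweight side network and task head are trained,
    #    so the backward pass touches a small fraction of the full model.
    features = side_network(hidden_states)
    loss = F.cross_entropy(head(features), labels)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


# The optimizer would be built over the trainable modules only, e.g.:
# optimizer = torch.optim.AdamW(
#     list(side_network.parameters()) + list(head.parameters()), lr=1e-4)
```

By contrast, adapter- or LoRA-style PETL places trainable modules inside the backbone, so the backward pass still traverses the full network and its activations must be kept; that training-memory gap is exactly what the memory-efficient methods above target.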