UniPT: Universal Parallel Tuning for Transfer Learning with Efficient
Parameter and Memory
- URL: http://arxiv.org/abs/2308.14316v2
- Date: Mon, 11 Mar 2024 10:28:41 GMT
- Title: UniPT: Universal Parallel Tuning for Transfer Learning with Efficient
Parameter and Memory
- Authors: Haiwen Diao, Bo Wan, Ying Zhang, Xu Jia, Huchuan Lu, Long Chen
- Abstract summary: PETL is an effective strategy for adapting pre-trained models to downstream domains.
Recent PETL works focus on the more valuable memory-efficient characteristic.
We propose a new memory-efficient PETL strategy, Universal Parallel Tuning (UniPT).
- Score: 69.33445217944029
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Parameter-efficient transfer learning (PETL), i.e., fine-tuning a small
portion of parameters, is an effective strategy for adapting pre-trained models
to downstream domains. To further reduce the memory demand, recent PETL works
focus on the more valuable memory-efficient characteristic. In this paper, we
argue that the scalability, adaptability, and generalizability of
state-of-the-art methods are hindered by structural dependency and pertinency
on specific pre-trained backbones. To this end, we propose a new
memory-efficient PETL strategy, Universal Parallel Tuning (UniPT), to mitigate
these weaknesses. Specifically, we facilitate the transfer process via a
lightweight and learnable parallel network, which consists of: 1) A parallel
interaction module that decouples the sequential connections and processes the
intermediate activations detachedly from the pre-trained network. 2) A
confidence aggregation module that learns optimal strategies adaptively for
integrating cross-layer features. We evaluate UniPT with different backbones
(e.g., T5, VSE$\infty$, CLIP4Clip, Clip-ViL, and MDETR) on various
vision-and-language and pure NLP tasks. Extensive ablations on 18 datasets have
validated that UniPT can not only dramatically reduce memory consumption and
outperform the best competitor, but also achieve competitive performance over
other plain PETL methods with lower training memory overhead. Our code is
publicly available at: https://github.com/Paranioar/UniPT.
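To make the idea of integrating cross-layer features more concrete, the following is a minimal sketch of confidence-weighted aggregation. It is not taken from the UniPT code base: the function names, the softmax weighting, and the toy inputs are all assumptions chosen for illustration, and the abstract does not specify how the confidence aggregation module computes its weights.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_layers(layer_features, confidence_logits):
    """Blend per-layer feature vectors using softmax-normalised confidences.

    layer_features: list of equal-length vectors, one per backbone layer.
        They are treated as constants here, mirroring the idea that the
        frozen backbone is not updated.
    confidence_logits: one score per layer. This is a hypothetical
        stand-in for whatever a learned aggregation module would output.
    """
    if len(layer_features) != len(confidence_logits):
        raise ValueError("need exactly one confidence logit per layer")
    weights = softmax(confidence_logits)
    dim = len(layer_features[0])
    return [
        sum(w * feats[i] for w, feats in zip(weights, layer_features))
        for i in range(dim)
    ]


# Three hypothetical layers, each with a 2-dimensional feature vector.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Equal logits reduce to a plain average of the layers.
uniform = aggregate_layers(features, [0.0, 0.0, 0.0])

# A dominant logit makes the output follow that layer almost exclusively.
peaked = aggregate_layers(features, [0.0, 50.0, 0.0])
```

In this sketch only the confidence logits would be trainable, which is one way a side module could stay lightweight; the actual UniPT modules may differ substantially.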
Related papers
- MINI-SEQUENCE TRANSFORMER: Optimizing Intermediate Memory for Long Sequences Training [78.93900796545523]
Mini-Sequence Transformer (MsT) is a methodology for highly efficient and accurate LLM training with extremely long sequences.
MsT partitions input sequences and iteratively processes mini-sequences to reduce intermediate memory usage.
arXiv Detail & Related papers (2024-07-22T01:52:30Z)
- SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning [63.93193829913252]
We propose an innovative METL strategy called SHERL for resource-limited scenarios.
In the early route, intermediate outputs are consolidated via an anti-redundancy operation.
In the late route, utilizing minimal late pre-trained layers could alleviate the peak demand on memory overhead.
arXiv Detail & Related papers (2024-07-10T10:22:35Z)
- SpaFL: Communication-Efficient Federated Learning with Sparse Models and Low computational Overhead [75.87007729801304]
SpaFL, a communication-efficient FL framework, is proposed to optimize sparse model structures with low computational overhead.
Experiments show that SpaFL improves accuracy while requiring much less communication and computing resources compared to sparse baselines.
arXiv Detail & Related papers (2024-06-01T13:10:35Z)
- Time-, Memory- and Parameter-Efficient Visual Adaptation [75.28557015773217]
We propose an adaptation method which does not backpropagate gradients through the backbone.
We achieve this by designing a lightweight network in parallel that operates on features from the frozen, pretrained backbone.
arXiv Detail & Related papers (2024-02-05T10:55:47Z)
- RTP: Rethinking Tensor Parallelism with Memory Deduplication [3.036340414461332]
Rotated Parallelism (RTP) is an innovative approach that focuses on memory deduplication in distributed training environments.
Our empirical evaluations underscore RTP's efficiency, revealing that its memory consumption during distributed system training is remarkably close to the optimal.
arXiv Detail & Related papers (2023-11-02T23:12:42Z)
- Make Pre-trained Model Reversible: From Parameter to Memory Efficient Fine-Tuning [6.451743797015637]
We propose memory-efficient fine-tuning (MEFT) for pre-trained language models.
MEFT inserts adapters into a PLM, preserving the PLM's starting point and making it reversible without additional pre-training.
MEFT significantly reduces the activation memory up to 84% of full fine-tuning with a negligible amount of trainable parameters.
arXiv Detail & Related papers (2023-06-01T09:26:17Z)
- LST: Ladder Side-Tuning for Parameter and Memory Efficient Transfer Learning [82.93130407930762]
It is costly to update the entire parameter set of large pre-trained models.
PETL techniques allow updating a small subset of parameters inside a pre-trained backbone network for a new task.
We propose Ladder Side-Tuning (LST), a new PETL technique that reduces training memory requirements by more substantial amounts.
arXiv Detail & Related papers (2022-06-13T23:51:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.