DTL: Disentangled Transfer Learning for Visual Recognition
- URL: http://arxiv.org/abs/2312.07856v2
- Date: Fri, 2 Feb 2024 08:25:16 GMT
- Title: DTL: Disentangled Transfer Learning for Visual Recognition
- Authors: Minghao Fu, Ke Zhu, Jianxin Wu
- Abstract summary: We introduce Disentangled Transfer Learning (DTL), which disentangles the trainable parameters from the backbone using a lightweight Compact Side Network (CSN).
The proposed method not only greatly reduces GPU memory usage and the number of trainable parameters, but also outperforms existing PETL methods by a significant margin in accuracy.
- Score: 21.549234013998255
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As pre-trained models grow rapidly larger, the cost of fine-tuning them on
downstream tasks steadily increases as well. To fine-tune these models
economically, parameter-efficient transfer learning (PETL) has been proposed, which
tunes only a tiny subset of trainable parameters to efficiently learn quality
representations. However, current PETL methods face a dilemma: during training,
the GPU memory footprint is not reduced as effectively as the number of
trainable parameters. Hence, PETL will likely also fail whenever full fine-tuning
encounters the out-of-GPU-memory issue. This happens because
trainable parameters from these methods are generally entangled with the
backbone, such that a lot of intermediate states have to be stored in GPU
memory for gradient propagation. To alleviate this problem, we introduce
Disentangled Transfer Learning (DTL), which disentangles the trainable
parameters from the backbone using a lightweight Compact Side Network (CSN). By
progressively extracting task-specific information with a few low-rank linear
mappings and appropriately adding the information back to the backbone, CSN
effectively realizes knowledge transfer in various downstream tasks. We
conducted extensive experiments to validate the effectiveness of our method.
The proposed method not only greatly reduces GPU memory usage and the number of
trainable parameters, but also outperforms existing PETL methods by a
significant margin in accuracy, achieving new state-of-the-art results on several
standard benchmarks. The code is available at https://github.com/heekhero/DTL.
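Below is a minimal PyTorch sketch of the disentangled side-network idea described above; it is not the authors' implementation (that lives in the repository linked above). The class names, the rank of the low-rank mappings, and the simplification of adding the side features only to the final backbone output (the paper adds them back to several later blocks as well) are illustrative assumptions.
```python
# Illustrative sketch only -- NOT the official DTL code (see the repository above).
# Assumptions: rank-8 mappings, side features added only to the final output.
import torch
import torch.nn as nn


class LowRankMapping(nn.Module):
    """Low-rank linear mapping d -> r -> d, with far fewer parameters than a d x d layer."""

    def __init__(self, dim: int, rank: int = 8):
        super().__init__()
        self.down = nn.Linear(dim, rank, bias=False)
        self.up = nn.Linear(rank, dim, bias=False)
        nn.init.zeros_(self.up.weight)  # zero residual at init, so training starts from backbone features

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.up(self.down(x))


class CompactSideNetwork(nn.Module):
    """Progressively accumulates task-specific information from the frozen blocks' outputs."""

    def __init__(self, num_blocks: int, dim: int, rank: int = 8):
        super().__init__()
        self.mappings = nn.ModuleList(
            [LowRankMapping(dim, rank) for _ in range(num_blocks)]
        )

    def forward(self, block_outputs):
        side = 0.0
        for mapping, hidden in zip(self.mappings, block_outputs):
            side = side + mapping(hidden)  # extract and accumulate task-specific info
        return side


def forward_disentangled(backbone_blocks, csn, x):
    """Run the frozen backbone without gradient tracking, then add the side features.

    Because the backbone forward runs under torch.no_grad(), none of its
    intermediate activations are kept for backpropagation; only the tiny side
    network stores activations, which is where the memory savings come from.
    """
    block_outputs, hidden = [], x
    with torch.no_grad():
        for block in backbone_blocks:
            hidden = block(hidden)
            block_outputs.append(hidden)
    return block_outputs[-1] + csn(block_outputs)
```
With this layout, only the CSN parameters would be handed to the optimizer (e.g. torch.optim.AdamW(csn.parameters(), lr=1e-3)), while the pre-trained backbone stays frozen.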
Related papers
- FPT+: A Parameter and Memory Efficient Transfer Learning Method for High-resolution Medical Image Classification [1.5791081894226173]
Fine-grained Prompt Tuning plus (FPT+) is a PETL method designed for high-resolution medical image classification.
FPT+ performs transfer learning by training a lightweight side network and accessing pre-trained knowledge from a large pre-trained model.
Experimental results demonstrate that FPT+ outperforms other PETL methods, using only 1.03% of the learnable parameters and 3.18% of the memory required for fine-tuning an entire ViT-B model.
arXiv Detail & Related papers (2024-08-05T12:33:07Z) - SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning [63.93193829913252]
We propose an innovative METL strategy called SHERL for resource-limited scenarios.
In the early route, intermediate outputs are consolidated via an anti-redundancy operation.
In the late route, a minimal number of late pre-trained layers is used to alleviate the peak memory overhead.
arXiv Detail & Related papers (2024-07-10T10:22:35Z) - Low-rank Attention Side-Tuning for Parameter-Efficient Fine-Tuning [19.17362588650503]
Low-rank Attention Side-Tuning (LAST) trains a side-network composed of only low-rank self-attention modules.
We show that LAST can be highly parallelized across multiple optimization objectives, making it very efficient in downstream task adaptation.
arXiv Detail & Related papers (2024-02-06T14:03:15Z) - Time-, Memory- and Parameter-Efficient Visual Adaptation [75.28557015773217]
We propose an adaptation method which does not backpropagate gradients through the backbone.
We achieve this by designing a lightweight network in parallel that operates on features from the frozen, pretrained backbone.
arXiv Detail & Related papers (2024-02-05T10:55:47Z) - UniPT: Universal Parallel Tuning for Transfer Learning with Efficient
Parameter and Memory [69.33445217944029]
PETL is an effective strategy for adapting pre-trained models to downstream domains.
Recent PETL works focus on the more valuable property of memory efficiency.
We propose a new memory-efficient PETL strategy, Universal Parallel Tuning (UniPT).
arXiv Detail & Related papers (2023-08-28T05:38:43Z) - LST: Ladder Side-Tuning for Parameter and Memory Efficient Transfer
Learning [82.93130407930762]
It is costly to update the entire parameter set of large pre-trained models.
PETL techniques allow updating a small subset of parameters inside a pre-trained backbone network for a new task.
We propose Ladder Side-Tuning (LST), a new PETL technique that reduces training memory requirements far more substantially than prior PETL methods.
arXiv Detail & Related papers (2022-06-13T23:51:56Z) - Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than
In-Context Learning [81.3514358542452]
Few-shot in-context learning (ICL) incurs substantial computational, memory, and storage costs because it involves processing all of the training examples every time a prediction is made.
Parameter-efficient fine-tuning offers an alternative paradigm in which a small set of parameters is trained to enable a model to perform the new task.
In this paper, we rigorously compare few-shot ICL and parameter-efficient fine-tuning and demonstrate that the latter offers better accuracy as well as dramatically lower computational costs.
arXiv Detail & Related papers (2022-05-11T17:10:41Z) - Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences of its use.