Task-oriented Memory-efficient Pruning-Adapter
- URL: http://arxiv.org/abs/2303.14704v2
- Date: Thu, 6 Apr 2023 03:44:38 GMT
- Title: Task-oriented Memory-efficient Pruning-Adapter
- Authors: Guorun Wang, Jun Yang, Yaoru Sun
- Abstract summary: We propose a task-oriented Pruning-Adapter method that achieve a high memory efficiency of training and memory.
No significant decrease in accuracy in GLUE tasks, achieving training and inference efficiency at the same time.
- Score: 3.0751447761822903
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The outstanding performance and growing size of Large Language Models have led
to increased attention in parameter-efficient learning. The two predominant
approaches are Adapters and Pruning. Adapters freeze the model and add a new
weight matrix on the side, which can significantly reduce training time and
memory, but at the cost of increased time and memory consumption during
evaluation and testing. Pruning cuts off some weights and re-distributes the
remaining ones, accepting extremely high memory use and training time in
exchange for relatively cheap evaluation and testing. So training and inference
efficiency cannot be obtained at the same time. In this work, we propose a
task-oriented Pruning-Adapter method that achieves high training and memory
efficiency, speeds up training, and ensures no significant decrease in accuracy
on GLUE tasks, achieving training and inference efficiency at the same time.
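To make the adapter side of this trade-off concrete, here is a minimal, hedged sketch of generic adapter tuning in PyTorch: the pretrained backbone is frozen and only a small bottleneck module and a task head are trained. It illustrates the general idea only, not the paper's Pruning-Adapter method; the toy linear backbone, layer sizes, and optimizer choice are assumptions made for the sketch.

```python
# Minimal sketch of generic adapter tuning, assuming a toy linear backbone:
# the pretrained weights are frozen and only a small bottleneck module plus a
# task head are trained. This illustrates the adapter side of the trade-off
# described in the abstract, not the paper's Pruning-Adapter method.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, non-linearity, up-project, residual add."""
    def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, hidden_dim)
        self.act = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))  # residual connection

class FrozenBackboneWithAdapter(nn.Module):
    def __init__(self, backbone: nn.Module, hidden_dim: int, num_labels: int):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():  # freeze the pretrained model
            p.requires_grad = False
        self.adapter = Adapter(hidden_dim)    # the new weights "on the side"
        self.head = nn.Linear(hidden_dim, num_labels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.adapter(self.backbone(x)))

backbone = nn.Sequential(nn.Linear(128, 128), nn.ReLU(), nn.Linear(128, 128))
model = FrozenBackboneWithAdapter(backbone, hidden_dim=128, num_labels=2)
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-3)  # optimizer state only for adapter/head

x, y = torch.randn(8, 128), torch.randint(0, 2, (8,))
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
optimizer.step()
print("trainable parameters:", sum(p.numel() for p in trainable))
```

Because the optimizer only holds state for the adapter and head, training stays cheap, while inference always pays for the extra side module, which is exactly the trade-off the abstract contrasts with pruning.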
Related papers
- PYRA: Parallel Yielding Re-Activation for Training-Inference Efficient Task Adaptation [61.57833648734164]
We propose a novel Parallel Yielding Re-Activation (PYRA) method for training-inference efficient task adaptation.
PYRA outperforms all competing methods under both low and high compression rates.
arXiv Detail & Related papers (2024-03-14T09:06:49Z)
- Time-, Memory- and Parameter-Efficient Visual Adaptation [75.28557015773217]
We propose an adaptation method which does not backpropagate gradients through the backbone.
We achieve this by designing a lightweight network in parallel that operates on features from the frozen, pretrained backbone.
arXiv Detail & Related papers (2024-02-05T10:55:47Z)
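The entry above describes training a lightweight network in parallel on features from a frozen backbone, without backpropagating gradients through the backbone. Below is a hedged PyTorch sketch of that general pattern; the toy MLP backbone, feature dimension, and SGD-trained head are placeholder assumptions, not the paper's actual design.

```python
# Hedged sketch of the pattern in the entry above: no gradients are backpropagated
# through the frozen backbone; a lightweight network is trained on its features.
# The toy MLP backbone, feature size, and SGD head are placeholder assumptions.
import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 256)).eval()
for p in backbone.parameters():
    p.requires_grad = False

lightweight = nn.Sequential(nn.Linear(256, 32), nn.ReLU(), nn.Linear(32, 10))
optimizer = torch.optim.SGD(lightweight.parameters(), lr=1e-2)

x, y = torch.randn(16, 64), torch.randint(0, 10, (16,))
with torch.no_grad():          # no autograd graph is built for the backbone at all
    feats = backbone(x)
loss = nn.functional.cross_entropy(lightweight(feats), y)
loss.backward()                # backpropagation touches only the lightweight network
optimizer.step()
```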
- CAME: Confidence-guided Adaptive Memory Efficient Optimization [20.009302737137787]
Adaptive gradient methods have demonstrated excellent performance in the training of large language models.
Maintaining second-moment estimates, however, incurs a high cost in extra memory overhead.
Several memory-efficient optimizers have been proposed to drastically reduce auxiliary memory usage, but with a performance penalty.
We propose CAME to simultaneously achieve two goals: fast convergence as in traditional adaptive methods, and low memory usage as in memory-efficient methods.
arXiv Detail & Related papers (2023-07-05T06:05:36Z)
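The CAME entry above concerns cutting the memory held by adaptive optimizers. As a rough illustration of the factorization idea that memory-efficient optimizers of this kind rely on, the sketch below keeps only per-row and per-column second-moment statistics and reconstructs a rank-1 estimate, in the style of Adafactor. It is explicitly not CAME's confidence-guided update; the beta2 and epsilon values are arbitrary assumptions.

```python
# Rank-1 factorization of the second moment, in the style of Adafactor: keep per-row
# and per-column statistics (O(n + m) memory) instead of the full n x m matrix that
# Adam would store. Explicitly not CAME's confidence-guided update; beta2 and the
# epsilon values are arbitrary assumptions for the sketch.
import torch

def factored_update(grad: torch.Tensor, row: torch.Tensor, col: torch.Tensor,
                    beta2: float = 0.999) -> torch.Tensor:
    """grad: (n, m); row: (n,) and col: (m,) running statistics, updated in place."""
    sq = grad * grad + 1e-30
    row.mul_(beta2).add_(sq.mean(dim=1), alpha=1 - beta2)
    col.mul_(beta2).add_(sq.mean(dim=0), alpha=1 - beta2)
    v_hat = torch.outer(row, col) / row.mean()   # rank-1 estimate of the (n, m) moment
    return grad / (v_hat.sqrt() + 1e-8)

n, m = 512, 256
row_stat, col_stat = torch.zeros(n), torch.zeros(m)
update = factored_update(torch.randn(n, m), row_stat, col_stat)
print(update.shape, "optimizer state:", row_stat.numel() + col_stat.numel(), "vs full:", n * m)
```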
- Parameter-efficient is not sufficient: Exploring Parameter, Memory, and Time Efficient Adapter Tuning for Dense Predictions [9.068569788978854]
Parameter-efficient transfer learning (PETL) methods have shown promising performance in adapting to downstream tasks with only a few trainable parameters.
However, PETL methods in computer vision (CV) can be computationally expensive and require large amounts of memory and time during training.
$\mathrm{E^3VA}$ can save up to 62.2% training memory and 26.2% training time on average.
arXiv Detail & Related papers (2023-06-16T09:54:07Z)
- Towards Memory- and Time-Efficient Backpropagation for Training Spiking Neural Networks [70.75043144299168]
Spiking Neural Networks (SNNs) are promising energy-efficient models for neuromorphic computing.
We propose the Spatial Learning Through Time (SLTT) method that can achieve high performance while greatly improving training efficiency.
Our method achieves state-of-the-art accuracy on ImageNet, while the memory cost and training time are reduced by more than 70% and 50%, respectively, compared with BPTT.
arXiv Detail & Related papers (2023-02-28T05:01:01Z)
- Online Convolutional Re-parameterization [51.97831675242173]
We present online convolutional re-parameterization (OREPA), a two-stage pipeline aiming to reduce the huge training overhead by squeezing the complex training-time block into a single convolution.
Compared with the state-of-the-art re-param models, OREPA is able to save the training-time memory cost by about 70% and accelerate the training speed by around 2x.
We also conduct experiments on object detection and semantic segmentation and show consistent improvements on the downstream tasks.
arXiv Detail & Related papers (2022-04-02T09:50:19Z)
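The OREPA entry above relies on structural re-parameterization: collapsing a multi-branch block into a single convolution with the same input-output mapping. The sketch below shows the generic version of that trick for a 3x3 + 1x1 branch pair; it is not OREPA's specific online squeezing, and the tensor shapes are arbitrary.

```python
# Generic structural re-parameterization: a two-branch block (3x3 conv + 1x1 conv)
# is collapsed into a single 3x3 convolution with the same input-output mapping.
# This shows the general trick behind re-param methods, not OREPA's specific online
# squeezing; the tensor shapes are arbitrary.
import torch
import torch.nn.functional as F

c_in, c_out = 8, 16
w3, b3 = torch.randn(c_out, c_in, 3, 3), torch.randn(c_out)   # 3x3 branch
w1, b1 = torch.randn(c_out, c_in, 1, 1), torch.randn(c_out)   # 1x1 branch

# Merge: zero-pad the 1x1 kernel to 3x3, then add kernels and biases.
w_merged = w3 + F.pad(w1, [1, 1, 1, 1])
b_merged = b3 + b1

x = torch.randn(2, c_in, 32, 32)
y_branches = F.conv2d(x, w3, b3, padding=1) + F.conv2d(x, w1, b1)
y_merged = F.conv2d(x, w_merged, b_merged, padding=1)
print(torch.allclose(y_branches, y_merged, atol=1e-5))   # True: one conv replaces two
```

The final allclose check confirms the merged kernel reproduces the two-branch output, which is what lets a single convolution stand in for the whole block.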
- Mesa: A Memory-saving Training Framework for Transformers [58.78933015299703]
We present Mesa, a memory-saving training framework for Transformers.
Mesa uses exact activations during forward pass while storing a low-precision version of activations to reduce memory consumption during training.
Experiments on ImageNet, CIFAR-100 and ADE20K demonstrate that Mesa can reduce half of the memory footprints during training.
arXiv Detail & Related papers (2021-11-22T11:23:01Z)
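The Mesa summary above hinges on storing a low-precision copy of activations for the backward pass while the forward pass stays exact. The toy autograd function below illustrates that general mechanism for a linear layer, assuming plain fp16 casting rather than Mesa's actual quantization scheme.

```python
# Toy version of the mechanism summarized above: the forward pass uses exact fp32
# activations, but only an fp16 copy of the input is saved for the backward pass,
# halving activation memory. Plain fp16 casting is an assumption standing in for
# Mesa's actual quantization scheme.
import torch

class LowPrecisionLinear(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, weight):
        out = x @ weight.t()                      # exact computation in the forward pass
        ctx.save_for_backward(x.half(), weight)   # store activations in low precision
        return out

    @staticmethod
    def backward(ctx, grad_out):
        x_fp16, weight = ctx.saved_tensors
        grad_x = grad_out @ weight
        grad_w = grad_out.t() @ x_fp16.float()    # gradient from the compressed activations
        return grad_x, grad_w

x = torch.randn(32, 128, requires_grad=True)
w = torch.randn(64, 128, requires_grad=True)
LowPrecisionLinear.apply(x, w).sum().backward()
print(x.grad.shape, w.grad.shape)
```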
- MEST: Accurate and Fast Memory-Economic Sparse Training Framework on the Edge [72.16021611888165]
This paper proposes a novel Memory-Economic Sparse Training (MEST) framework targeting accurate and fast execution on edge devices.
The proposed MEST framework consists of enhancements by Elastic Mutation (EM) and Soft Memory Bound (&S).
Our results suggest that unforgettable examples can be identified in-situ even during the dynamic exploration of sparsity masks.
arXiv Detail & Related papers (2021-10-26T21:15:17Z)
- Improving compute efficacy frontiers with SliceOut [31.864949424541344]
We introduce SliceOut -- a dropout-inspired scheme to train deep learning models faster without impacting final test accuracy.
At test time, turning off SliceOut performs an implicit ensembling across a linear number of architectures that preserves test accuracy.
This leads to faster processing of large computational workloads overall, and significantly reduces the resulting energy consumption and CO2 emissions.
arXiv Detail & Related papers (2020-07-21T15:59:09Z)
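The SliceOut entry above describes removing contiguous blocks of hidden units so the remaining computation stays dense and smaller, unlike the scattered masks of ordinary dropout. The sketch below is a toy version of that idea for a two-layer MLP; the slice selection and inverted-dropout-style rescaling are simplified assumptions, not the paper's exact scheme.

```python
# Toy sketch of the slicing idea: during training a contiguous block of hidden units
# is "sliced out", so the remaining matrix multiplies stay dense but smaller (unlike
# the scattered masks of standard dropout); at test time the full layer is used.
# The slice selection and inverted-dropout-style rescaling are simplified assumptions.
import torch
import torch.nn as nn

class SliceOutMLP(nn.Module):
    def __init__(self, in_dim: int, hidden: int, out_dim: int, keep: int):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, hidden)
        self.fc2 = nn.Linear(hidden, out_dim)
        self.hidden, self.keep = hidden, keep

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training:
            start = torch.randint(0, self.hidden - self.keep + 1, (1,)).item()
            sl = slice(start, start + self.keep)
            # Narrower dense matmuls: only the kept contiguous slice is computed.
            h = torch.relu(nn.functional.linear(x, self.fc1.weight[sl], self.fc1.bias[sl]))
            h = h * (self.hidden / self.keep)   # rescale, as in inverted dropout
            return nn.functional.linear(h, self.fc2.weight[:, sl], self.fc2.bias)
        return self.fc2(torch.relu(self.fc1(x)))  # full architecture at test time

layer = SliceOutMLP(in_dim=32, hidden=256, out_dim=10, keep=128)
layer.train()
print(layer(torch.randn(4, 32)).shape)   # smaller compute per training step
layer.eval()
print(layer(torch.randn(4, 32)).shape)   # full network, SliceOut turned off
```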