BitTrain: Sparse Bitmap Compression for Memory-Efficient Training on the
Edge
- URL: http://arxiv.org/abs/2110.15362v1
- Date: Fri, 29 Oct 2021 16:30:57 GMT
- Title: BitTrain: Sparse Bitmap Compression for Memory-Efficient Training on the
Edge
- Authors: Abdelrahman Hosny, Marina Neseem, Sherief Reda
- Abstract summary: Training on the Edge enables neural networks to learn continuously from new data after deployment on memory-constrained edge devices.
Existing incremental training methods fine-tune the last few layers, sacrificing accuracy gains from re-training the whole model.
In BitTrain, we exploit activation sparsity and propose a novel bitmap compression technique that reduces the memory footprint during training.
- Score: 2.2191297646252646
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Training on the Edge enables neural networks to learn continuously from new
data after deployment on memory-constrained edge devices. Previous work is
mostly concerned with reducing the number of model parameters, which is only
beneficial for inference. However, the memory footprint of activations is the
main bottleneck for training on the edge. Existing incremental training methods
fine-tune the last few layers, sacrificing accuracy gains from re-training the
whole model. In this work, we investigate the memory footprint of training deep
learning models, and use our observations to propose BitTrain. In BitTrain, we
exploit activation sparsity and propose a novel bitmap compression technique
that reduces the memory footprint during training. We save the activations in
our proposed bitmap compression format during the forward pass of the training,
and restore them during the backward pass for the optimizer computations. The
proposed method can be integrated seamlessly in the computation graph of modern
deep learning frameworks. Our implementation is safe by construction, and has
no negative impact on the accuracy of model training. Experimental results show
up to 34% reduction in the memory footprint at a sparsity level of 50%. Further
pruning during training results in more than 70% sparsity, which can lead to up
to 56% reduction in memory footprint. BitTrain advances the efforts towards
bringing more machine learning capabilities to edge devices. Our source code is
available at https://github.com/scale-lab/BitTrain.
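Below is a minimal sketch of the forward-save / backward-restore idea described in the abstract, written as a custom PyTorch autograd function. It is not taken from the BitTrain repository: the class name SparseSavedReLU is illustrative, and the torch.bool mask used here occupies one byte per element, whereas a true bitmap, as the paper proposes, packs eight flags per byte.

import torch

class SparseSavedReLU(torch.autograd.Function):
    # Illustrative ReLU that saves its output as (bitmap, nonzero values)
    # instead of a dense tensor; the dense activation is rebuilt on backward.

    @staticmethod
    def forward(ctx, x):
        out = torch.relu(x)
        bitmap = out != 0                           # positions of nonzero activations
        ctx.save_for_backward(bitmap, out[bitmap])  # store only the compressed form
        ctx.dense_shape = out.shape
        return out

    @staticmethod
    def backward(ctx, grad_out):
        bitmap, values = ctx.saved_tensors
        # Decompress: scatter the saved values back into a dense tensor,
        # mirroring the restore step in the backward pass.
        dense = torch.zeros(ctx.dense_shape, dtype=values.dtype, device=values.device)
        dense[bitmap] = values
        return grad_out * (dense > 0)

# Usage: a drop-in activation inside a model's forward pass.
x = torch.randn(4, 8, requires_grad=True)
loss = SparseSavedReLU.apply(x).sum()
loss.backward()

Back-of-the-envelope, at 50% activation sparsity a 1-bit-per-element bitmap plus the surviving float32 values occupies roughly half of the dense storage in the ideal case; the paper reports up to 34% reduction in practice.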
Related papers
- NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks [30.224822087562163]
NeuZip is a new weight compression scheme based on the entropy of floating-point numbers in neural networks.
We significantly reduce the memory footprint of training a Llama-3 8B model from 31GB to less than 16GB.
In inference, our method can reduce memory usage by more than half while maintaining near-lossless performance.
arXiv Detail & Related papers (2024-10-28T01:12:20Z)
- Block Selective Reprogramming for On-device Training of Vision Transformers [12.118303034660531]
We present block selective reprogramming (BSR), in which we fine-tune only a fraction of the blocks of a pre-trained model.
Compared to the existing alternatives, our approach simultaneously reduces training memory by up to 1.4x and compute cost by up to 2x.
arXiv Detail & Related papers (2024-03-25T08:41:01Z)
- Time-, Memory- and Parameter-Efficient Visual Adaptation [75.28557015773217]
We propose an adaptation method which does not backpropagate gradients through the backbone.
We achieve this by designing a lightweight network in parallel that operates on features from the frozen, pretrained backbone.
arXiv Detail & Related papers (2024-02-05T10:55:47Z)
- DIVISION: Memory Efficient Training via Dual Activation Precision [60.153754740511864]
State-of-the-art work combines a search over quantization bit-widths with training, which makes the procedure complicated and less transparent.
We propose a simple and effective method to compress DNN training.
Experiment results show DIVISION has better comprehensive performance than state-of-the-art methods, including over 10x compression of activation maps and competitive training throughput, without loss of model accuracy.
arXiv Detail & Related papers (2022-08-05T03:15:28Z)
- On-Device Training Under 256KB Memory [62.95579393237751]
We propose an algorithm-system co-design framework to make on-device training possible with only 256KB of memory.
Our framework is the first solution to enable tiny on-device training of convolutional neural networks under 256KB SRAM and 1MB Flash.
arXiv Detail & Related papers (2022-06-30T17:59:08Z)
- Dimensionality Reduced Training by Pruning and Freezing Parts of a Deep Neural Network, a Survey [69.3939291118954]
State-of-the-art deep learning models have parameter counts that reach into the billions. Training, storing, and transferring such models is energy- and time-consuming, and thus costly.
Model compression lowers storage and transfer costs, and can further make training more efficient by decreasing the number of computations in the forward and/or backward pass.
This work is a survey on methods which reduce the number of trained weights in deep learning models throughout the training.
arXiv Detail & Related papers (2022-05-17T05:37:08Z)
- Mesa: A Memory-saving Training Framework for Transformers [58.78933015299703]
We present Mesa, a memory-saving training framework for Transformers.
Mesa uses exact activations during the forward pass while storing a low-precision version of the activations to reduce memory consumption during training.
Experiments on ImageNet, CIFAR-100 and ADE20K demonstrate that Mesa can reduce half of the memory footprints during training.
arXiv Detail & Related papers (2021-11-22T11:23:01Z)
- MEST: Accurate and Fast Memory-Economic Sparse Training Framework on the Edge [72.16021611888165]
This paper proposes a novel Memory-Economic Sparse Training (MEST) framework targeting accurate and fast execution on edge devices.
The proposed MEST framework consists of enhancements by Elastic Mutation (EM) and Soft Memory Bound (&S).
Our results suggest that unforgettable examples can be identified in-situ even during the dynamic exploration of sparsity masks.
arXiv Detail & Related papers (2021-10-26T21:15:17Z)
- Improving compute efficacy frontiers with SliceOut [31.864949424541344]
We introduce SliceOut -- a dropout-inspired scheme to train deep learning models faster without impacting final test accuracy.
At test time, turning off SliceOut performs an implicit ensembling across a linear number of architectures that preserves test accuracy.
This leads to faster processing of large computational workloads overall, and significantly reduces the resulting energy consumption and CO2 emissions.
arXiv Detail & Related papers (2020-07-21T15:59:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.