COMET: A Novel Memory-Efficient Deep Learning Training Framework by
Using Error-Bounded Lossy Compression
- URL: http://arxiv.org/abs/2111.09562v1
- Date: Thu, 18 Nov 2021 07:43:45 GMT
- Title: COMET: A Novel Memory-Efficient Deep Learning Training Framework by
Using Error-Bounded Lossy Compression
- Authors: Sian Jin, Chengming Zhang, Xintong Jiang, Yunhe Feng, Hui Guan,
Guanpeng Li, Shuaiwen Leon Song, Dingwen Tao
- Abstract summary: Training wide and deep neural networks (DNNs) requires large amounts of storage resources such as memory.
We propose a memory-efficient CNN training framework (called COMET) that leverages error-bounded lossy compression.
Our framework can significantly reduce the training memory consumption by up to 13.5X over the baseline training and 1.8X over another state-of-the-art compression-based framework.
- Score: 8.080129426746288
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Training wide and deep neural networks (DNNs) requires large amounts of
storage resources such as memory because the intermediate activation data must
be saved in the memory during forward propagation and then restored for
backward propagation. However, state-of-the-art accelerators such as GPUs are
only equipped with very limited memory capacities due to hardware design
constraints, which significantly limits the maximum batch size and hence
performance speedup when training large-scale DNNs. Traditional memory saving
techniques either suffer from performance overhead or are constrained by
limited interconnect bandwidth or specific interconnect technology. In this
paper, we propose a novel memory-efficient CNN training framework (called
COMET) that leverages error-bounded lossy compression to significantly reduce
the memory requirement for training, to allow training larger models or to
accelerate training. Different from the state-of-the-art solutions that adopt
image-based lossy compressors (such as JPEG) to compress the activation data,
our framework purposely adopts error-bounded lossy compression with a strict
error-controlling mechanism. Specifically, we perform a theoretical analysis on
the compression error propagation from the altered activation data to the
gradients, and empirically investigate the impact of altered gradients over the
training process. Based on these analyses, we optimize the error-bounded lossy
compression and propose an adaptive error-bound control scheme for activation
data compression. We evaluate our design against state-of-the-art solutions
with five widely-adopted CNNs and ImageNet dataset. Experiments demonstrate
that our proposed framework can significantly reduce the training memory
consumption by up to 13.5X over the baseline training and 1.8X over another
state-of-the-art compression-based framework, with little or no accuracy loss.
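To make the activation-compression mechanism described above concrete, here is a minimal sketch of how a layer's saved input can be compressed under an absolute error bound during forward propagation and decompressed for backward propagation. It is an illustration only, assuming PyTorch and using plain uniform quantization as a stand-in for an error-bounded lossy compressor; the names LossyLinearFn and eps are hypothetical, and this is not COMET's actual implementation, which relies on an optimized error-bounded compressor and the adaptive error-bound control scheme derived from the paper's error-propagation analysis.

    # Sketch only: error-bounded lossy storage of a saved activation (not COMET's code).
    import torch

    class LossyLinearFn(torch.autograd.Function):
        @staticmethod
        def forward(ctx, x, weight, eps):
            # The forward output is computed from the exact input; only the copy
            # saved for backward is compressed.
            y = x @ weight.t()
            # Uniform quantization with step 2*eps guarantees |x_hat - x| <= eps
            # elementwise (int16 assumed wide enough to hold x / (2*eps)).
            q = torch.round(x / (2.0 * eps)).to(torch.int16)
            ctx.save_for_backward(q, weight)
            ctx.eps = eps
            return y

        @staticmethod
        def backward(ctx, grad_y):
            q, weight = ctx.saved_tensors
            x_hat = q.to(grad_y.dtype) * (2.0 * ctx.eps)  # decompress the saved input
            grad_x = grad_y @ weight      # exact: does not need the saved input
            grad_w = grad_y.t() @ x_hat   # perturbed by grad_y.t() @ (x_hat - x)
            return grad_x, grad_w, None   # no gradient for eps

    # Hypothetical usage with toy tensors and an arbitrary error bound of 1e-2:
    # x = torch.randn(32, 256, requires_grad=True)
    # w = torch.randn(128, 256, requires_grad=True)
    # LossyLinearFn.apply(x, w, 1e-2).sum().backward()

Because backward propagation sees only the approximation x_hat, the computed weight gradient grad_y.t() @ x_hat deviates from the exact gradient by grad_y.t() @ (x_hat - x), each entry of which is bounded by eps times a column sum of |grad_y|. Keeping that perturbation small relative to the true gradient magnitude is the role an adaptive error-bound controller would play in a full framework.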
Related papers
- Causal Context Adjustment Loss for Learned Image Compression [72.7300229848778]
In recent years, learned image compression (LIC) technologies have surpassed conventional methods notably in terms of rate-distortion (RD) performance.
Most current techniques are VAE-based with an autoregressive entropy model, which improves RD performance by exploiting the decoded causal context.
In this paper, we make the first attempt to explicitly adjust the causal context with our proposed Causal Context Adjustment loss.
arXiv Detail & Related papers (2024-10-07T09:08:32Z) - Accelerating Communication in Deep Learning Recommendation Model Training with Dual-Level Adaptive Lossy Compression [10.233937665979694]
DLRM is a state-of-the-art recommendation system model that has gained widespread adoption across various industry applications.
A significant bottleneck in this process is the time-consuming all-to-all communication required to collect embedding data from all devices.
We introduce a method that employs error-bounded lossy compression to reduce the communication data size and accelerate DLRM training.
arXiv Detail & Related papers (2024-07-05T05:55:18Z) - Probing Image Compression For Class-Incremental Learning [8.711266563753846]
Continual machine learning (ML) systems rely on storing representative samples, also known as exemplars, within a limited memory constraint to maintain the performance on previously learned data.
In this paper, we explore the use of image compression as a strategy to enhance the buffer's capacity, thereby increasing exemplar diversity.
We introduce a new framework to incorporate image compression for continual ML including a pre-processing data compression step and an efficient compression rate/algorithm selection method.
arXiv Detail & Related papers (2024-03-10T18:58:14Z) - GraVAC: Adaptive Compression for Communication-Efficient Distributed DL
Training [0.0]
Distributed data-parallel (DDP) training improves overall application throughput as multiple devices train on a subset of data and aggregate updates to produce a globally shared model.
GraVAC is a framework to dynamically adjust compression factor throughout training by evaluating model progress and assessing information loss associated with compression.
Compared to using a static compression factor, GraVAC reduces end-to-end training time for ResNet101, VGG16, and LSTM by 4.32x, 1.95x, and 6.67x, respectively.
arXiv Detail & Related papers (2023-05-20T14:25:17Z) - Crowd Counting on Heavily Compressed Images with Curriculum Pre-Training [90.76576712433595]
Applying lossy compression on images processed by deep neural networks can lead to significant accuracy degradation.
Inspired by the curriculum learning paradigm, we present a novel training approach called curriculum pre-training (CPT) for crowd counting on compressed images.
arXiv Detail & Related papers (2022-08-15T08:43:21Z) - DIVISION: Memory Efficient Training via Dual Activation Precision [60.153754740511864]
State-of-the-art work combines a search over quantization bit-widths with training, which makes the procedure complicated and less transparent.
We propose a simple and effective method to compress DNN training.
Experiment results show DIVISION has better comprehensive performance than state-of-the-art methods, including over 10x compression of activation maps and competitive training throughput, without loss of model accuracy.
arXiv Detail & Related papers (2022-08-05T03:15:28Z) - On-Device Training Under 256KB Memory [62.95579393237751]
We propose an algorithm-system co-design framework to make on-device training possible with only 256KB of memory.
Our framework is the first solution to enable tiny on-device training of convolutional neural networks under 256KB SRAM and 1MB Flash.
arXiv Detail & Related papers (2022-06-30T17:59:08Z) - Practical Network Acceleration with Tiny Sets [38.742142493108744]
Network compression is effective in accelerating the inference of deep neural networks.
But it often requires finetuning with all the training data to recover from the accuracy loss.
We propose a method named PRACTISE to accelerate the network with tiny sets of training images.
arXiv Detail & Related papers (2022-02-16T05:04:38Z) - Neural Network Compression for Noisy Storage Devices [71.4102472611862]
Conventionally, model compression and physical storage are decoupled.
This approach forces the storage to treat each bit of the compressed model equally, and to dedicate the same amount of resources to each bit.
We propose a radically different approach that: (i) employs analog memories to maximize the capacity of each memory cell, and (ii) jointly optimizes model compression and physical storage to maximize memory utility.
arXiv Detail & Related papers (2021-02-15T18:19:07Z) - An Efficient Statistical-based Gradient Compression Technique for
Distributed Training Systems [77.88178159830905]
Sparsity-Inducing Distribution-based Compression (SIDCo) is a threshold-based sparsification scheme that enjoys similar threshold estimation quality to deep gradient compression (DGC).
Our evaluation shows SIDCo speeds up training by up to 41.7%, 7.6%, and 1.9% compared to the no-compression baseline, Topk, and DGC compressors, respectively.
arXiv Detail & Related papers (2021-01-26T13:06:00Z) - A Novel Memory-Efficient Deep Learning Training Framework via
Error-Bounded Lossy Compression [6.069852296107781]
We propose a memory-driven high performance DNN training framework that leverages error-bounded lossy compression.
Our framework can significantly reduce the training memory consumption by up to 13.5x and 1.8x over the baseline training and a state-of-the-art compression-based framework, respectively.
arXiv Detail & Related papers (2020-11-18T00:47:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.