A Novel Memory-Efficient Deep Learning Training Framework via
Error-Bounded Lossy Compression
- URL: http://arxiv.org/abs/2011.09017v1
- Date: Wed, 18 Nov 2020 00:47:21 GMT
- Title: A Novel Memory-Efficient Deep Learning Training Framework via
Error-Bounded Lossy Compression
- Authors: Sian Jin, Guanpeng Li, Shuaiwen Leon Song, Dingwen Tao
- Abstract summary: We propose a memory-driven high performance DNN training framework that leverages error-bounded lossy compression.
Our framework reduces training memory consumption by up to 13.5x over baseline training and 1.8x over a state-of-the-art compression-based framework.
- Score: 6.069852296107781
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep neural networks (DNNs) are becoming increasingly deeper, wider, and
non-linear due to the growing demands on prediction accuracy and analysis
quality. When training a DNN model, the intermediate activation data must be
saved in the memory during forward propagation and then restored for backward
propagation. However, state-of-the-art accelerators such as GPUs are only
equipped with very limited memory capacities due to hardware design
constraints, which significantly limits the maximum batch size and hence
performance speedup when training large-scale DNNs.
In this paper, we propose a novel memory-driven, high-performance DNN training
framework that leverages error-bounded lossy compression to significantly
reduce the memory required for training, allowing larger networks to be
trained. Unlike state-of-the-art solutions that adopt image-based
lossy compressors such as JPEG to compress the activation data, our framework
purposely designs an error-bounded lossy compression scheme with a strict
error-controlling mechanism. Specifically, we provide a theoretical analysis of
how the compression error propagates from the altered activation data to the
gradients, and then empirically investigate the impact of the altered gradients
on the entire training process. Based on these analyses, we propose an
improved lossy compressor and an adaptive scheme to dynamically configure the
lossy compression error-bound and adjust the training batch size to further
utilize the saved memory space for additional speedup. We evaluate our design
against state-of-the-art solutions with four popular DNNs and the ImageNet
dataset. Results demonstrate that our proposed framework reduces training
memory consumption by up to 13.5x over baseline training and 1.8x over the
state-of-the-art compression-based framework, with little or no accuracy loss.
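To make the core idea concrete, below is a minimal sketch (not the authors' implementation) of SZ-style absolute-error-bounded quantization applied to activation data: activations are mapped to integer bins of width 2*eb during forward propagation and reconstructed before backward propagation, so every element's reconstruction error stays within the bound eb. All names (compress_activations, ERROR_BOUND, etc.) are illustrative assumptions; in the real framework the bins would additionally be entropy-coded, the error bound is configured adaptively, and the freed memory is used to enlarge the batch size.

```python
# Minimal sketch of error-bounded lossy compression of activation data.
# Illustrative only; function names and the error bound are assumptions,
# not the paper's actual API.
import numpy as np

ERROR_BOUND = 1e-2  # absolute error bound eb; the paper's framework tunes this adaptively


def compress_activations(act: np.ndarray, eb: float = ERROR_BOUND) -> np.ndarray:
    # Map each value to an integer bin of width 2*eb, guaranteeing
    # |original - reconstructed| <= eb for every element. A real compressor
    # (e.g., SZ) would further entropy-code these bin indices to save memory.
    return np.round(act / (2.0 * eb)).astype(np.int32)


def decompress_activations(codes: np.ndarray, eb: float = ERROR_BOUND) -> np.ndarray:
    # Reconstruct the activations just before they are needed for backward propagation.
    return codes.astype(np.float32) * (2.0 * eb)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    activations = rng.standard_normal((64, 256)).astype(np.float32)  # fake layer output

    codes = compress_activations(activations)   # kept in memory instead of raw floats
    restored = decompress_activations(codes)    # restored for the backward pass

    max_err = float(np.abs(restored - activations).max())
    print(f"max reconstruction error = {max_err:.4g} (bound = {ERROR_BOUND})")
    assert max_err <= ERROR_BOUND + 1e-6
```

The strict per-element error bound is what makes the theoretical analysis tractable: it bounds how much the compression error can perturb the gradients, and the adaptive scheme then picks the largest error bound (and batch size) that keeps this perturbation harmless to convergence.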
Related papers
- Accelerating Communication in Deep Learning Recommendation Model Training with Dual-Level Adaptive Lossy Compression [10.233937665979694]
DLRM is a state-of-the-art recommendation system model that has gained widespread adoption across various industry applications.
A significant bottleneck in distributed DLRM training is the time-consuming all-to-all communication required to collect embedding data from all devices.
We introduce a method that employs error-bounded lossy compression to reduce the communication data size and accelerate DLRM training.
arXiv Detail & Related papers (2024-07-05T05:55:18Z)
- A Low-Complexity Approach to Rate-Distortion Optimized Variable Bit-Rate Compression for Split DNN Computing [5.3221129103999125]
Split computing has emerged as a recent paradigm for implementing DNN-based AI workloads.
We present an approach that addresses the challenge of optimizing the rate-accuracy-complexity trade-off.
Our approach is remarkably lightweight during both training and inference, highly effective, and achieves excellent rate-distortion performance.
arXiv Detail & Related papers (2022-08-24T15:02:11Z)
- Crowd Counting on Heavily Compressed Images with Curriculum Pre-Training [90.76576712433595]
Applying lossy compression on images processed by deep neural networks can lead to significant accuracy degradation.
Inspired by the curriculum learning paradigm, we present a novel training approach called curriculum pre-training (CPT) for crowd counting on compressed images.
arXiv Detail & Related papers (2022-08-15T08:43:21Z)
- DIVISION: Memory Efficient Training via Dual Activation Precision [60.153754740511864]
State-of-the-art work combines a search over quantization bit-widths with training, which makes the procedure complicated and less transparent.
We propose a simple and effective method to compress DNN training.
Experiment results show DIVISION has better comprehensive performance than state-of-the-art methods, including over 10x compression of activation maps and competitive training throughput, without loss of model accuracy.
arXiv Detail & Related papers (2022-08-05T03:15:28Z)
- Modeling Image Quantization Tradeoffs for Optimal Compression [0.0]
Lossy compression algorithms trade quality for size by quantizing high-frequency data to increase compression rates.
We propose a new method of optimizing quantization tables using Deep Learning and a minimax loss function.
arXiv Detail & Related papers (2021-12-14T07:35:22Z)
- Implicit Neural Representations for Image Compression [103.78615661013623]
Implicit Neural Representations (INRs) have gained attention as a novel and effective representation for various data types.
We propose the first comprehensive compression pipeline based on INRs including quantization, quantization-aware retraining and entropy coding.
We find that our approach to source compression with INRs vastly outperforms similar prior work.
arXiv Detail & Related papers (2021-12-08T13:02:53Z)
- COMET: A Novel Memory-Efficient Deep Learning Training Framework by Using Error-Bounded Lossy Compression [8.080129426746288]
Training wide and deep neural networks (DNNs) requires large amounts of storage resources such as memory.
We propose a memory-efficient CNN training framework (called COMET) that leverages error-bounded lossy compression.
Our framework can significantly reduce the training memory consumption by up to 13.5X over the baseline training and 1.8X over another state-of-the-art compression-based framework.
arXiv Detail & Related papers (2021-11-18T07:43:45Z)
- Improving Computational Efficiency in Visual Reinforcement Learning via Stored Embeddings [89.63764845984076]
We present Stored Embeddings for Efficient Reinforcement Learning (SEER).
SEER is a simple modification of existing off-policy deep reinforcement learning methods.
We show that SEER does not degrade the performance of RL agents while significantly saving computation and memory.
arXiv Detail & Related papers (2021-03-04T08:14:10Z)
- Neural Network Compression for Noisy Storage Devices [71.4102472611862]
Conventionally, model compression and physical storage are decoupled.
This approach forces the storage to treat each bit of the compressed model equally, and to dedicate the same amount of resources to each bit.
We propose a radically different approach that (i) employs analog memories to maximize the capacity of each memory cell, and (ii) jointly optimizes model compression and physical storage to maximize memory utility.
arXiv Detail & Related papers (2021-02-15T18:19:07Z)
- An Efficient Statistical-based Gradient Compression Technique for Distributed Training Systems [77.88178159830905]
Sparsity-Inducing Distribution-based Compression (SIDCo) is a threshold-based sparsification scheme that enjoys similar threshold estimation quality to deep gradient compression (DGC).
Our evaluation shows SIDCo speeds up training by up to 41.7%, 7.6%, and 1.9% compared to the no-compression baseline, Topk, and DGC compressors, respectively.
arXiv Detail & Related papers (2021-01-26T13:06:00Z)