Neural Network Compression for Noisy Storage Devices
- URL: http://arxiv.org/abs/2102.07725v1
- Date: Mon, 15 Feb 2021 18:19:07 GMT
- Title: Neural Network Compression for Noisy Storage Devices
- Authors: Berivan Isik, Kristy Choi, Xin Zheng, Tsachy Weissman, Stefano Ermon,
H.-S. Philip Wong, Armin Alaghi
- Abstract summary: Conventionally, model compression and physical storage are decoupled.
This approach forces the storage to treat each bit of the compressed model equally, and to dedicate the same amount of resources to each bit.
We propose a radically different approach that: (i) employs analog memories to maximize the capacity of each memory cell, and (ii) jointly optimizes model compression and physical storage to maximize memory utility.
- Score: 71.4102472611862
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Compression and efficient storage of neural network (NN) parameters is
critical for applications that run on resource-constrained devices. Although NN
model compression has made significant progress, there has been considerably
less investigation in the actual physical storage of NN parameters.
Conventionally, model compression and physical storage are decoupled, as
digital storage media with error correcting codes (ECCs) provide robust
error-free storage. This decoupled approach is inefficient, as it forces the
storage to treat each bit of the compressed model equally, and to dedicate the
same amount of resources to each bit. We propose a radically different approach
that: (i) employs analog memories to maximize the capacity of each memory cell,
and (ii) jointly optimizes model compression and physical storage to maximize
memory utility. We investigate the challenges of analog storage by studying
model storage on phase change memory (PCM) arrays and develop a variety of
robust coding strategies for NN model storage. We demonstrate the efficacy of
our approach on MNIST, CIFAR-10 and ImageNet datasets for both existing and
novel compression methods. Compared to conventional error-free digital storage,
our method has the potential to reduce the memory size by one order of
magnitude, without significantly compromising the stored model's accuracy.
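To make the joint design concrete, the sketch below simulates storing a layer's weights in noisy analog cells and selectively protecting the most salient weights by averaging reads over several cells. This is a minimal illustrative sketch under an assumed i.i.d. Gaussian write-noise model and a toy top-magnitude protection rule; it is not the paper's actual PCM noise model or coding strategy.

```python
# Minimal sketch: noisy analog weight storage with selective protection.
# Assumptions (not from the paper): i.i.d. Gaussian write noise with a fixed sigma,
# and a toy rule that replicates the largest-magnitude weights across k cells.
import numpy as np

rng = np.random.default_rng(0)

def store_analog(w, sigma=0.05):
    """Write weights to analog cells; each read-back value is perturbed by write noise."""
    return w + rng.normal(0.0, sigma, size=w.shape)

def store_with_protection(w, sigma=0.05, top_frac=0.1, k=4):
    """Replicate the top `top_frac` largest-magnitude weights across k cells, average on read."""
    noisy = store_analog(w, sigma)
    n_protect = max(1, int(top_frac * w.size))
    idx = np.argsort(-np.abs(w))[:n_protect]
    # Averaging k independent noisy copies reduces the noise std by roughly sqrt(k).
    copies = w[idx][None, :] + rng.normal(0.0, sigma, size=(k, n_protect))
    noisy[idx] = copies.mean(axis=0)
    return noisy

w = rng.normal(0.0, 0.1, size=10_000)  # stand-in for one layer's weights
plain = store_analog(w)
protected = store_with_protection(w)
print("MSE, no protection:            ", np.mean((plain - w) ** 2))
print("MSE, top-10% averaged, k = 4:  ", np.mean((protected - w) ** 2))
```

The point of the toy protection rule is that redundancy can be spent unevenly: averaging k noisy reads cuts the noise standard deviation by about sqrt(k), so a joint compression-and-storage scheme can allocate that redundancy to the parameters that matter most for accuracy.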
Related papers
- BitStack: Fine-Grained Size Control for Compressed Large Language Models in Variable Memory Environments [53.71158537264695]
Large language models (LLMs) have revolutionized numerous applications, yet their deployment remains challenged by memory constraints on local devices.
We introduce BitStack, a novel, training-free weight compression approach that enables megabyte-level trade-offs between memory usage and model performance.
arXiv Detail & Related papers (2024-10-31T13:26:11Z)
- LoRC: Low-Rank Compression for LLMs KV Cache with a Progressive Compression Strategy [59.1298692559785]
The Key-Value (KV) cache is a crucial component in serving transformer-based autoregressive large language models (LLMs).
Existing approaches to mitigate this issue include efficient attention variants integrated in upcycling stages and KV cache compression at test time.
We propose a low-rank approximation of KV weight matrices, allowing plug-in integration with existing transformer-based LLMs without model retraining.
Our method is designed to function without model tuning in upcycling stages or task-specific profiling in test stages.
arXiv Detail & Related papers (2024-10-04T03:10:53Z)
- A Low-Complexity Approach to Rate-Distortion Optimized Variable Bit-Rate Compression for Split DNN Computing [5.3221129103999125]
Split computing has recently emerged as a paradigm for implementing DNN-based AI workloads.
We present an approach that addresses the challenge of optimizing the rate-accuracy-complexity trade-off.
Our approach is remarkably lightweight during both training and inference, highly effective, and achieves excellent rate-distortion performance.
arXiv Detail & Related papers (2022-08-24T15:02:11Z)
- Memory Replay with Data Compression for Continual Learning [80.95444077825852]
We propose memory replay with data compression to reduce the storage cost of old training samples.
We extensively validate this across several benchmarks of class-incremental learning and in a realistic scenario of object detection for autonomous driving.
arXiv Detail & Related papers (2022-02-14T10:26:23Z)
- COMET: A Novel Memory-Efficient Deep Learning Training Framework by Using Error-Bounded Lossy Compression [8.080129426746288]
Training wide and deep neural networks (DNNs) requires large amounts of storage resources such as memory.
We propose a memory-efficient CNN training framework (called COMET) that leverages error-bounded lossy compression.
Our framework can significantly reduce the training memory consumption by up to 13.5X over the baseline training and 1.8X over another state-of-the-art compression-based framework.
arXiv Detail & Related papers (2021-11-18T07:43:45Z)
- Nonlinear Tensor Ring Network [39.89070144585793]
State-of-the-art deep neural networks (DNNs) have been widely applied to various real-world applications and achieve significant performance on cognitive problems.
By converting redundant models into compact ones, compression techniques offer a practical solution for reducing storage and memory consumption.
In this paper, we develop a nonlinear tensor ring network (NTRN) in which both fully-connected and convolutional layers are compressed.
arXiv Detail & Related papers (2021-11-12T02:02:55Z)
- Efficient Micro-Structured Weight Unification and Pruning for Neural Network Compression [56.83861738731913]
Deep Neural Network (DNN) models are essential for practical applications, especially on resource-limited devices.
Previous unstructured or structured weight pruning methods rarely achieve true inference acceleration.
We propose a generalized weight unification framework at a hardware-compatible micro-structured level to achieve a high degree of compression and acceleration.
arXiv Detail & Related papers (2021-06-15T17:22:59Z)
- NAS-BERT: Task-Agnostic and Adaptive-Size BERT Compression with Neural Architecture Search [100.71365025972258]
We propose NAS-BERT, an efficient method for BERT compression.
NAS-BERT trains a big supernet on a search space and outputs multiple compressed models with adaptive sizes and latency.
Experiments on GLUE and SQuAD benchmark datasets demonstrate that NAS-BERT can find lightweight models with better accuracy than previous approaches.
arXiv Detail & Related papers (2021-05-30T07:20:27Z)
- A Novel Memory-Efficient Deep Learning Training Framework via Error-Bounded Lossy Compression [6.069852296107781]
We propose a memory-driven high performance DNN training framework that leverages error-bounded lossy compression.
Our framework can significantly reduce training memory consumption by up to 13.5x over baseline training and 1.8x over a state-of-the-art compression-based framework.
arXiv Detail & Related papers (2020-11-18T00:47:21Z)
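The two error-bounded lossy-compression entries above rest on the same core mechanism: map each value to a quantization bin whose width is set by a user-specified error bound, so the absolute reconstruction error never exceeds that bound. Below is a minimal sketch of that mechanism with illustrative names and parameters (compress/decompress, a 1e-2 bound, int16 codes); it is not the actual COMET pipeline, and real compressors typically add entropy coding on top.

```python
# Minimal sketch of error-bounded lossy compression via uniform quantization.
# Assumptions (illustrative only): fixed absolute error bound, int16 bin codes.
import numpy as np

def compress(x, error_bound=1e-2):
    """Map each value to an integer bin of width 2*error_bound (abs error <= error_bound)."""
    codes = np.round(x / (2.0 * error_bound)).astype(np.int16)
    return codes, error_bound

def decompress(codes, error_bound):
    """Reconstruct each value as the center of its bin."""
    return codes.astype(np.float32) * (2.0 * error_bound)

rng = np.random.default_rng(0)
activations = rng.normal(0.0, 1.0, size=(64, 256)).astype(np.float32)

codes, eb = compress(activations)
recovered = decompress(codes, eb)

assert np.max(np.abs(recovered - activations)) <= eb + 1e-6
# int16 codes already halve the footprint versus float32, before any entropy coding.
print("max abs error:", np.max(np.abs(recovered - activations)))
print("bytes: float32 =", activations.nbytes, " int16 codes =", codes.nbytes)
```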