Bandwidth-efficient Inference for Neural Image Compression
- URL: http://arxiv.org/abs/2309.02855v2
- Date: Thu, 7 Sep 2023 03:25:02 GMT
- Title: Bandwidth-efficient Inference for Neural Image Compression
- Authors: Shanzhi Yin, Tongda Xu, Yongsheng Liang, Yuanyuan Wang, Yanghao Li,
Yan Wang, Jingjing Liu
- Abstract summary: We propose an end-to-end differentiable, bandwidth-efficient neural inference method in which activations are compressed by a neural data compression method.
Optimized together with existing model quantization methods, the low-level task of image compression achieves up to a 19x bandwidth reduction with a 6.21x energy saving.
- Score: 26.87198174202502
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: With neural networks growing deeper and feature maps growing larger, limited
communication bandwidth with external memory (or DRAM) and power constraints
become a bottleneck in implementing network inference on mobile and edge
devices. In this paper, we propose an end-to-end differentiable,
bandwidth-efficient neural inference method in which activations are
compressed by a neural data compression method. Specifically, we propose a
transform-quantization-entropy coding pipeline for activation compression with
symmetric exponential Golomb coding and a data-dependent Gaussian entropy
model for arithmetic coding. Optimized together with existing model
quantization methods, the low-level task of image compression can achieve up
to a 19x bandwidth reduction with a 6.21x energy saving.
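To make the coding pipeline above concrete, the sketch below illustrates two of the ingredients named in the abstract: an exponential-Golomb code for signed quantized activations and a per-element bit-cost estimate under a Gaussian entropy model of the kind used to drive arithmetic coding. This is a minimal illustration rather than the paper's implementation; the exact form of the "symmetric" Golomb variant and of the data-dependent Gaussian parameters is not given in the abstract, so the zigzag mapping, function names, and toy values here are assumptions.

```python
import math

# Minimal sketch (not the paper's implementation) of two pieces named in the
# abstract: order-0 exponential-Golomb coding of signed quantized activations
# (the zigzag mapping is an assumed stand-in for the paper's "symmetric"
# variant) and the per-element bit cost under a Gaussian entropy model.

def zigzag(v: int) -> int:
    # Map signed integers onto non-negatives: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4.
    return 2 * v if v >= 0 else -2 * v - 1

def exp_golomb_encode(v: int) -> str:
    # Order-0 exp-Golomb codeword: unary length prefix + binary of (value + 1).
    u = zigzag(v) + 1
    binary = bin(u)[2:]
    return "0" * (len(binary) - 1) + binary

def gaussian_rate_bits(x: int, mu: float, sigma: float) -> float:
    # Bits needed by an ideal arithmetic coder for integer x under a Gaussian
    # entropy model: -log2 of the probability mass on [x - 0.5, x + 0.5].
    cdf = lambda t: 0.5 * (1.0 + math.erf((t - mu) / (sigma * math.sqrt(2.0))))
    p = max(cdf(x + 0.5) - cdf(x - 0.5), 1e-12)
    return -math.log2(p)

if __name__ == "__main__":
    quantized = [0, 1, -1, 3, -2, 0, 0, 5]          # toy quantized activations
    bits_golomb = sum(len(exp_golomb_encode(v)) for v in quantized)
    bits_gauss = sum(gaussian_rate_bits(v, mu=0.0, sigma=1.5) for v in quantized)
    print(f"exp-Golomb: {bits_golomb} bits, Gaussian model: {bits_gauss:.1f} bits, "
          f"raw int8: {8 * len(quantized)} bits")
```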
Related papers
- "Lossless" Compression of Deep Neural Networks: A High-dimensional
Neural Tangent Kernel Approach [49.744093838327615]
We provide a novel compression approach to wide and fully-connected deep neural nets.
Experiments on both synthetic and real-world data are conducted to support the advantages of the proposed compression scheme.
arXiv Detail & Related papers (2024-03-01T03:46:28Z) - Frequency Disentangled Features in Neural Image Compression [13.016298207860974]
A neural image compression network is governed by how well the entropy model matches the true distribution of the latent code.
In this paper, we propose a feature-level frequency disentanglement to help the relaxed scalar quantization achieve lower bit rates.
The proposed network not only outperforms hand-engineered codecs, but also neural network-based codecs built on heavy spatially autoregressive entropy models.
arXiv Detail & Related papers (2023-08-04T14:55:44Z) - Reducing Redundancy in the Bottleneck Representation of the Autoencoders [98.78384185493624]
Autoencoders are a type of unsupervised neural networks, which can be used to solve various tasks.
We propose a scheme to explicitly penalize feature redundancies in the bottleneck representation.
We tested our approach across different tasks: dimensionality reduction using three different datasets, image compression using the MNIST dataset, and image denoising using Fashion-MNIST.
arXiv Detail & Related papers (2022-02-09T18:48:02Z) - Modeling Image Quantization Tradeoffs for Optimal Compression [0.0]
Lossy compression algorithms target these tradeoffs by quantizing high-frequency data to increase compression rates.
We propose a new method of optimizing quantization tables using Deep Learning and a minimax loss function.
arXiv Detail & Related papers (2021-12-14T07:35:22Z) - Implicit Neural Representations for Image Compression [103.78615661013623]
Implicit Neural Representations (INRs) have gained attention as a novel and effective representation for various data types.
We propose the first comprehensive compression pipeline based on INRs including quantization, quantization-aware retraining and entropy coding.
We find that our approach to source compression with INRs vastly outperforms similar prior work.
arXiv Detail & Related papers (2021-12-08T13:02:53Z) - Nonlinear Tensor Ring Network [39.89070144585793]
State-of-the-art deep neural networks (DNNs) have been widely applied for various real-world applications, and achieved significant performance for cognitive problems.
By converting redundant models into compact ones, compression techniques appear to be a practical solution for reducing storage and memory consumption.
In this paper, we develop a nonlinear tensor ring network (NTRN) in which both fully-connected and convolutional layers are compressed.
arXiv Detail & Related papers (2021-11-12T02:02:55Z) - Compression-aware Projection with Greedy Dimension Reduction for
Convolutional Neural Network Activations [3.6188659868203388]
We propose a compression-aware projection system to improve the trade-off between classification accuracy and compression ratio.
Our test results show that the proposed methods effectively reduce memory access by 2.91x to 5.97x with negligible accuracy drop on MobileNetV2/ResNet18/VGG16.
arXiv Detail & Related papers (2021-10-17T14:02:02Z) - Compressing Neural Networks: Towards Determining the Optimal Layer-wise
Decomposition [62.41259783906452]
We present a novel global compression framework for deep neural networks.
It automatically analyzes each layer to identify the optimal per-layer compression ratio.
Our results open up new avenues for future research into the global performance-size trade-offs of modern neural networks.
arXiv Detail & Related papers (2021-07-23T20:01:30Z) - On Effects of Compression with Hyperdimensional Computing in Distributed
Randomized Neural Networks [6.25118865553438]
We propose a model for distributed classification based on randomized neural networks and hyperdimensional computing.
In this work, we propose a more flexible approach to compression and compare it to conventional compression algorithms, dimensionality reduction, and quantization techniques.
arXiv Detail & Related papers (2021-06-17T22:02:40Z) - Substitutional Neural Image Compression [48.20906717052056]
Substitutional Neural Image Compression (SNIC) is a general approach for enhancing any neural image compression model.
It boosts compression performance toward a flexible distortion metric and enables bit-rate control using a single model instance.
arXiv Detail & Related papers (2021-05-16T20:53:31Z) - PowerGossip: Practical Low-Rank Communication Compression in
Decentralized Deep Learning [62.440827696638664]
We introduce a simple algorithm that directly compresses the model differences between neighboring workers.
Inspired by the PowerSGD for centralized deep learning, this algorithm uses power steps to maximize the information transferred per bit.
arXiv Detail & Related papers (2020-08-04T09:14:52Z)
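The power-step idea mentioned in the PowerGossip entry above can be sketched as a single rank-1 power-iteration step applied to the difference between two workers' weight matrices. The snippet below is a rough illustration of that general idea, not the authors' algorithm; all names, shapes, and the warm-start choice are assumptions.

```python
import numpy as np

# Rough sketch of power-step compression of a model difference (the general
# idea behind PowerSGD/PowerGossip-style methods). Not the authors' algorithm;
# names and shapes are illustrative assumptions.

def power_step_compress(delta: np.ndarray, q: np.ndarray):
    # One power-iteration step yields a rank-1 factorization delta ~= p q^T.
    p = delta @ q
    p = p / (np.linalg.norm(p) + 1e-12)   # normalize the left factor
    q_new = delta.T @ p                   # refine the right factor
    return p, q_new                       # only p and q_new are communicated

def power_step_decompress(p: np.ndarray, q: np.ndarray) -> np.ndarray:
    return np.outer(p, q)                 # rank-1 reconstruction of the difference

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    delta = rng.normal(size=(256, 128))   # difference between two workers' weights
    q = rng.normal(size=128)              # warm-started right factor
    p, q = power_step_compress(delta, q)
    approx = power_step_decompress(p, q)
    print(f"communicated {p.size + q.size} floats instead of {delta.size}")
```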