Lightweight Compression of Intermediate Neural Network Features for
Collaborative Intelligence
- URL: http://arxiv.org/abs/2105.07102v1
- Date: Sat, 15 May 2021 00:10:12 GMT
- Title: Lightweight Compression of Intermediate Neural Network Features for
Collaborative Intelligence
- Authors: Robert A. Cohen, Hyomin Choi, Ivan V. Bajić
- Abstract summary: In collaborative intelligence applications, part of a deep neural network (DNN) is deployed on a lightweight device such as a mobile phone or edge device.
This paper presents a novel lightweight compression technique designed specifically to quantize and compress the features output by the intermediate layer of a split DNN.
- Score: 32.03465747357384
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In collaborative intelligence applications, part of a deep neural network
(DNN) is deployed on a lightweight device such as a mobile phone or edge
device, and the remaining portion of the DNN is processed where more computing
resources are available, such as in the cloud. This paper presents a novel
lightweight compression technique designed specifically to quantize and
compress the features output by the intermediate layer of a split DNN, without
requiring any retraining of the network weights. Mathematical models for
estimating the clipping and quantization error of ReLU and leaky-ReLU
activations at this intermediate layer are developed and used to compute
optimal clipping ranges for coarse quantization. We also present a modified
entropy-constrained design algorithm for quantizing clipped activations. When
applied to popular object-detection and classification DNNs, we were able to
compress the 32-bit floating point intermediate activations down to 0.6 to 0.8
bits, while keeping the loss in accuracy to less than 1%. When compared to
HEVC, we found that the lightweight codec consistently provided better
inference accuracy, by up to 1.3%. The performance and simplicity of this
lightweight compression technique make it an attractive option for coding an
intermediate layer of a split neural network for edge/cloud applications.
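The clip-then-quantize pipeline described above is easy to sketch. Below is a minimal Python illustration, assuming a ReLU feature tensor: activations are clipped to [0, c] and uniformly quantized, and c is chosen by a brute-force search over the empirical clipping-plus-quantization MSE. The search is a simplified stand-in for the paper's model-based optimal clipping ranges, and names such as quantize_features are illustrative, not the authors'.

```python
import numpy as np

def quantize_features(x, c, bits):
    """Clip ReLU activations to [0, c], then uniformly quantize to 2**bits levels."""
    step = c / (2 ** bits - 1)
    q = np.round(np.clip(x, 0.0, c) / step).astype(np.uint8)  # indices sent to the cloud
    return q, step

def dequantize_features(q, step):
    return q.astype(np.float32) * step

def best_clip_range(x, bits, num_candidates=100):
    """Grid-search the clip value c minimizing empirical MSE. (The paper instead
    derives c analytically from models of clipping and quantization error.)"""
    best_c, best_mse = None, np.inf
    for c in np.linspace(x.max() / num_candidates, x.max(), num_candidates):
        q, step = quantize_features(x, c, bits)
        mse = np.mean((dequantize_features(q, step) - x) ** 2)
        if mse < best_mse:
            best_c, best_mse = c, mse
    return best_c, best_mse

# Mock ReLU feature tensor from a split layer.
rng = np.random.default_rng(0)
feats = np.maximum(rng.normal(0.0, 1.0, size=(1, 64, 28, 28)), 0.0).astype(np.float32)

for bits in (1, 2, 3):  # coarse quantization, as in the paper
    c, mse = best_clip_range(feats, bits)
    print(f"{bits}-bit: clip at {c:.3f}, MSE {mse:.5f}")
```

At very low bit depths the optimal clip value falls well below the activation maximum, trading some clipping error for a finer quantization step; the paper's entropy-constrained quantizer design then reduces the coded rate further and is not shown here.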
Related papers
- Attention-based Feature Compression for CNN Inference Offloading in Edge Computing [93.67044879636093]
This paper studies the computational offloading of CNN inference in device-edge co-inference systems.
We propose a novel autoencoder-based CNN architecture (AECNN) for effective feature extraction at the end device.
Experiments show that AECNN can compress the intermediate data by more than 256x with only about 4% accuracy loss.
arXiv Detail & Related papers (2022-11-24T18:10:01Z)
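The AECNN entry above compresses intermediate features with a learned autoencoder. The sketch below shows only the generic idea in PyTorch, assuming a 1x1-conv bottleneck at the split point; AECNN's actual attention-based design and training procedure are not reproduced, and the channel counts are our own illustrative choices.

```python
import torch
import torch.nn as nn

class FeatureBottleneck(nn.Module):
    """Autoencoder at a split point: the encoder runs on the device, the decoder
    on the edge server. Reducing 256 channels to 4 and coding them at 8 bits
    instead of 32 gives roughly (256/4) * (32/8) = 256x compression, the order
    of the ratio reported above (illustrative arithmetic, not AECNN's design)."""
    def __init__(self, channels=256, bottleneck=4):
        super().__init__()
        self.encoder = nn.Conv2d(channels, bottleneck, kernel_size=1)
        self.decoder = nn.Conv2d(bottleneck, channels, kernel_size=1)

    def forward(self, feats):
        code = self.encoder(feats)   # transmitted (after quantization/entropy coding)
        return self.decoder(code)    # reconstructed server-side

feats = torch.randn(1, 256, 14, 14)      # mock intermediate feature map
print(FeatureBottleneck()(feats).shape)  # torch.Size([1, 256, 14, 14])
```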
- A Low-Complexity Approach to Rate-Distortion Optimized Variable Bit-Rate Compression for Split DNN Computing [5.3221129103999125]
Split computing has emerged as a recent paradigm for implementing DNN-based AI workloads.
We present an approach that addresses the challenge of optimizing the rate-accuracy-complexity trade-off.
Our approach is remarkably lightweight during both training and inference, highly effective, and achieves excellent rate-distortion performance.
arXiv Detail & Related papers (2022-08-24T15:02:11Z)
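The rate-accuracy trade-off targeted by the paper above is commonly formalized as a Lagrangian cost J = D + λR. As a much simpler stand-in for the paper's learned, low-complexity approach, the sketch below picks a per-tensor bit-width by minimizing that cost, with rate approximated by bits per element and distortion by quantization MSE.

```python
import numpy as np

def pick_bitwidth(x, lam, bit_choices=(1, 2, 3, 4, 6, 8)):
    """Choose the bit-width minimizing J = D + lam * R (textbook RD selection,
    not the paper's method)."""
    best_J, best_bits = np.inf, None
    for bits in bit_choices:
        step = (x.max() - x.min()) / (2 ** bits - 1)
        xq = np.round((x - x.min()) / step) * step + x.min()
        J = np.mean((xq - x) ** 2) + lam * bits
        if J < best_J:
            best_J, best_bits = J, bits
    return best_bits

x = np.random.default_rng(1).normal(size=10_000).astype(np.float32)
for lam in (1e-4, 1e-2, 1.0):  # a larger lambda favors a lower rate
    print(lam, "->", pick_bitwidth(x, lam), "bits")
```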
- An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially on Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) algorithm, Soft Actor-Critic for discrete actions (SAC-d), which generates the exit point and the compressing bits by soft policy iterations.
Based on a latency- and accuracy-aware reward design, such a computation can adapt well to complex environments such as dynamic wireless channels and arbitrary processing, and is capable of supporting 5G URLLC.
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
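The summary above gives only the agent's outputs (an exit point and a bit allocation) and the fact that the reward is latency- and accuracy-aware. The snippet below sketches just that interface; the action values and the alpha/beta weights are hypothetical, and the actual SAC-d policy and soft policy iteration are not shown.

```python
from dataclasses import dataclass

@dataclass
class Action:
    exit_point: int     # layer at which to split/exit the DNN
    compress_bits: int  # bits used to code the transmitted features

def reward(latency_ms: float, accuracy: float,
           alpha: float = 0.01, beta: float = 1.0) -> float:
    """Latency- and accuracy-aware reward: reward accuracy, penalize delay.
    alpha and beta are illustrative weights, not values from the paper."""
    return beta * accuracy - alpha * latency_ms

# A discrete action space a SAC-d-style policy would select from:
actions = [Action(e, b) for e in (4, 8, 12) for b in (2, 4, 8)]
print(len(actions), "candidate (exit point, bits) pairs")
print(reward(latency_ms=35.0, accuracy=0.92))
```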
- Nonlinear Tensor Ring Network [39.89070144585793]
State-of-the-art deep neural networks (DNNs) have been widely applied to various real-world applications and have achieved significant performance on cognitive problems.
By converting redundant models into compact ones, compression techniques appear to be a practical solution for reducing storage and memory consumption.
In this paper, we develop a nonlinear tensor ring network (NTRN) in which both fully-connected and convolutional layers are compressed.
arXiv Detail & Related papers (2021-11-12T02:02:55Z)
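A tensor ring represents a d-way tensor by d small cores G_k of shape (r_k, n_k, r_{k+1}), with the last rank wrapping around to the first. The sketch below reconstructs a weight matrix from such cores and compares parameter counts; NTRN's contribution, inserting nonlinearities into this format, is not reproduced, and the sizes are illustrative.

```python
import numpy as np

def tr_reconstruct(cores):
    """Rebuild the full tensor from tensor-ring cores of shape (r, n_k, r)."""
    out = cores[0]                                    # (r, n0, r)
    for G in cores[1:]:
        out = np.tensordot(out, G, axes=([-1], [0]))  # contract adjacent ranks
    return np.trace(out, axis1=0, axis2=-1)           # close the ring

# A 16x16 weight matrix stored as a (4, 4, 4, 4) tensor in TR format, rank 3.
rng = np.random.default_rng(0)
rank, dims = 3, (4, 4, 4, 4)
cores = [rng.normal(size=(rank, n, rank)) for n in dims]

W = tr_reconstruct(cores).reshape(16, 16)
print(W.shape, "TR params:", sum(c.size for c in cores), "vs full:", 16 * 16)
```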
- Sub-bit Neural Networks: Learning to Compress and Accelerate Binary Neural Networks [72.81092567651395]
Sub-bit Neural Networks (SNNs) are a new type of binary quantization design tailored to compress and accelerate BNNs.
SNNs are trained with a kernel-aware optimization framework, which exploits binary quantization in the fine-grained convolutional kernel space.
Experiments on visual recognition benchmarks and hardware deployment on FPGA validate the great potential of SNNs.
arXiv Detail & Related papers (2021-10-18T11:30:29Z)
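The "sub-bit" effect comes from quantizing in kernel space: instead of storing each 3x3 binary kernel directly (9 bits), a layer stores an index into a small codebook of binary kernels. The arithmetic below shows why this drops below 1 bit per weight; the codebook size K is our own illustrative choice, and SNNs learn the codebook jointly with the network rather than sampling it randomly.

```python
import numpy as np

K = 32  # codebook size (illustrative, not from the paper)
rng = np.random.default_rng(0)
codebook = rng.choice([-1.0, 1.0], size=(K, 3, 3))  # binary {-1,+1} kernels

num_kernels = 256 * 256                         # e.g., a 256->256 channel 3x3 layer
indices = rng.integers(0, K, size=num_kernels)  # each kernel is just an index

bits_per_weight = np.log2(K) / 9   # 5 index bits amortized over 9 weights
print(f"{bits_per_weight:.2f} bits/weight vs 1.0 for a plain BNN")

kernel_0 = codebook[indices[0]]    # reconstruction is a table lookup
```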
- Compact representations of convolutional neural networks via weight pruning and quantization [63.417651529192014]
We propose a novel storage format for convolutional neural networks (CNNs) based on source coding and leveraging both weight pruning and quantization.
We reduce space occupancy to as little as 0.6% on fully connected layers and 5.44% on the whole network, while performing at least as competitively as the baseline.
arXiv Detail & Related papers (2021-08-28T20:39:54Z)
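A rough sense of how pruning and quantization compound can be sketched with a mask-plus-codes storage layout: one bit per weight to mark survivors, then a few bits for each surviving weight. This is a simplified stand-in for the paper's source-coding-based format (which entropy-codes the result to reach figures like those above); the pruning fraction and bit-width below are illustrative.

```python
import numpy as np

def compressed_fraction(w, prune_frac=0.9, bits=3):
    """Prune small-magnitude weights, quantize survivors, report relative size."""
    flat = w.ravel()
    mask = np.abs(flat) > np.quantile(np.abs(flat), prune_frac)
    kept = flat[mask]
    step = (kept.max() - kept.min()) / (2 ** bits - 1)
    q = np.round((kept - kept.min()) / step).astype(np.uint8)  # quantized codes
    stored_bits = flat.size * 1 + q.size * bits  # 1 mask bit + survivor codes
    return stored_bits / (flat.size * 32)        # vs 32-bit floats

rng = np.random.default_rng(0)
w = rng.normal(size=(4096, 4096)).astype(np.float32)  # mock FC layer
print(f"compressed size: {compressed_fraction(w):.2%} of the original")
```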
- Quantized Neural Networks via {-1, +1} Encoding Decomposition and Acceleration [83.84684675841167]
We propose a novel encoding scheme using {-1, +1} to decompose quantized neural networks (QNNs) into multi-branch binary networks.
We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks.
arXiv Detail & Related papers (2021-06-18T03:11:15Z)
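The decomposition above rests on a simple identity: an M-bit symmetric quantization grid can be written as q = Σ_i 2^i · b_i with each b_i in {-1, +1}, so an M-bit QNN becomes M binary branches that can all reuse efficient XNOR/popcount kernels. A worked example under that assumption (the grid and helper are ours, not the paper's exact scheme):

```python
import numpy as np

def decompose(q, M):
    """Split integers on the grid {-(2**M - 1), ..., 2**M - 1} (step 2) into
    M binary {-1,+1} arrays b_i such that q = sum_i 2**i * b_i."""
    k = (q + (2 ** M - 1)) // 2                        # shift grid to 0 .. 2**M - 1
    return [2 * ((k >> i) & 1) - 1 for i in range(M)]  # digits -> {-1,+1}

rng = np.random.default_rng(0)
M = 2
q = rng.choice([-3, -1, 1, 3], size=8)     # 2-bit symmetric grid
branches = decompose(q, M)
recon = sum((2 ** i) * b for i, b in enumerate(branches))
assert np.array_equal(recon, q)            # exact reconstruction
print(q.tolist(), "->", [b.tolist() for b in branches])
```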
- DeepCompress: Efficient Point Cloud Geometry Compression [1.808877001896346]
We propose a more efficient deep learning-based encoder architecture for point cloud compression.
We show that incorporating the learned activation function from Computationally Efficient Neural Image Compression (CENIC) yields dramatic gains in efficiency and performance.
Our proposed modifications outperform the baseline approaches by a small margin in terms of Bjøntegaard delta rate and PSNR values.
arXiv Detail & Related papers (2021-06-02T23:18:11Z)
- Lightweight compression of neural network feature tensors for collaborative intelligence [32.03465747357384]
In collaborative intelligence applications, part of a deep neural network (DNN) is deployed on a relatively low-complexity device such as a mobile phone or edge device.
This paper presents a novel lightweight compression technique designed specifically to code the activations of a split DNN layer.
arXiv Detail & Related papers (2021-05-12T23:41:35Z)
- Kernel Quantization for Efficient Network Compression [59.55192551370948]
Kernel Quantization (KQ) aims to efficiently convert any pre-trained full-precision convolutional neural network (CNN) model into a low-precision version without significant performance loss.
Inspired by the evolution from weight pruning to filter pruning, we propose to quantize at both the kernel and weight levels.
Experiments on the ImageNet classification task show that KQ needs only 1.05 and 1.62 bits on average to represent each parameter in the convolution layers of VGG and ResNet18, respectively.
arXiv Detail & Related papers (2020-03-11T08:00:04Z)
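Kernel-level quantization can be pictured as clustering kernels into a shared codebook and replacing each kernel with a codeword index, so the index bits are amortized over all nine weights of a 3x3 kernel. The sketch below uses plain k-means as a stand-in for KQ's actual procedure; the codebook size is illustrative, and KQ additionally quantizes at the weight level to reach the figures quoted above.

```python
import numpy as np

def kernel_quantize(kernels, K=64, iters=10):
    """Cluster 3x3 kernels into a K-entry codebook (plain k-means) and assign
    each kernel the index of its nearest codeword."""
    flat = kernels.reshape(len(kernels), -1)
    rng = np.random.default_rng(0)
    centers = flat[rng.choice(len(flat), K, replace=False)]
    for _ in range(iters):
        dists = ((flat[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = dists.argmin(1)
        for k in range(K):
            members = flat[assign == k]
            if len(members):
                centers[k] = members.mean(0)
    return assign, centers, np.log2(K) / flat.shape[1]  # index bits per weight

rng = np.random.default_rng(1)
kernels = rng.normal(size=(4096, 3, 3)).astype(np.float32)
assign, codebook, bpw = kernel_quantize(kernels)
print(f"~{bpw:.2f} bits per parameter (plus the small codebook itself)")
```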
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.