Lightweight Compression of Intermediate Neural Network Features for
Collaborative Intelligence
- URL: http://arxiv.org/abs/2105.07102v1
- Date: Sat, 15 May 2021 00:10:12 GMT
- Title: Lightweight Compression of Intermediate Neural Network Features for
Collaborative Intelligence
- Authors: Robert A. Cohen, Hyomin Choi, Ivan V. Bajić
- Abstract summary: In collaborative intelligence applications, part of a deep neural network (DNN) is deployed on a lightweight device such as a mobile phone or edge device.
This paper presents a novel lightweight compression technique designed specifically to quantize and compress the features output by the intermediate layer of a split DNN.
- Score: 32.03465747357384
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In collaborative intelligence applications, part of a deep neural network
(DNN) is deployed on a lightweight device such as a mobile phone or edge
device, and the remaining portion of the DNN is processed where more computing
resources are available, such as in the cloud. This paper presents a novel
lightweight compression technique designed specifically to quantize and
compress the features output by the intermediate layer of a split DNN, without
requiring any retraining of the network weights. Mathematical models for
estimating the clipping and quantization error of ReLU and leaky-ReLU
activations at this intermediate layer are developed and used to compute
optimal clipping ranges for coarse quantization. We also present a modified
entropy-constrained design algorithm for quantizing clipped activations. When
applied to popular object-detection and classification DNNs, we were able to
compress the 32-bit floating point intermediate activations down to 0.6 to 0.8
bits, while keeping the loss in accuracy to less than 1%. When compared to
HEVC, we found that the lightweight codec consistently provided better
inference accuracy, by up to 1.3%. The performance and simplicity of this
lightweight compression technique make it an attractive option for coding an
intermediate layer of a split neural network for edge/cloud applications.
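The clip-then-quantize pipeline described above is easy to sketch. Below is a minimal Python illustration, assuming a ReLU feature tensor: activations are clipped to [0, c] and uniformly quantized, and c is chosen by a brute-force search over the empirical clipping-plus-quantization MSE. The search is a simplified stand-in for the paper's model-based optimal clipping ranges, and names such as quantize_features are illustrative, not the authors'.

```python
import numpy as np

def quantize_features(x, c, bits):
    """Clip ReLU activations to [0, c], then uniformly quantize to 2**bits levels."""
    step = c / (2 ** bits - 1)
    q = np.round(np.clip(x, 0.0, c) / step).astype(np.uint8)  # indices sent to the cloud
    return q, step

def dequantize_features(q, step):
    return q.astype(np.float32) * step

def best_clip_range(x, bits, num_candidates=100):
    """Grid-search the clip value c minimizing empirical MSE. (The paper instead
    derives c analytically from models of clipping and quantization error.)"""
    best_c, best_mse = None, np.inf
    for c in np.linspace(x.max() / num_candidates, x.max(), num_candidates):
        q, step = quantize_features(x, c, bits)
        mse = np.mean((dequantize_features(q, step) - x) ** 2)
        if mse < best_mse:
            best_c, best_mse = c, mse
    return best_c, best_mse

# Mock ReLU feature tensor from a split layer.
rng = np.random.default_rng(0)
feats = np.maximum(rng.normal(0.0, 1.0, size=(1, 64, 28, 28)), 0.0).astype(np.float32)

for bits in (1, 2, 3):  # coarse quantization, as in the paper
    c, mse = best_clip_range(feats, bits)
    print(f"{bits}-bit: clip at {c:.3f}, MSE {mse:.5f}")
```

At very low bit depths the optimal clip value falls well below the activation maximum, trading some clipping error for a finer quantization step; the paper's entropy-constrained quantizer design then reduces the coded rate further and is not shown here.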
Related papers
- Attention-based Feature Compression for CNN Inference Offloading in Edge Computing [93.67044879636093]
This paper studies the computational offloading of CNN inference in device-edge co-inference systems.
We propose a novel autoencoder-based CNN architecture (AECNN) for effective feature extraction at the end device.
Experiments show that AECNN can compress the intermediate data by more than 256x with only about 4% accuracy loss.
arXiv Detail & Related papers (2022-11-24T18:10:01Z)
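The AECNN entry above compresses intermediate features with a learned autoencoder. The sketch below shows only the generic idea in PyTorch, assuming a 1x1-conv bottleneck at the split point; AECNN's actual attention-based design and training procedure are not reproduced, and the channel counts are our own illustrative choices.

```python
import torch
import torch.nn as nn

class FeatureBottleneck(nn.Module):
    """Autoencoder at a split point: the encoder runs on the device, the decoder
    on the edge server. Reducing 256 channels to 4 and coding them at 8 bits
    instead of 32 gives roughly (256/4) * (32/8) = 256x compression, the order
    of the ratio reported above (illustrative arithmetic, not AECNN's design)."""
    def __init__(self, channels=256, bottleneck=4):
        super().__init__()
        self.encoder = nn.Conv2d(channels, bottleneck, kernel_size=1)
        self.decoder = nn.Conv2d(bottleneck, channels, kernel_size=1)

    def forward(self, feats):
        code = self.encoder(feats)   # transmitted (after quantization/entropy coding)
        return self.decoder(code)    # reconstructed server-side

feats = torch.randn(1, 256, 14, 14)      # mock intermediate feature map
print(FeatureBottleneck()(feats).shape)  # torch.Size([1, 256, 14, 14])
```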
- A Low-Complexity Approach to Rate-Distortion Optimized Variable Bit-Rate Compression for Split DNN Computing [5.3221129103999125]
Split computing has emerged as a recent paradigm for implementing DNN-based AI workloads.
We present an approach that addresses the challenge of optimizing the rate-accuracy-complexity trade-off.
Our approach is remarkably lightweight during both training and inference, highly effective, and achieves excellent rate-distortion performance.
arXiv Detail & Related papers (2022-08-24T15:02:11Z)
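The rate-accuracy trade-off targeted by the paper above is commonly formalized as a Lagrangian cost J = D + λR. As a much simpler stand-in for the paper's learned, low-complexity approach, the sketch below picks a per-tensor bit-width by minimizing that cost, with rate approximated by bits per element and distortion by quantization MSE.

```python
import numpy as np

def pick_bitwidth(x, lam, bit_choices=(1, 2, 3, 4, 6, 8)):
    """Choose the bit-width minimizing J = D + lam * R (textbook RD selection,
    not the paper's method)."""
    best_J, best_bits = np.inf, None
    for bits in bit_choices:
        step = (x.max() - x.min()) / (2 ** bits - 1)
        xq = np.round((x - x.min()) / step) * step + x.min()
        J = np.mean((xq - x) ** 2) + lam * bits
        if J < best_J:
            best_J, best_bits = J, bits
    return best_bits

x = np.random.default_rng(1).normal(size=10_000).astype(np.float32)
for lam in (1e-4, 1e-2, 1.0):  # a larger lambda favors a lower rate
    print(lam, "->", pick_bitwidth(x, lam), "bits")
```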
- An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially on Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) algorithm, Soft Actor-Critic for discrete actions (SAC-d), which generates the exit point and the compressing bits by soft policy iterations.
Based on a latency- and accuracy-aware reward design, such a computation can adapt well to complex environments such as dynamic wireless channels and arbitrary processing, and is capable of supporting 5G URLLC.
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
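The summary above gives only the agent's outputs (an exit point and a bit allocation) and the fact that the reward is latency- and accuracy-aware. The snippet below sketches just that interface; the action values and the alpha/beta weights are hypothetical, and the actual SAC-d policy and soft policy iteration are not shown.

```python
from dataclasses import dataclass

@dataclass
class Action:
    exit_point: int     # layer at which to split/exit the DNN
    compress_bits: int  # bits used to code the transmitted features

def reward(latency_ms: float, accuracy: float,
           alpha: float = 0.01, beta: float = 1.0) -> float:
    """Latency- and accuracy-aware reward: reward accuracy, penalize delay.
    alpha and beta are illustrative weights, not values from the paper."""
    return beta * accuracy - alpha * latency_ms

# A discrete action space a SAC-d-style policy would select from:
actions = [Action(e, b) for e in (4, 8, 12) for b in (2, 4, 8)]
print(len(actions), "candidate (exit point, bits) pairs")
print(reward(latency_ms=35.0, accuracy=0.92))
```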
- Nonlinear Tensor Ring Network [39.89070144585793]
State-of-the-art deep neural networks (DNNs) have been widely applied to various real-world applications and have achieved significant performance on cognitive problems.
By converting redundant models into compact ones, compression techniques appear to be a practical solution for reducing storage and memory consumption.
In this paper, we develop a nonlinear tensor ring network (NTRN) in which both fully-connected and convolutional layers are compressed.
arXiv Detail & Related papers (2021-11-12T02:02:55Z)
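A tensor ring represents a d-way tensor by d small cores G_k of shape (r_k, n_k, r_{k+1}), with the last rank wrapping around to the first. The sketch below reconstructs a weight matrix from such cores and compares parameter counts; NTRN's contribution, inserting nonlinearities into this format, is not reproduced, and the sizes are illustrative.

```python
import numpy as np

def tr_reconstruct(cores):
    """Rebuild the full tensor from tensor-ring cores of shape (r, n_k, r)."""
    out = cores[0]                                    # (r, n0, r)
    for G in cores[1:]:
        out = np.tensordot(out, G, axes=([-1], [0]))  # contract adjacent ranks
    return np.trace(out, axis1=0, axis2=-1)           # close the ring

# A 16x16 weight matrix stored as a (4, 4, 4, 4) tensor in TR format, rank 3.
rng = np.random.default_rng(0)
rank, dims = 3, (4, 4, 4, 4)
cores = [rng.normal(size=(rank, n, rank)) for n in dims]

W = tr_reconstruct(cores).reshape(16, 16)
print(W.shape, "TR params:", sum(c.size for c in cores), "vs full:", 16 * 16)
```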
- Sub-bit Neural Networks: Learning to Compress and Accelerate Binary Neural Networks [72.81092567651395]
Sub-bit Neural Networks (SNNs) are a new type of binary quantization design tailored to compress and accelerate BNNs.
SNNs are trained with a kernel-aware optimization framework, which exploits binary quantization in the fine-grained convolutional kernel space.
Experiments on visual recognition benchmarks and hardware deployment on FPGA validate the great potential of SNNs.
arXiv Detail & Related papers (2021-10-18T11:30:29Z)
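The "sub-bit" effect comes from quantizing in kernel space: instead of storing each 3x3 binary kernel directly (9 bits), a layer stores an index into a small codebook of binary kernels. The arithmetic below shows why this drops below 1 bit per weight; the codebook size K is our own illustrative choice, and SNNs learn the codebook jointly with the network rather than sampling it randomly.

```python
import numpy as np

K = 32  # codebook size (illustrative, not from the paper)
rng = np.random.default_rng(0)
codebook = rng.choice([-1.0, 1.0], size=(K, 3, 3))  # binary {-1,+1} kernels

num_kernels = 256 * 256                         # e.g., a 256->256 channel 3x3 layer
indices = rng.integers(0, K, size=num_kernels)  # each kernel is just an index

bits_per_weight = np.log2(K) / 9   # 5 index bits amortized over 9 weights
print(f"{bits_per_weight:.2f} bits/weight vs 1.0 for a plain BNN")

kernel_0 = codebook[indices[0]]    # reconstruction is a table lookup
```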
- Compact representations of convolutional neural networks via weight pruning and quantization [63.417651529192014]
We propose a novel storage format for convolutional neural networks (CNNs) based on source coding and leveraging both weight pruning and quantization.
We reduce space occupancy to as little as 0.6% on fully connected layers and 5.44% on the whole network, while performing at least as competitively as the baseline.
arXiv Detail & Related papers (2021-08-28T20:39:54Z)
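A rough sense of how pruning and quantization compound can be sketched with a mask-plus-codes storage layout: one bit per weight to mark survivors, then a few bits for each surviving weight. This is a simplified stand-in for the paper's source-coding-based format (which entropy-codes the result to reach figures like those above); the pruning fraction and bit-width below are illustrative.

```python
import numpy as np

def compressed_fraction(w, prune_frac=0.9, bits=3):
    """Prune small-magnitude weights, quantize survivors, report relative size."""
    flat = w.ravel()
    mask = np.abs(flat) > np.quantile(np.abs(flat), prune_frac)
    kept = flat[mask]
    step = (kept.max() - kept.min()) / (2 ** bits - 1)
    q = np.round((kept - kept.min()) / step).astype(np.uint8)  # quantized codes
    stored_bits = flat.size * 1 + q.size * bits  # 1 mask bit + survivor codes
    return stored_bits / (flat.size * 32)        # vs 32-bit floats

rng = np.random.default_rng(0)
w = rng.normal(size=(4096, 4096)).astype(np.float32)  # mock FC layer
print(f"compressed size: {compressed_fraction(w):.2%} of the original")
```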
- Quantized Neural Networks via {-1, +1} Encoding Decomposition and Acceleration [83.84684675841167]
We propose a novel encoding scheme using {-1, +1} to decompose quantized neural networks (QNNs) into multi-branch binary networks.
We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks.
arXiv Detail & Related papers (2021-06-18T03:11:15Z)
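The decomposition above rests on a simple identity: an M-bit symmetric quantization grid can be written as q = Σ_i 2^i · b_i with each b_i in {-1, +1}, so an M-bit QNN becomes M binary branches that can all reuse efficient XNOR/popcount kernels. A worked example under that assumption (the grid and helper are ours, not the paper's exact scheme):

```python
import numpy as np

def decompose(q, M):
    """Split integers on the grid {-(2**M - 1), ..., 2**M - 1} (step 2) into
    M binary {-1,+1} arrays b_i such that q = sum_i 2**i * b_i."""
    k = (q + (2 ** M - 1)) // 2                        # shift grid to 0 .. 2**M - 1
    return [2 * ((k >> i) & 1) - 1 for i in range(M)]  # digits -> {-1,+1}

rng = np.random.default_rng(0)
M = 2
q = rng.choice([-3, -1, 1, 3], size=8)     # 2-bit symmetric grid
branches = decompose(q, M)
recon = sum((2 ** i) * b for i, b in enumerate(branches))
assert np.array_equal(recon, q)            # exact reconstruction
print(q.tolist(), "->", [b.tolist() for b in branches])
```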
- DeepCompress: Efficient Point Cloud Geometry Compression [1.808877001896346]
We propose a more efficient deep learning-based encoder architecture for point cloud compression.
We show that incorporating the learned activation function from Computationally Efficient Neural Image Compression (CENIC) yields dramatic gains in efficiency and performance.
Our proposed modifications outperform the baseline approaches by a small margin in terms of Bjøntegaard delta rate and PSNR values.
arXiv Detail & Related papers (2021-06-02T23:18:11Z)
- Lightweight compression of neural network feature tensors for collaborative intelligence [32.03465747357384]
In collaborative intelligence applications, part of a deep neural network (DNN) is deployed on a relatively low-complexity device such as a mobile phone or edge device.
This paper presents a novel lightweight compression technique designed specifically to code the activations of a split DNN layer.
arXiv Detail & Related papers (2021-05-12T23:41:35Z)
- Kernel Quantization for Efficient Network Compression [59.55192551370948]
Kernel Quantization (KQ) aims to efficiently convert any pre-trained full-precision convolutional neural network (CNN) model into a low-precision version without significant performance loss.
Inspired by the evolution from weight pruning to filter pruning, we propose to quantize at both the kernel and weight levels.
Experiments on the ImageNet classification task show that KQ needs only 1.05 and 1.62 bits on average to represent each parameter in the convolution layers of VGG and ResNet18, respectively.
arXiv Detail & Related papers (2020-03-11T08:00:04Z)
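Kernel-level quantization can be pictured as clustering kernels into a shared codebook and replacing each kernel with a codeword index, so the index bits are amortized over all nine weights of a 3x3 kernel. The sketch below uses plain k-means as a stand-in for KQ's actual procedure; the codebook size is illustrative, and KQ additionally quantizes at the weight level to reach the figures quoted above.

```python
import numpy as np

def kernel_quantize(kernels, K=64, iters=10):
    """Cluster 3x3 kernels into a K-entry codebook (plain k-means) and assign
    each kernel the index of its nearest codeword."""
    flat = kernels.reshape(len(kernels), -1)
    rng = np.random.default_rng(0)
    centers = flat[rng.choice(len(flat), K, replace=False)]
    for _ in range(iters):
        dists = ((flat[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = dists.argmin(1)
        for k in range(K):
            members = flat[assign == k]
            if len(members):
                centers[k] = members.mean(0)
    return assign, centers, np.log2(K) / flat.shape[1]  # index bits per weight

rng = np.random.default_rng(1)
kernels = rng.normal(size=(4096, 3, 3)).astype(np.float32)
assign, codebook, bpw = kernel_quantize(kernels)
print(f"~{bpw:.2f} bits per parameter (plus the small codebook itself)")
```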
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.