Compression-aware Projection with Greedy Dimension Reduction for
Convolutional Neural Network Activations
- URL: http://arxiv.org/abs/2110.08828v1
- Date: Sun, 17 Oct 2021 14:02:02 GMT
- Title: Compression-aware Projection with Greedy Dimension Reduction for Convolutional Neural Network Activations
- Authors: Yu-Shan Tai, Chieh-Fang Teng, Cheng-Yang Chang, and An-Yeu Wu
- Abstract summary: We propose a compression-aware projection system to improve the trade-off between classification accuracy and compression ratio.
Our test results show that the proposed methods effectively reduce 2.91x~5.97x memory access with negligible accuracy drop on MobileNetV2/ResNet18/VGG16.
- Score: 3.6188659868203388
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Convolutional neural networks (CNNs) achieve remarkable performance in a wide
range of fields. However, intensive memory access of activations introduces
considerable energy consumption, impeding deployment of CNNs on
resource-constrained edge devices. Existing works in activation compression
propose to transform feature maps for higher compressibility, thus enabling
dimension reduction. Nevertheless, in the case of aggressive dimension
reduction, these methods lead to severe accuracy drop. To improve the trade-off
between classification accuracy and compression ratio, we propose a
compression-aware projection system, which employs a learnable projection to
compensate for the reconstruction loss. In addition, a greedy selection metric
is introduced to optimize the layer-wise compression ratio allocation by
considering both accuracy and #bits reduction simultaneously. Our test results
show that the proposed methods effectively reduce 2.91x~5.97x memory access
with negligible accuracy drop on MobileNetV2/ResNet18/VGG16.
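The abstract's two ideas, projecting activations to fewer dimensions and greedily allocating a per-layer compression ratio, can be illustrated with a toy sketch. Note this is not the authors' method: the paper learns the projection end-to-end to compensate reconstruction loss and scores layers on both accuracy and bits reduced, whereas this sketch substitutes a random orthonormal projection and uses reconstruction error per bit saved as a stand-in metric.

```python
# Hypothetical sketch of compression-aware projection with greedy
# layer-wise ratio allocation (toy stand-in, not the paper's algorithm).
import numpy as np

rng = np.random.default_rng(0)

def make_projection(c, k):
    """Random orthonormal projection R^c -> R^k (the paper learns this)."""
    q, _ = np.linalg.qr(rng.standard_normal((c, k)))
    return q  # (c, k), columns orthonormal

def compress(x, p):
    """Project per-position channel vectors onto k dims. x: (n, c)."""
    return x @ p

def reconstruct(z, p):
    return z @ p.T

def recon_error(x, k):
    p = make_projection(x.shape[1], k)
    return np.mean((x - reconstruct(compress(x, p), p)) ** 2)

# Toy "layers": flattened activation matrices with different channel counts.
channels = (16, 32, 64)
layers = [rng.standard_normal((256, c)) for c in channels]
ks = list(channels)  # start uncompressed

# Greedy allocation: repeatedly halve the kept dimension of the layer whose
# added reconstruction error per bit saved is smallest (an accuracy proxy).
for _ in range(3):
    scores = []
    for i, x in enumerate(layers):
        new_k = max(1, ks[i] // 2)
        if new_k == ks[i]:          # cannot compress further
            scores.append(np.inf)
            continue
        bits_saved = (ks[i] - new_k) * x.shape[0]
        err_incr = recon_error(x, new_k) - recon_error(x, ks[i])
        scores.append(err_incr / bits_saved)
    best = int(np.argmin(scores))
    ks[best] = max(1, ks[best] // 2)

print("per-layer kept dims:", ks)
```

In the paper the selection metric weighs actual accuracy impact rather than reconstruction error, and the projection matrices are trained jointly with the network, which is what closes the accuracy gap under aggressive dimension reduction.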
Related papers
- SPC-NeRF: Spatial Predictive Compression for Voxel Based Radiance Field [41.33347056627581]
We propose SPC-NeRF, a novel framework applying spatial predictive coding in EVG compression.
Our method can achieve 32% bit saving compared to the state-of-the-art method VQRF.
arXiv Detail & Related papers (2024-02-26T07:40:45Z)
- Communication-Efficient Distributed Learning with Local Immediate Error Compensation [95.6828475028581]
We propose the Local Immediate Error Compensated SGD (LIEC-SGD) optimization algorithm.
LIEC-SGD is superior to previous works in either the convergence rate or the communication cost.
arXiv Detail & Related papers (2024-02-19T05:59:09Z)
- Bandwidth-efficient Inference for Neural Image Compression [26.87198174202502]
We propose an end-to-end differentiable, bandwidth-efficient neural inference method with activations compressed by a neural data compression method.
Optimized with existing model quantization methods, the low-level task of image compression can achieve up to 19x bandwidth reduction with 6.21x energy saving.
arXiv Detail & Related papers (2023-09-06T09:31:37Z)
- DIVISION: Memory Efficient Training via Dual Activation Precision [60.153754740511864]
State-of-the-art work combines a search of quantization bit-width with the training, which makes the procedure complicated and less transparent.
We propose a simple and effective method to compress DNN training.
Experiment results show DIVISION has better comprehensive performance than state-of-the-art methods, including over 10x compression of activation maps and competitive training throughput, without loss of model accuracy.
arXiv Detail & Related papers (2022-08-05T03:15:28Z)
- Learnable Mixed-precision and Dimension Reduction Co-design for Low-storage Activation [9.838135675969026]
Deep convolutional neural networks (CNNs) have achieved many eye-catching results.
Deploying CNNs on resource-constrained edge devices is limited by the memory bandwidth required to transmit large intermediate data during inference.
We propose a learnable mixed-precision and dimension reduction co-design system, which separates channels into groups and allocates compression policies according to their importance.
arXiv Detail & Related papers (2022-07-16T12:53:52Z)
- Reducing Redundancy in the Bottleneck Representation of the Autoencoders [98.78384185493624]
Autoencoders are a type of unsupervised neural network that can be used to solve various tasks.
We propose a scheme to explicitly penalize feature redundancies in the bottleneck representation.
We tested our approach across different tasks: dimensionality reduction using three different datasets, image compression using the MNIST dataset, and image denoising using Fashion-MNIST.
arXiv Detail & Related papers (2022-02-09T18:48:02Z)
- Efficient Micro-Structured Weight Unification and Pruning for Neural Network Compression [56.83861738731913]
Deep Neural Network (DNN) models are essential for practical applications, especially for resource limited devices.
Previous unstructured or structured weight pruning methods rarely deliver real inference acceleration.
We propose a generalized weight unification framework at a hardware-compatible micro-structured level to achieve a high degree of compression and acceleration.
arXiv Detail & Related papers (2021-06-15T17:22:59Z)
- Towards Compact CNNs via Collaborative Compression [166.86915086497433]
We propose a Collaborative Compression scheme, which jointly applies channel pruning and tensor decomposition to compress CNN models.
We achieve 52.9% FLOPs reduction by removing 48.4% parameters on ResNet-50 with only a Top-1 accuracy drop of 0.56% on ImageNet 2012.
arXiv Detail & Related papers (2021-05-24T12:07:38Z)
- An Efficient Statistical-based Gradient Compression Technique for Distributed Training Systems [77.88178159830905]
Sparsity-Inducing Distribution-based Compression (SIDCo) is a threshold-based sparsification scheme that enjoys threshold estimation quality similar to deep gradient compression (DGC).
Our evaluation shows SIDCo speeds up training by up to 41.7%, 7.6%, and 1.9% compared to the no-compression baseline, Topk, and DGC compressors, respectively.
arXiv Detail & Related papers (2021-01-26T13:06:00Z)
- Tensor Reordering for CNN Compression [7.228285747845778]
We show how parameter redundancy in Convolutional Neural Network (CNN) filters can be effectively reduced by pruning in spectral domain.
Our approach is applied to pretrained CNNs and we show that minor additional fine-tuning allows our method to recover the original model performance.
arXiv Detail & Related papers (2020-10-22T23:45:34Z)
- Back-and-Forth prediction for deep tensor compression [37.663819283148854]
We present a prediction scheme called Back-and-Forth (BaF) prediction, developed for deep feature tensors.
We achieve 62% and 75% reductions in tensor size while keeping the network's accuracy loss below 1% and 2%, respectively.
arXiv Detail & Related papers (2020-02-14T01:32:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.