Vision Transformer Compression with Structured Pruning and Low Rank
Approximation
- URL: http://arxiv.org/abs/2203.13444v1
- Date: Fri, 25 Mar 2022 04:18:07 GMT
- Title: Vision Transformer Compression with Structured Pruning and Low Rank
Approximation
- Authors: Ankur Kumar
- Abstract summary: The Transformer architecture has gained popularity due to its ability to scale with large datasets.
We focus on the vision transformer proposed for the image recognition task.
We explore the application of different compression techniques such as low rank approximation and pruning for this purpose.
- Score: 1.9685957565449135
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Transformer architecture has gained popularity due to its ability
to scale with large datasets. Consequently, there is a need to reduce the model
size and latency, especially for on-device deployment. We focus on the vision
transformer proposed for the image recognition task (Dosovitskiy et al., 2021),
and explore the application of different compression techniques, such as low
rank approximation and pruning, for this purpose. Specifically, we investigate
a structured pruning method proposed recently in Zhu et al. (2021) and find
that it mostly prunes the feedforward blocks, and does so with severe
degradation in accuracy. To mitigate this, we propose a hybrid compression
approach in which we compress the attention blocks using low rank approximation
and apply the aforementioned pruning, at a lower rate, to the feedforward
blocks in each transformer layer. Our technique achieves 50% compression with a
14% relative increase in classification error, whereas pruning alone achieves
44% compression with a 20% relative increase in error. We propose further
enhancements to bridge the accuracy gap but leave them as future work.
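The abstract's hybrid scheme pairs two standard techniques. A minimal sketch of both, assuming ViT-B weight shapes: truncated-SVD low-rank factorization for an attention projection, and structured pruning that drops whole feedforward rows by magnitude. The function names and the row-wise L1 pruning criterion are illustrative assumptions, not the authors' exact method.

```python
import numpy as np

def low_rank_factorize(W, rank):
    """Replace W (m x n) with factors A (m x r) and B (r x n),
    shrinking parameters from m*n to r*(m + n)."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, :rank] * S[:rank], Vt[:rank, :]  # singular values folded into A

def prune_ffn_rows(W, keep_ratio):
    """Structured pruning: drop whole rows (hidden units) with the
    smallest L1 norms, keeping the original row order."""
    keep = int(W.shape[0] * keep_ratio)
    idx = np.sort(np.argsort(np.abs(W).sum(axis=1))[-keep:])
    return W[idx], idx

rng = np.random.default_rng(0)
W_attn = rng.standard_normal((768, 768))   # an attention projection in ViT-B
W_ffn = rng.standard_normal((3072, 768))   # first feedforward matrix in ViT-B

A, B = low_rank_factorize(W_attn, rank=96)
W_ffn_small, kept = prune_ffn_rows(W_ffn, keep_ratio=0.75)

orig = W_attn.size + W_ffn.size
new = A.size + B.size + W_ffn_small.size
print(f"compressed to {new / orig:.0%} of original parameters")  # 65%
```

Applying a lower pruning rate to the feedforward blocks while the attention blocks are factorized is the trade-off the paper describes; the actual compression and accuracy figures depend on fine-tuning after compression.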
Related papers
- Beyond Throughput and Compression Ratios: Towards High End-to-end Utility of Gradient Compression [13.255861297820326]
Gradient compression can reduce the volume of communicated gradient data.
In practice, gradient compression schemes fail to accelerate the training process while also preserving accuracy.
We identify common issues in previous gradient compression systems and evaluation methodologies.
arXiv Detail & Related papers (2024-07-01T15:32:28Z)
- Compression of Structured Data with Autoencoders: Provable Benefit of Nonlinearities and Depth [83.15263499262824]
We prove that gradient descent converges to a solution that completely disregards the sparse structure of the input.
We show how to improve upon Gaussian performance for the compression of sparse data by adding a denoising function to a shallow architecture.
We validate our findings on image datasets, such as CIFAR-10 and MNIST.
arXiv Detail & Related papers (2024-02-07T16:32:29Z)
- DiffRate: Differentiable Compression Rate for Efficient Vision Transformers [98.33906104846386]
Token compression aims to speed up large-scale vision transformers (e.g. ViTs) by pruning (dropping) or merging tokens.
DiffRate is a novel token compression method that has several appealing properties prior arts do not have.
arXiv Detail & Related papers (2023-05-29T10:15:19Z)
- Estimating the Resize Parameter in End-to-end Learned Image Compression [50.20567320015102]
We describe a search-free resizing framework that can further improve the rate-distortion tradeoff of recent learned image compression models.
Our results show that our new resizing parameter estimation framework can provide Bjontegaard-Delta rate (BD-rate) improvement of about 10% against leading perceptual quality engines.
arXiv Detail & Related papers (2022-04-26T01:35:02Z)
- Implicit Neural Representations for Image Compression [103.78615661013623]
Implicit Neural Representations (INRs) have gained attention as a novel and effective representation for various data types.
We propose the first comprehensive compression pipeline based on INRs including quantization, quantization-aware retraining and entropy coding.
We find that our approach to source compression with INRs vastly outperforms similar prior work.
arXiv Detail & Related papers (2021-12-08T13:02:53Z)
- Compression-aware Projection with Greedy Dimension Reduction for Convolutional Neural Network Activations [3.6188659868203388]
We propose a compression-aware projection system to improve the trade-off between classification accuracy and compression ratio.
Our test results show that the proposed methods effectively reduce memory access by 2.91x to 5.97x with negligible accuracy drop on MobileNetV2/ResNet18/VGG16.
arXiv Detail & Related papers (2021-10-17T14:02:02Z)
- Towards Compact CNNs via Collaborative Compression [166.86915086497433]
We propose a Collaborative Compression scheme, which jointly applies channel pruning and tensor decomposition to compress CNN models.
We achieve 52.9% FLOPs reduction by removing 48.4% parameters on ResNet-50 with only a Top-1 accuracy drop of 0.56% on ImageNet 2012.
arXiv Detail & Related papers (2021-05-24T12:07:38Z)
- Compressed Communication for Distributed Training: Adaptive Methods and System [13.244482588437972]
Communication overhead severely hinders the scalability of distributed machine learning systems.
Recently, there has been a growing interest in using gradient compression to reduce the communication overhead.
In this paper, we first introduce a novel adaptive gradient method with gradient compression.
arXiv Detail & Related papers (2021-05-17T13:41:47Z)
- ScaleCom: Scalable Sparsified Gradient Compression for Communication-Efficient Distributed Training [74.43625662170284]
Large-scale distributed training of Deep Neural Networks (DNNs) on state-of-the-art platforms is expected to be severely communication constrained.
We propose a new compression technique that leverages similarity in the gradient distribution amongst learners to provide significantly improved scalability.
We experimentally demonstrate that ScaleCom has small overheads, directly reduces gradient traffic and provides high compression rates (65-400X) and excellent scalability (up to 64 learners and 8-12X larger batch sizes over standard training) without significant accuracy loss.
arXiv Detail & Related papers (2021-04-21T02:22:10Z)
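Several of the entries above concern sparsified gradient compression for distributed training. A minimal sketch of the basic top-k mechanism behind this family of methods (the function names are illustrative; real systems such as ScaleCom add error feedback and cross-learner coordination on top of this):

```python
import numpy as np

def topk_compress(grad, k):
    """Keep only the k largest-magnitude gradient entries,
    transmitted as (index, value) pairs."""
    idx = np.argpartition(np.abs(grad), -k)[-k:]
    return idx, grad[idx]

def topk_decompress(idx, vals, size):
    """Rebuild a dense gradient with zeros everywhere else."""
    out = np.zeros(size)
    out[idx] = vals
    return out

g = np.array([0.1, -2.0, 0.05, 3.0, -0.2])
idx, vals = topk_compress(g, k=2)
restored = topk_decompress(idx, vals, g.size)
print(restored)  # only the two largest-magnitude entries survive
```

With k entries kept out of n, the communicated volume shrinks roughly by a factor of n / (2k) (each survivor needs an index and a value), which is where compression rates like the 65-400x quoted above come from.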
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.