DiffRate: Differentiable Compression Rate for Efficient Vision Transformers
- URL: http://arxiv.org/abs/2305.17997v1
- Date: Mon, 29 May 2023 10:15:19 GMT
- Title: DiffRate: Differentiable Compression Rate for Efficient Vision Transformers
- Authors: Mengzhao Chen, Wenqi Shao, Peng Xu, Mingbao Lin, Kaipeng Zhang, Fei Chao, Rongrong Ji, Yu Qiao, Ping Luo
- Abstract summary: Token compression aims to speed up large-scale vision transformers (e.g. ViTs) by pruning (dropping) or merging tokens.
DiffRate is a novel token compression method with several appealing properties that prior arts lack.
- Score: 98.33906104846386
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Token compression aims to speed up large-scale vision transformers (e.g.
ViTs) by pruning (dropping) or merging tokens. It is an important but
challenging task. Although recent advanced approaches have achieved great success, they need to carefully handcraft a compression rate (i.e., the number of tokens to remove), which is tedious and leads to sub-optimal performance. To tackle this
problem, we propose Differentiable Compression Rate (DiffRate), a novel token
compression method with several appealing properties that prior arts lack. First, DiffRate enables propagating the loss function's gradient onto the compression ratio, which previous work treated as a non-differentiable hyperparameter. As a result, different layers can automatically learn their own compression rates without extra overhead. Second, token pruning and merging can be naturally performed simultaneously in DiffRate,
while they were isolated in previous works. Third, extensive experiments
demonstrate that DiffRate achieves state-of-the-art performance. For example,
by applying the learned layer-wise compression rates to an off-the-shelf ViT-H
(MAE) model, we achieve a 40% FLOPs reduction and a 1.5x throughput
improvement, with a minor accuracy drop of 0.16% on ImageNet without
fine-tuning, even outperforming previous methods that require fine-tuning. Code and
models are available at https://github.com/OpenGVLab/DiffRate.
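To make the first property concrete, below is a minimal, hedged PyTorch sketch of one way a per-layer compression rate can be made differentiable: a learnable distribution over a small set of candidate keep-ratios is combined with a straight-through estimator and a soft token-keep mask, so the task loss can back-propagate into the rate choice. The class names (DifferentiableKeepRate, soft_keep_mask), the candidate set, and the masking scheme are illustrative assumptions, not the authors' released implementation (see the repository above for that).

```python
# Hedged sketch: a differentiable per-layer keep-ratio via a learnable
# distribution over candidate rates plus a straight-through estimator.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DifferentiableKeepRate(nn.Module):
    """Learns a keep-ratio for one transformer layer from a discrete candidate set."""
    def __init__(self, candidates=(1.0, 0.9, 0.8, 0.7, 0.6, 0.5)):
        super().__init__()
        self.register_buffer("candidates", torch.tensor(candidates))
        self.logits = nn.Parameter(torch.zeros(len(candidates)))  # learnable

    def forward(self):
        probs = F.softmax(self.logits, dim=0)
        soft_rate = (probs * self.candidates).sum()    # differentiable expectation
        hard_rate = self.candidates[probs.argmax()]    # what would be deployed
        # straight-through: forward uses the hard rate, backward sees the soft one
        return hard_rate.detach() + soft_rate - soft_rate.detach()

def soft_keep_mask(scores, keep_rate):
    """Turn per-token importance scores (B, N) into a soft keep mask in [0, 1]."""
    B, N = scores.shape
    n_keep = (keep_rate * N).clamp(min=1.0)
    # rank of each token when sorted by descending importance
    order = scores.argsort(dim=1, descending=True).argsort(dim=1).float()
    # smooth step: ~1 for ranks below n_keep, ~0 above; 0.1 is a sharpness temperature
    return torch.sigmoid((n_keep - order - 0.5) / 0.1)

rate_module = DifferentiableKeepRate()
scores = torch.rand(2, 197)          # fake importance scores for 197 ViT tokens
mask = soft_keep_mask(scores, rate_module())
print(mask.shape, mask.sum(dim=1))   # roughly keep_rate * 197 tokens survive
```

At deployment the hard per-layer rate would be frozen and the soft mask replaced by actual token dropping or merging, which is where the FLOPs and throughput savings come from.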
Related papers
- Token Compensator: Altering Inference Cost of Vision Transformer without Re-Tuning [63.43972993473501]
Token compression expedites the training and inference of Vision Transformers (ViTs).
However, when applied to downstream tasks, compression degrees are mismatched between training and inference stages.
We propose a model arithmetic framework to decouple the compression degrees between the two stages.
arXiv Detail & Related papers (2024-08-13T10:36:43Z)
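A minimal sketch of how the "model arithmetic" idea summarized above could look, assuming it amounts to adding a scaled parameter delta to a fine-tuned backbone so one checkpoint can be served at a compression degree it was not tuned for; the delta source and the scale schedule here are illustrative assumptions, not the paper's procedure.

```python
# Hedged sketch: parameter arithmetic to bridge mismatched compression degrees.
import torch
import torch.nn as nn

def apply_compensator(model: nn.Module, delta: dict, scale: float) -> None:
    """In-place: w <- w + scale * delta[w] for every matching parameter."""
    with torch.no_grad():
        for name, param in model.named_parameters():
            if name in delta:
                param.add_(scale * delta[name])

backbone = nn.Linear(16, 16)                 # stand-in for a fine-tuned ViT block
delta = {n: 0.01 * torch.randn_like(p) for n, p in backbone.named_parameters()}
# hypothetical scale: grows with the gap between the training-time and
# serving-time compression degrees (e.g. trained at 25% token drop, served at 50%)
apply_compensator(backbone, delta, scale=(0.50 - 0.25) / 0.25)
```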
- Variable-Rate Learned Image Compression with Multi-Objective Optimization and Quantization-Reconstruction Offsets [8.670873561640903]
This paper follows the traditional approach of varying a single quantization step size to perform uniform quantization of all latent tensor elements.
Three modifications are proposed to improve the variable rate compression performance.
The achieved variable rate compression results indicate negligible or minimal compression performance loss compared to training multiple models.
arXiv Detail & Related papers (2024-02-29T07:45:02Z)
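A toy sketch of the single-step-size variable-rate idea from the entry above: latents are rounded on a grid of width `step` and reconstructed slightly off the bin centre. The offset parameterization below is an illustrative assumption, not the paper's exact formulation.

```python
# Hedged sketch: uniform quantization with one adjustable step size and a
# quantization-reconstruction offset.
import torch

def quantize(y: torch.Tensor, step: float) -> torch.Tensor:
    """Map latents to integer bin indices (this is what would be entropy-coded)."""
    return torch.round(y / step)

def dequantize(indices: torch.Tensor, step: float, offset: float = 0.0) -> torch.Tensor:
    """Reconstruct; a small offset towards zero can reduce error when the
    latent distribution is peaked around zero."""
    return (indices - offset * torch.sign(indices)) * step

y = torch.randn(4, 192, 16, 16)          # fake latent tensor from an encoder
for step in (0.5, 1.0, 2.0):             # larger step -> fewer bins -> lower rate
    idx = quantize(y, step)
    y_hat = dequantize(idx, step, offset=0.2)
    mse = (y - y_hat).pow(2).mean().item()
    print(f"step={step}: distinct bins={idx.unique().numel()}, mse={mse:.4f}")
```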
- CAIT: Triple-Win Compression towards High Accuracy, Fast Inference, and Favorable Transferability For ViTs [79.54107547233625]
Vision Transformers (ViTs) have emerged as state-of-the-art models for various vision tasks.
We propose a joint compression method for ViTs that offers both high accuracy and fast inference speed.
Our proposed method can achieve state-of-the-art performance across various ViTs.
arXiv Detail & Related papers (2023-09-27T16:12:07Z)
- Lossy and Lossless (L$^2$) Post-training Model Size Compression [12.926354646945397]
We propose a post-training model size compression method that combines lossy and lossless compression in a unified way.
Our method can achieve a stable $10\times$ compression ratio without sacrificing accuracy and a $20\times$ compression ratio with minor accuracy loss in a short time.
arXiv Detail & Related papers (2023-08-08T14:10:16Z)
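The lossy + lossless recipe from the entry above, at its simplest: quantize weights post-training (lossy), then apply a generic lossless coder to the quantized bytes. The paper's method is more sophisticated; this only illustrates why the two stages compose.

```python
# Hedged sketch: post-training weight quantization followed by zlib as a
# stand-in lossless entropy coder.
import zlib
import numpy as np

def compress_weights(w: np.ndarray, n_bits: int = 8):
    """Uniform symmetric quantization, then lossless compression of the bytes."""
    scale = np.abs(w).max() / (2 ** (n_bits - 1) - 1) + 1e-12
    q = np.clip(np.round(w / scale), -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1)
    payload = zlib.compress(q.astype(np.int8).tobytes(), level=9)
    return payload, scale

def decompress_weights(payload: bytes, scale: float, shape) -> np.ndarray:
    q = np.frombuffer(zlib.decompress(payload), dtype=np.int8).reshape(shape)
    return q.astype(np.float32) * scale

w = np.random.randn(1024, 1024).astype(np.float32)
payload, scale = compress_weights(w)
w_hat = decompress_weights(payload, scale, w.shape)
print("ratio:", w.nbytes / len(payload), "max err:", np.abs(w - w_hat).max())
```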
- High-Fidelity Variable-Rate Image Compression via Invertible Activation Transformation [24.379052026260034]
We propose the Invertible Activation Transformation (IAT) module to tackle the issue of high-fidelity fine variable-rate image compression.
IAT and QLevel together give the image compression model the ability to perform fine variable-rate control while better maintaining image fidelity.
Our method outperforms the state-of-the-art variable-rate image compression method by a large margin, especially after multiple re-encodings.
arXiv Detail & Related papers (2022-09-12T07:14:07Z)
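A minimal sketch of what a quality-level-conditioned invertible activation transform could look like, assuming it is a per-channel scale and shift predicted from a quality index, applied before quantization and exactly inverted afterwards. The conditioning scheme and the class name are illustrative assumptions, not the paper's IAT module.

```python
# Hedged sketch: an invertible, QLevel-conditioned per-channel affine transform.
import torch
import torch.nn as nn

class ConditionalAffine(nn.Module):
    def __init__(self, channels: int, n_levels: int = 16):
        super().__init__()
        self.log_scale = nn.Embedding(n_levels, channels)  # one row per quality level
        self.shift = nn.Embedding(n_levels, channels)
        nn.init.zeros_(self.log_scale.weight)              # start as the identity map
        nn.init.zeros_(self.shift.weight)

    def forward(self, x, qlevel):
        s = self.log_scale(qlevel).exp()[:, :, None, None]  # (B, C, 1, 1)
        b = self.shift(qlevel)[:, :, None, None]
        return x * s + b

    def inverse(self, y, qlevel):
        s = self.log_scale(qlevel).exp()[:, :, None, None]
        b = self.shift(qlevel)[:, :, None, None]
        return (y - b) / s

iat = ConditionalAffine(channels=192)
x = torch.randn(2, 192, 16, 16)
q = torch.tensor([3, 12])                # a fine-grained quality index per image
assert torch.allclose(iat.inverse(iat.forward(x, q), q), x, atol=1e-5)
```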
- Unified Multivariate Gaussian Mixture for Efficient Neural Image Compression [151.3826781154146]
Modeling latent variables with priors and hyperpriors is an essential problem in variational image compression.
We find that inter-correlations and intra-correlations exist when observing latent variables from a vectorized perspective.
Our model achieves better rate-distortion performance and an impressive $3.18\times$ compression speedup.
arXiv Detail & Related papers (2022-03-21T11:44:17Z)
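The basic building block behind Gaussian-mixture entropy models like the one above is turning mixture parameters into a bit-cost estimate for the quantized latents. The real model is multivariate and conditioned on a hyperprior; the univariate sketch below only shows how mixture parameters map to an estimated rate.

```python
# Hedged sketch: bit cost of quantized latents under a K-component Gaussian
# mixture prior, with each integer bin's probability CDF(y+0.5) - CDF(y-0.5).
import torch
import torch.nn.functional as F

def mixture_rate_bits(y_hat, weights, means, scales):
    """y_hat: (...,) quantized latents; weights/means/scales: (..., K)."""
    y = y_hat.unsqueeze(-1)
    comp = torch.distributions.Normal(means, scales)
    p_bin = (comp.cdf(y + 0.5) - comp.cdf(y - 0.5)).clamp_min(1e-9)
    p = (F.softmax(weights, dim=-1) * p_bin).sum(dim=-1)
    return -torch.log2(p).sum()

y_hat = torch.round(torch.randn(4, 192, 16, 16) * 2)
K = 3
w = torch.randn(4, 192, 16, 16, K)       # in practice predicted by a hyper-decoder
mu = torch.zeros(4, 192, 16, 16, K)
sigma = torch.ones(4, 192, 16, 16, K)
print("estimated bits:", mixture_rate_bits(y_hat, w, mu, sigma).item())
```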
- Unified Visual Transformer Compression [102.26265546836329]
This paper proposes a unified ViT compression framework that seamlessly assembles three effective techniques: pruning, layer skipping, and knowledge distillation.
We formulate a budget-constrained, end-to-end optimization framework that jointly learns model weights, layer-wise pruning ratios/masks, and skip configurations.
Experiments are conducted with several ViT variants, e.g. DeiT and T2T-ViT backbones on the ImageNet dataset, and our approach consistently outperforms recent competitors.
arXiv Detail & Related papers (2022-03-15T20:38:22Z)
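A hedged sketch of a budget-constrained objective of the kind the entry above describes: task loss + distillation loss + a penalty that pushes a differentiable FLOPs estimate (built from per-layer keep-gates and skip-gates) towards a target budget. The gate parameterization and the FLOPs model are illustrative assumptions.

```python
# Hedged sketch: jointly training weights and differentiable compression gates
# under a FLOPs budget, with a distillation term.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CompressionGates(nn.Module):
    def __init__(self, n_layers: int, flops_per_layer: float = 1.0):
        super().__init__()
        self.keep_logits = nn.Parameter(torch.zeros(n_layers))  # width/head keep ratio
        self.skip_logits = nn.Parameter(torch.zeros(n_layers))  # prob. layer is executed
        self.flops_per_layer = flops_per_layer

    def expected_flops(self):
        keep = torch.sigmoid(self.keep_logits)
        run = torch.sigmoid(self.skip_logits)
        return (keep * run).sum() * self.flops_per_layer

def total_loss(logits, teacher_logits, targets, gates, budget, lam=1.0):
    task = F.cross_entropy(logits, targets)
    distill = F.kl_div(F.log_softmax(logits, dim=-1),
                       F.softmax(teacher_logits, dim=-1), reduction="batchmean")
    budget_penalty = (gates.expected_flops() - budget).clamp_min(0.0) ** 2
    return task + distill + lam * budget_penalty

gates = CompressionGates(n_layers=12)
logits = torch.randn(8, 1000, requires_grad=True)
teacher = torch.randn(8, 1000)
targets = torch.randint(0, 1000, (8,))
# budget below the current expected FLOPs, so the penalty (and its gradient) is active
loss = total_loss(logits, teacher, targets, gates, budget=2.0)
loss.backward()                      # gradients reach both the logits and the gates
print(loss.item(), gates.keep_logits.grad.abs().sum().item())
```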
- Towards Compact CNNs via Collaborative Compression [166.86915086497433]
We propose a Collaborative Compression scheme, which jointly applies channel pruning and tensor decomposition to compress CNN models.
We achieve a 52.9% FLOPs reduction by removing 48.4% of the parameters on ResNet-50, with only a 0.56% Top-1 accuracy drop on ImageNet 2012.
arXiv Detail & Related papers (2021-05-24T12:07:38Z)
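To see how channel pruning and tensor decomposition compose on a single convolution, here is a toy sketch: drop low-norm output channels, then split the remaining kernel into a rank-r factor pair via SVD. The joint selection strategy in the paper above is more involved; this only shows the mechanics.

```python
# Hedged sketch: channel pruning followed by low-rank decomposition of one conv kernel.
import torch

def prune_and_decompose(weight: torch.Tensor, keep_ratio: float, rank: int):
    """weight: (C_out, C_in, k, k) -> rank-r factors of the pruned, flattened kernel."""
    c_out = weight.shape[0]
    norms = weight.flatten(1).norm(dim=1)
    keep = norms.topk(max(1, int(keep_ratio * c_out))).indices.sort().values
    w = weight[keep].flatten(1)                      # (C_keep, C_in*k*k)
    U, S, Vh = torch.linalg.svd(w, full_matrices=False)
    r = min(rank, S.numel())
    first = Vh[:r]                                   # (r, C_in*k*k): small conv factor
    last = U[:, :r] * S[:r]                          # (C_keep, r): 1x1 conv mixing ranks
    return first, last, keep

w = torch.randn(256, 128, 3, 3)
first, last, kept = prune_and_decompose(w, keep_ratio=0.5, rank=32)
approx = last @ first                                # reconstruct the pruned kernel
orig = w[kept].flatten(1)
print("params before:", w.numel(), "after:", first.numel() + last.numel())
print("relative error:", (approx - orig).norm().item() / orig.norm().item())
```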
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.