Multi-Dimensional Model Compression of Vision Transformer
- URL: http://arxiv.org/abs/2201.00043v1
- Date: Fri, 31 Dec 2021 19:54:18 GMT
- Title: Multi-Dimensional Model Compression of Vision Transformer
- Authors: Zejiang Hou and Sun-Yuan Kung
- Abstract summary: Vision transformers (ViT) have recently attracted considerable attention, but their huge computational cost remains an issue for practical deployment.
Previous ViT pruning methods tend to prune the model along a single dimension only.
We advocate a multi-dimensional ViT compression paradigm, and propose to harness redundancy reduction across the attention head, neuron, and sequence dimensions jointly.
- Score: 21.8311401851523
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Vision transformers (ViT) have recently attracted considerable attention, but their huge computational cost remains an issue for practical deployment. Previous ViT pruning methods tend to prune the model along a single dimension only, which may cause excessive reduction along that dimension and lead to sub-optimal model quality. In contrast, we advocate a multi-dimensional ViT compression paradigm, and propose to harness redundancy reduction across the attention head, neuron, and sequence dimensions jointly. We first propose a statistical dependence based pruning criterion that is generalizable to different dimensions for identifying deleterious components. Moreover, we cast multi-dimensional compression as an optimization problem, learning the optimal pruning policy across the three dimensions that maximizes the compressed model's accuracy under a computational budget. The problem is solved by our adapted Gaussian process search with expected improvement. Experimental results show that our method effectively reduces the computational cost of various ViT models. For example, our method reduces FLOPs by 40% without top-1 accuracy loss for DeiT and T2T-ViT models, outperforming previous state-of-the-art methods.
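To make the budgeted policy search concrete, below is a minimal sketch of a Gaussian-process search with expected improvement over the three keep-ratios (heads, neurons, sequence tokens). It is only a sketch under assumptions: flops_ratio and evaluate_accuracy are hypothetical stand-ins for the real cost model and pruned-model evaluation, and this generic GP/EI loop does not reproduce the paper's adapted search.

```python
# Budget-constrained Bayesian optimization over per-dimension pruning ratios.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)
BUDGET = 0.60  # keep <= 60% of the original FLOPs (~40% reduction)

def flops_ratio(policy):
    """Rough FLOPs kept: attention scales with heads * seq^2 (hypothetical cost model)."""
    head_keep, neuron_keep, seq_keep = policy
    return 0.5 * head_keep * seq_keep ** 2 + 0.5 * neuron_keep * seq_keep

def evaluate_accuracy(policy):
    """Stand-in for measuring the pruned model's accuracy (hypothetical)."""
    return 0.8 - 0.3 * float(np.sum((1.0 - np.asarray(policy)) ** 2))

def expected_improvement(mu, sigma, best):
    z = (mu - best) / np.maximum(sigma, 1e-9)
    return (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)

# Candidate policies: keep-ratios in [0.3, 1.0], filtered by the FLOPs budget.
candidates = rng.uniform(0.3, 1.0, size=(2000, 3))
candidates = candidates[np.array([flops_ratio(p) <= BUDGET for p in candidates])]

# Warm-start with a few random evaluations, then iterate GP fit + EI acquisition.
X = candidates[rng.choice(len(candidates), size=5, replace=False)]
y = np.array([evaluate_accuracy(p) for p in X])
for _ in range(20):
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-6, normalize_y=True).fit(X, y)
    mu, sigma = gp.predict(candidates, return_std=True)
    nxt = candidates[np.argmax(expected_improvement(mu, sigma, y.max()))]
    X = np.vstack([X, nxt])
    y = np.append(y, evaluate_accuracy(nxt))

best = X[np.argmax(y)]
print("best keep-ratios (head, neuron, seq):", best.round(2), "est. accuracy:", round(y.max(), 3))
```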
Related papers
- VTrans: Accelerating Transformer Compression with Variational Information Bottleneck based Pruning [3.256420760342604]
We propose VTrans, an iterative pruning framework guided by the Variational Information Bottleneck (VIB) principle.
Our method compresses all structural components, including embeddings, attention heads, and layers using VIB-trained masks.
Notably, our method achieves up to 70% more compression than prior state-of-the-art approaches.
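A minimal sketch of the general idea of a VIB-trained mask, restricted here to attention heads: a stochastic gate with a KL-style penalty is learned per head, and heads whose gate means collapse toward zero can be removed. The Gaussian gate, module names, and toy loss are illustrative assumptions, not VTrans's exact formulation.

```python
# Learnable stochastic gates on attention heads with a KL sparsity penalty.
import torch
import torch.nn as nn

class VIBHeadGate(nn.Module):
    def __init__(self, num_heads, kl_weight=1e-3):
        super().__init__()
        self.mu = nn.Parameter(torch.ones(num_heads))              # gate means
        self.log_sigma = nn.Parameter(torch.full((num_heads,), -3.0))
        self.kl_weight = kl_weight

    def forward(self, head_outputs):                               # (B, H, N, D)
        if self.training:                                          # reparameterized sample
            gate = self.mu + torch.exp(self.log_sigma) * torch.randn_like(self.mu)
        else:                                                      # deterministic at eval
            gate = self.mu
        return head_outputs * gate.view(1, -1, 1, 1)

    def kl_loss(self):                                             # KL(N(mu, sigma^2) || N(0, 1))
        sigma2 = torch.exp(2 * self.log_sigma)
        kl = 0.5 * (self.mu ** 2 + sigma2 - 2 * self.log_sigma - 1).sum()
        return self.kl_weight * kl

# Usage: add gate.kl_loss() to the task loss, then drop heads with small |mu|.
gate = VIBHeadGate(num_heads=12)
x = torch.randn(2, 12, 197, 64)                                    # (batch, heads, tokens, dim)
total_loss = gate(x).pow(2).mean() + gate.kl_loss()                # toy task loss + KL term
total_loss.backward()
```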
arXiv Detail & Related papers (2024-06-07T22:07:46Z)
- Memory-Efficient Vision Transformers: An Activation-Aware Mixed-Rank Compression Strategy [5.699098817569033]
This paper introduces an activation-aware model compression methodology that uses selective low-rank weight tensor approximations of different layers to reduce the parameter count of ViTs.
The presented method significantly reduces the parameter count of DeiT-B by 60% with less than 1% accuracy drop on the ImageNet dataset.
In addition to this, the presented compression technique can compress large DeiT/ViT models to have about the same model size as smaller DeiT/ViT variants while yielding up to 1.8% accuracy gain.
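A minimal sketch of the activation-aware flavor of low-rank compression: scale each input channel of a weight matrix by its typical activation magnitude before truncated SVD, so channels that see large activations are approximated more faithfully. The diagonal scaling heuristic and function name are illustrative assumptions, not the paper's exact mixed-rank procedure.

```python
# Activation-aware truncated SVD of a linear layer's weight matrix.
import numpy as np

def activation_aware_lowrank(W, X_calib, rank):
    """W: (d_in, d_out) weight; X_calib: (n, d_in) calibration activations."""
    s = np.sqrt((X_calib ** 2).mean(axis=0)) + 1e-6              # per-channel activation RMS
    U, S, Vt = np.linalg.svd(np.diag(s) @ W, full_matrices=False)
    A = (np.diag(1.0 / s) @ U[:, :rank]) * S[:rank]              # (d_in, rank), scaling undone
    B = Vt[:rank]                                                 # (rank, d_out)
    return A, B                                                   # W ~= A @ B

rng = np.random.default_rng(0)
W = rng.standard_normal((768, 3072))
X = rng.standard_normal((512, 768)) * rng.uniform(0.1, 3.0, 768)  # uneven channel scales
A, B = activation_aware_lowrank(W, X, rank=256)
rel_err = np.linalg.norm(X @ W - X @ (A @ B)) / np.linalg.norm(X @ W)
print(f"relative output error at rank 256: {rel_err:.3f}")
```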
arXiv Detail & Related papers (2024-02-08T19:01:14Z)
- GQKVA: Efficient Pre-training of Transformers by Grouping Queries, Keys, and Values [3.960622297616708]
GQKVA is designed to speed up transformer pre-training while reducing the model size.
Our experiments with various GQKVA variants highlight a clear trade-off between performance and model size.
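A minimal sketch of one instance of the grouping idea: queries keep all heads while keys and values are shared across groups, shrinking the key/value projections. Head counts, shapes, and the function name are illustrative assumptions rather than the exact GQKVA pre-training setup.

```python
# Grouped key/value attention: fewer K/V heads shared across query groups.
import torch
import torch.nn.functional as F

def grouped_attention(x, wq, wk, wv, num_q_heads=12, num_kv_heads=4):
    B, N, D = x.shape
    hd = D // num_q_heads                                          # per-head dimension
    q = (x @ wq).view(B, N, num_q_heads, hd).transpose(1, 2)      # (B, Hq, N, hd)
    k = (x @ wk).view(B, N, num_kv_heads, hd).transpose(1, 2)     # (B, Hkv, N, hd)
    v = (x @ wv).view(B, N, num_kv_heads, hd).transpose(1, 2)
    group = num_q_heads // num_kv_heads
    k = k.repeat_interleave(group, dim=1)                          # share K/V within each group
    v = v.repeat_interleave(group, dim=1)
    attn = F.softmax(q @ k.transpose(-2, -1) / hd ** 0.5, dim=-1)
    return (attn @ v).transpose(1, 2).reshape(B, N, D)

x = torch.randn(2, 197, 768)                                       # (batch, tokens, dim)
wq = torch.randn(768, 768)
wk = torch.randn(768, 256)                                         # 4 kv heads * 64 dims
wv = torch.randn(768, 256)
print(grouped_attention(x, wq, wk, wv).shape)                      # torch.Size([2, 197, 768])
```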
arXiv Detail & Related papers (2023-11-06T17:29:24Z)
- COMCAT: Towards Efficient Compression and Customization of Attention-Based Vision Models [21.07857091998763]
This paper explores an efficient method for compressing vision transformers to enrich the toolset for obtaining compact attention-based vision models.
For compressing DeiT-small and DeiT-base models on ImageNet, our proposed approach can achieve 0.45% and 0.76% higher top-1 accuracy even with fewer parameters.
arXiv Detail & Related papers (2023-05-26T19:50:00Z)
- Getting ViT in Shape: Scaling Laws for Compute-Optimal Model Design [84.34416126115732]
Scaling laws have been recently employed to derive compute-optimal model size (number of parameters) for a given compute duration.
We advance and refine such methods to infer compute-optimal model shapes, such as width and depth, and successfully implement this in vision transformers.
Our shape-optimized vision transformer, SoViT, achieves results competitive with models that exceed twice its size, despite being pre-trained with an equivalent amount of compute.
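A purely illustrative sketch of scaling-law-based shape selection: fit loss ~ a * compute^(-b) + c for each candidate shape on a few (compute, loss) points, then pick the shape with the lowest predicted loss at the target budget. The shape names, curves, and numbers are all synthetic assumptions, not the paper's fitted laws.

```python
# Fit a power law per candidate shape and extrapolate to the target compute.
import numpy as np
from scipy.optimize import curve_fit

def power_law(compute, a, b, c):
    return a * np.power(compute, -b) + c

rng = np.random.default_rng(0)
compute_grid = np.logspace(0, 3, 6)            # training compute in arbitrary units
target_compute = 1e4                           # budget to extrapolate to
true_curves = {"wide-shallow": (3.0, 0.25, 2.05), "narrow-deep": (3.2, 0.32, 1.95)}

predictions = {}
for shape, params in true_curves.items():
    loss = power_law(compute_grid, *params) * rng.normal(1.0, 0.01, compute_grid.size)
    fit, _ = curve_fit(power_law, compute_grid, loss, p0=(1.0, 0.3, 1.0), maxfev=10000)
    predictions[shape] = power_law(target_compute, *fit)
    print(f"{shape}: predicted loss at target compute = {predictions[shape]:.3f}")

print("compute-optimal shape:", min(predictions, key=predictions.get))
```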
arXiv Detail & Related papers (2023-05-22T13:39:28Z)
- GOHSP: A Unified Framework of Graph and Optimization-based Heterogeneous Structured Pruning for Vision Transformer [76.2625311630021]
Vision transformers (ViTs) have shown very impressive empirical performance in various computer vision tasks, but their large model sizes and high computational costs hinder practical deployment.
To mitigate this problem, structured pruning is a promising solution to compress model size and enable practical efficiency.
We propose GOHSP, a unified framework of Graph and Optimization-based Structured Pruning for ViT models.
arXiv Detail & Related papers (2023-01-13T00:40:24Z)
- Numerical Optimizations for Weighted Low-rank Estimation on Language Model [73.12941276331316]
Singular value decomposition (SVD) is one of the most popular compression methods that approximates a target matrix with smaller matrices.
Standard SVD treats the parameters within the matrix with equal importance, which is a simple but unrealistic assumption.
We show that our method can perform better than current SOTA methods in neural-based language models.
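For reference, the unweighted baseline these weighted methods refine is plain truncated SVD of a weight matrix, which treats every entry as equally important. The sketch below uses a synthetic weight and an arbitrary rank purely to show the factorization and the parameter savings.

```python
# Plain truncated SVD of a weight matrix and the resulting parameter count.
import numpy as np

W = np.random.default_rng(0).standard_normal((768, 3072))       # e.g., an FFN weight
U, S, Vt = np.linalg.svd(W, full_matrices=False)
r = 128                                                          # arbitrary target rank
W1 = U[:, :r] * S[:r]                                            # (768, r)
W2 = Vt[:r]                                                      # (r, 3072)
rel_err = np.linalg.norm(W - W1 @ W2) / np.linalg.norm(W)
print(f"rank {r}: relative error {rel_err:.3f}, params {W.size} -> {W1.size + W2.size}")
```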
arXiv Detail & Related papers (2022-11-02T00:58:02Z)
- Q-ViT: Accurate and Fully Quantized Low-bit Vision Transformer [56.87383229709899]
We develop an information rectification module (IRM) and a distribution guided distillation scheme for fully quantized vision transformers (Q-ViT).
Our method achieves much better performance than prior methods.
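A minimal sketch of the basic quantize-dequantize step underlying low-bit ViTs; Q-ViT's information rectification module and distillation scheme are not reproduced here, this only shows generic symmetric uniform (fake) quantization.

```python
# Symmetric per-tensor fake quantization: round to a low-bit grid, then dequantize.
import numpy as np

def fake_quantize(x, num_bits=4):
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.abs(x).max() / qmax + 1e-12
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale

w = np.random.default_rng(0).standard_normal((64, 64))
w4 = fake_quantize(w, num_bits=4)
print("mean absolute quantization error:", float(np.abs(w - w4).mean()))
```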
arXiv Detail & Related papers (2022-10-13T04:00:29Z)
- Language model compression with weighted low-rank factorization [73.61874728240568]
We introduce Fisher information to weigh the importance of parameters affecting the model prediction.
We find that our resulting task accuracy is much closer to the original model's performance.
Our method can directly compress a task-specific model while achieving better performance than other compact model strategies.
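A minimal sketch of the Fisher-weighted idea: estimate each row's importance from accumulated squared gradients (a diagonal Fisher proxy), scale the rows by that importance before SVD, and undo the scaling afterwards. The synthetic gradient samples and the row-wise weighting are illustrative assumptions, not the paper's exact estimator.

```python
# Fisher-weighted truncated SVD of a weight matrix.
import numpy as np

def fisher_weighted_svd(W, grads, rank):
    """W: (d_in, d_out); grads: list of gradient matrices with W's shape."""
    fisher = sum(g ** 2 for g in grads)                          # diagonal Fisher estimate
    row_imp = np.sqrt(fisher.sum(axis=1)) + 1e-8                 # one importance per input row
    U, S, Vt = np.linalg.svd(row_imp[:, None] * W, full_matrices=False)
    A = (U[:, :rank] * S[:rank]) / row_imp[:, None]              # unscale the rows
    return A, Vt[:rank]                                           # W ~= A @ B

rng = np.random.default_rng(0)
W = rng.standard_normal((768, 768))
grads = [rng.standard_normal(W.shape) * rng.uniform(0.1, 2.0, (768, 1)) for _ in range(8)]
A, B = fisher_weighted_svd(W, grads, rank=192)
print("factored shapes:", A.shape, B.shape)
```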
arXiv Detail & Related papers (2022-06-30T21:57:07Z)
- Global Vision Transformer Pruning with Hessian-Aware Saliency [93.33895899995224]
This work challenges the common design philosophy of the Vision Transformer (ViT) model with uniform dimension across all the stacked blocks in a model stage.
We derive a novel Hessian-based structural pruning criterion comparable across all layers and structures, with latency-aware regularization for direct latency reduction.
Performing iterative pruning on the DeiT-Base model leads to a new architecture family called NViT (Novel ViT), with a novel parameter redistribution that utilizes parameters more efficiently.
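A minimal sketch of a Hessian-aware saliency score for structured pruning: approximate the loss increase from removing a group (e.g., a head) with a diagonal second-order term, 0.5 * h_ii * w_i^2, where the curvature h_ii is estimated from squared gradients (a Fisher/Gauss-Newton proxy), and subtract a simple latency penalty. All data, names, and the penalty form are synthetic assumptions rather than NViT's exact criterion.

```python
# Diagonal second-order saliency with a latency-aware penalty.
import numpy as np

def hessian_saliency(weights, grads, latency_cost, alpha=0.05):
    """weights: (num_groups, group_size); grads: list of same-shaped gradient samples."""
    h_diag = np.mean([g ** 2 for g in grads], axis=0)            # curvature proxy per weight
    loss_increase = 0.5 * (h_diag * weights ** 2).sum(axis=1)    # estimated damage if pruned
    return loss_increase - alpha * latency_cost                  # expensive groups rank lower

rng = np.random.default_rng(0)
heads_w = rng.standard_normal((12, 64 * 64))                     # 12 heads, flattened weights
grad_samples = [rng.standard_normal(heads_w.shape) for _ in range(4)]
latency = rng.uniform(0.5, 1.5, 12)                              # per-head latency estimate
scores = hessian_saliency(heads_w, grad_samples, latency)
print("heads to prune first:", np.argsort(scores)[:3])
```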
arXiv Detail & Related papers (2021-10-10T18:04:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.