LRP-QViT: Mixed-Precision Vision Transformer Quantization via Layer-wise Relevance Propagation
- URL: http://arxiv.org/abs/2401.11243v1
- Date: Sat, 20 Jan 2024 14:53:19 GMT
- Title: LRP-QViT: Mixed-Precision Vision Transformer Quantization via Layer-wise Relevance Propagation
- Authors: Navin Ranjan and Andreas Savakis
- Abstract summary: We introduce LRP-QViT, an explainability-based method for assigning mixed-precision bit allocations to different layers based on their importance during classification.
Our experimental findings demonstrate that both our fixed-bit and mixed-bit post-training quantization methods surpass existing approaches at 4-bit and 6-bit quantization.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Vision transformers (ViTs) have demonstrated remarkable performance across
various visual tasks. However, ViT models suffer from substantial computational
and memory requirements, making it challenging to deploy them on
resource-constrained platforms. Quantization is a popular approach for reducing
model size, but most studies mainly focus on equal bit-width quantization for
the entire network, resulting in sub-optimal solutions. While there are a few
works on mixed-precision quantization (MPQ) for ViTs, they typically rely on
search-space-based methods or employ mixed precision arbitrarily. In this
paper, we introduce LRP-QViT, an explainability-based method for assigning
mixed-precision bit allocations to different layers based on their importance
during classification. Specifically, to measure the contribution score of each
layer in predicting the target class, we employ the Layer-wise Relevance
Propagation (LRP) method. LRP assigns local relevance at the output layer and
propagates it through all layers, distributing the relevance until it reaches
the input layers. These relevance scores serve as indicators for computing the
layer contribution score. Additionally, we introduce clipped channel-wise
quantization to eliminate outliers in post-LayerNorm activations and alleviate
severe inter-channel variations. To validate and
assess our approach, we employ LRP-QViT across ViT, DeiT, and Swin transformer
models on various datasets. Our experimental findings demonstrate that both our
fixed-bit and mixed-bit post-training quantization methods surpass existing
approaches at 4-bit and 6-bit quantization.
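A minimal sketch of the two ideas above is given below, assuming a simple greedy allocation rule and percentile-based clipping: per-layer relevance scores are mapped to a mixed-precision bit assignment under an average-bit budget, and post-LayerNorm activations are quantized per channel after clipping outliers. The function names, the allocation heuristic, and the clipping threshold are illustrative assumptions and not the paper's exact procedure.

    # Illustrative sketch (PyTorch): relevance-driven bit allocation and clipped
    # channel-wise quantization. The greedy rule and percentile clip are assumptions.
    import torch

    def allocate_bits(relevance, avg_bit_budget=6.0, low=4, high=8):
        """Assign per-layer bit-widths from relevance (contribution) scores.
        Layers are demoted to `low` bits in order of increasing relevance
        until the average bit-width meets the budget (greedy heuristic)."""
        n = len(relevance)
        bits = [high] * n
        for i in sorted(range(n), key=lambda i: relevance[i]):  # least relevant first
            if sum(bits) / n <= avg_bit_budget:
                break
            bits[i] = low
        return bits

    def clipped_channelwise_quantize(x, n_bits=6, clip_pct=0.999):
        """Uniform per-channel fake-quantization of post-LayerNorm activations.
        Outliers are suppressed by clipping each channel at a high percentile
        of its absolute values before computing the per-channel scale.
        x: (tokens, channels) activation tensor."""
        qmax = 2 ** (n_bits - 1) - 1
        clip = torch.quantile(x.abs(), clip_pct, dim=0).clamp(min=1e-8)  # per channel
        x_clipped = torch.maximum(torch.minimum(x, clip), -clip)
        scale = clip / qmax
        q = torch.round(x_clipped / scale).clamp(-qmax - 1, qmax)
        return q * scale  # dequantized values with quantization error applied

    # Example: relevance scores for a 12-block ViT, targeting a ~6-bit average.
    scores = torch.rand(12).tolist()
    print(allocate_bits(scores, avg_bit_budget=6.0))
    acts = torch.randn(197, 768) * (5 * torch.rand(768))  # uneven channel ranges
    print(clipped_channelwise_quantize(acts, n_bits=6).shape)

In practice the contribution scores would come from propagating LRP relevance from the output logits back through the transformer blocks, and the resulting bit-widths would then configure each layer's quantizer.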
Related papers
- Toward Relative Positional Encoding in Spiking Transformers [52.62008099390541]
Spiking neural networks (SNNs) are bio-inspired networks that model how neurons in the brain communicate through discrete spikes.
In this paper, we introduce an approximate method for relative positional encoding (RPE) in Spiking Transformers.
arXiv Detail & Related papers (2025-01-28T06:42:37Z)
- Mix-QViT: Mixed-Precision Vision Transformer Quantization Driven by Layer Importance and Quantization Sensitivity [0.0]
Mix-QViT is an explainability-driven MPQ framework that allocates bit-widths to each layer based on two criteria.
For post-training quantization, we introduce a clipped channel-wise quantization method.
arXiv Detail & Related papers (2025-01-10T21:36:20Z)
- Quantized and Interpretable Learning Scheme for Deep Neural Networks in Classification Task [0.0]
We introduce an approach that combines saliency-guided training with quantization techniques to create an interpretable and resource-efficient model.
Our results demonstrate that the combined use of saliency-guided training and PACT-based quantization not only maintains classification performance but also produces models that are significantly more efficient and interpretable.
arXiv Detail & Related papers (2024-12-05T06:34:06Z)
- AdaLog: Post-Training Quantization for Vision Transformers with Adaptive Logarithm Quantizer [54.713778961605115]
Vision Transformer (ViT) has become one of the most prevalent backbone networks in the computer vision community.
We propose a novel non-uniform quantizer, dubbed the Adaptive Logarithm (AdaLog) quantizer.
arXiv Detail & Related papers (2024-07-17T18:38:48Z)
- QGait: Toward Accurate Quantization for Gait Recognition with Binarized Input [17.017127559393398]
We propose a differentiable soft quantizer, which better simulates the gradient of the round function during backpropagation.
This enables the network to learn from subtle input perturbations; a minimal sketch of this soft-rounding idea appears after this list.
We further refine the training strategy to ensure convergence while simulating quantization errors.
arXiv Detail & Related papers (2024-05-22T17:34:18Z)
- Instance-Aware Group Quantization for Vision Transformers [20.105148326987646]
Post-training quantization (PTQ) is an efficient model compression technique that quantizes a pretrained full-precision model.
PTQ methods for convolutional neural networks (CNNs) provide quantization results comparable to full-precision counterparts.
We introduce instance-aware group quantization for ViTs (IGQ-ViT).
arXiv Detail & Related papers (2024-04-01T05:12:30Z)
- ViT-Calibrator: Decision Stream Calibration for Vision Transformer [49.60474757318486]
We propose a new paradigm dubbed Decision Stream that boosts the performance of general Vision Transformers.
We shed light on the information propagation mechanism in the learning procedure by exploring the correlation between different tokens and the relevance coefficient of multiple dimensions.
arXiv Detail & Related papers (2023-04-10T02:40:24Z)
- BiTAT: Neural Network Binarization with Task-dependent Aggregated Transformation [116.26521375592759]
Quantization aims to transform high-precision weights and activations of a given neural network into low-precision weights/activations for reduced memory usage and computation.
Extreme quantization (1-bit weight/1-bit activations) of compactly-designed backbone architectures results in severe performance degeneration.
This paper proposes a novel Quantization-Aware Training (QAT) method that can effectively alleviate performance degeneration.
arXiv Detail & Related papers (2022-07-04T13:25:49Z)
- ClusterQ: Semantic Feature Distribution Alignment for Data-Free Quantization [111.12063632743013]
We propose a new and effective data-free quantization method termed ClusterQ.
To obtain high inter-class separability of semantic features, we cluster and align the feature distribution statistics.
We also incorporate the intra-class variance to solve class-wise mode collapse.
arXiv Detail & Related papers (2022-04-30T06:58:56Z)
- Belief Propagation Reloaded: Learning BP-Layers for Labeling Problems [83.98774574197613]
We take one of the simplest inference methods, truncated max-product belief propagation, and add what is necessary to make it a proper component of a deep learning model.
This BP-Layer can be used as a final or intermediate block in convolutional neural networks (CNNs).
The model is applicable to a range of dense prediction problems, is well-trainable and provides parameter-efficient and robust solutions in stereo, optical flow and semantic segmentation.
arXiv Detail & Related papers (2020-03-13T13:11:35Z)
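As referenced in the QGait entry above, a differentiable soft quantizer can be built around a temperature-controlled soft-rounding function that behaves like the identity at low sharpness and approaches hard rounding as the sharpness grows, so gradients of the rounding step remain informative during backpropagation. The sketch below is a generic illustration under that assumption; the tanh-based soft round and the min-max scaling are not QGait's exact formulation.

    # Illustrative sketch (PyTorch): a differentiable soft quantizer. The
    # tanh-based soft round and min-max scaling are assumptions for illustration.
    import math
    import torch

    def soft_round(x, alpha=5.0):
        """Differentiable surrogate for round(): close to the identity for small
        alpha and close to hard rounding for large alpha."""
        m = torch.floor(x) + 0.5
        r = x - m                      # fractional offset in [-0.5, 0.5)
        return m + 0.5 * torch.tanh(alpha * r) / math.tanh(alpha / 2)

    def soft_fake_quantize(x, n_bits=8, alpha=8.0):
        """Soft uniform fake-quantization to 2**n_bits levels; gradients flow
        through soft_round, so quantization error is visible to the optimizer."""
        qmax = 2 ** n_bits - 1
        lo, hi = x.min(), x.max()
        scale = (hi - lo).clamp(min=1e-8) / qmax
        q = soft_round((x - lo) / scale, alpha).clamp(0, qmax)
        return q * scale + lo

    x = torch.randn(4, 8, requires_grad=True)
    y = soft_fake_quantize(x, n_bits=4)
    y.sum().backward()
    print(x.grad.abs().mean())  # nonzero gradients, unlike a hard round

During training, the sharpness alpha would typically be annealed upward so that the soft quantizer converges toward true rounding while keeping gradients stable early on.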