LRP-QViT: Mixed-Precision Vision Transformer Quantization via Layer-wise
Relevance Propagation
- URL: http://arxiv.org/abs/2401.11243v1
- Date: Sat, 20 Jan 2024 14:53:19 GMT
- Title: LRP-QViT: Mixed-Precision Vision Transformer Quantization via Layer-wise
Relevance Propagation
- Authors: Navin Ranjan and Andreas Savakis
- Abstract summary: We introduce LRP-QViT, an explainability-based method for assigning mixed-precision bit allocations to different layers based on their importance during classification.
Our experimental findings demonstrate that both our fixed-bit and mixed-bit post-training quantization methods surpass existing approaches at 4-bit and 6-bit quantization.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Vision transformers (ViTs) have demonstrated remarkable performance across
various visual tasks. However, ViT models suffer from substantial computational
and memory requirements, making it challenging to deploy them on
resource-constrained platforms. Quantization is a popular approach for reducing
model size, but most studies focus on equal bit-width quantization for the
entire network, resulting in sub-optimal solutions. The few existing works on
mixed-precision quantization (MPQ) for ViTs typically rely on
search space-based methods or employ mixed precision arbitrarily. In this
paper, we introduce LRP-QViT, an explainability-based method for assigning
mixed-precision bit allocations to different layers based on their importance
during classification. Specifically, to measure the contribution score of each
layer in predicting the target class, we employ the Layer-wise Relevance
Propagation (LRP) method. LRP assigns local relevance at the output layer and
propagates it through all layers, distributing the relevance until it reaches
the input layers. These relevance scores serve as indicators for computing the
layer contribution score. Additionally, we introduce clipped channel-wise
quantization to eliminate outliers from post-LayerNorm activations and
alleviate severe inter-channel variations. To validate and
assess our approach, we employ LRP-QViT across ViT, DeiT, and Swin transformer
models on various datasets. Our experimental findings demonstrate that both our
fixed-bit and mixed-bit post-training quantization methods surpass existing
approaches at 4-bit and 6-bit quantization.
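The abstract describes two concrete components: per-layer contribution scores obtained from LRP that drive mixed-precision bit allocation, and clipped channel-wise quantization of post-LayerNorm activations. The following is a minimal PyTorch sketch of both ideas, not the authors' released code: the relevance_scores input is assumed to come from an LRP backward pass (not shown), and the percentile clipping and the greedy bit-allocation heuristic are illustrative assumptions rather than the paper's exact procedure.

import torch

def clipped_channelwise_quantize(x: torch.Tensor, n_bits: int = 4,
                                 clip_pct: float = 0.999) -> torch.Tensor:
    # x: post-LayerNorm activations of shape (tokens, channels).
    # Clip each channel at high/low percentiles to suppress outliers,
    # then apply per-channel uniform (affine) fake quantization.
    lo = torch.quantile(x, 1.0 - clip_pct, dim=0, keepdim=True)
    hi = torch.quantile(x, clip_pct, dim=0, keepdim=True)
    x_clipped = torch.clamp(x, lo, hi)
    qmax = 2 ** n_bits - 1
    scale = (hi - lo).clamp(min=1e-8) / qmax
    q = torch.round((x_clipped - lo) / scale)
    return q * scale + lo  # dequantized ("fake-quantized") activations

def allocate_bits(relevance_scores, avg_bits: float = 4.0, choices=(3, 4, 5, 6)):
    # Greedy mixed-precision allocation: layers with higher LRP relevance
    # receive more bits, subject to an average bit budget. Illustrative only.
    order = sorted(range(len(relevance_scores)),
                   key=lambda i: relevance_scores[i], reverse=True)
    bits = [min(choices)] * len(relevance_scores)
    budget = int(avg_bits * len(relevance_scores)) - sum(bits)
    for i in order:  # spend the remaining budget on the most relevant layers
        take = min(max(choices) - bits[i], budget)
        bits[i] += take
        budget -= take
        if budget <= 0:
            break
    return bits

For example, with twelve transformer blocks and a 4-bit average budget, the most relevant blocks would be pushed toward 6 bits while the least relevant stay at 3 bits, consistent with the paper's motivation of protecting layers that contribute most to the classification decision.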
Related papers
- AdaLog: Post-Training Quantization for Vision Transformers with Adaptive Logarithm Quantizer [54.713778961605115]
Vision Transformer (ViT) has become one of the most prevalent backbone networks in the computer vision community.
We propose a novel non-uniform quantizer, dubbed the Adaptive Logarithm (AdaLog) quantizer.
arXiv Detail & Related papers (2024-07-17T18:38:48Z) - CLAMP-ViT: Contrastive Data-Free Learning for Adaptive Post-Training Quantization of ViTs [6.456189487006878]
We present CLAMP-ViT, a data-free post-training quantization method for vision transformers (ViTs).
We identify the limitations of recent techniques, notably their inability to leverage meaningful inter-patch relationships.
CLAMP-ViT employs a two-stage approach, cyclically adapting between data generation and model quantization.
arXiv Detail & Related papers (2024-07-07T05:39:25Z) - QGait: Toward Accurate Quantization for Gait Recognition with Binarized Input [17.017127559393398]
We propose a differentiable soft quantizer, which better simulates the gradient of the round function during backpropagation.
This enables the network to learn from subtle input perturbations.
We further refine the training strategy to ensure convergence while simulating quantization errors.
arXiv Detail & Related papers (2024-05-22T17:34:18Z) - Instance-Aware Group Quantization for Vision Transformers [20.105148326987646]
Post-training quantization (PTQ) is an efficient model compression technique that quantizes a pretrained full-precision model.
PTQ methods for convolutional neural networks (CNNs) provide quantization results comparable to full-precision counterparts.
We introduce instance-aware group quantization for ViTs (IGQ-ViT).
arXiv Detail & Related papers (2024-04-01T05:12:30Z) - Layer-wise Feedback Propagation [53.00944147633484]
We present Layer-wise Feedback Propagation (LFP), a novel training approach for neural-network-like predictors.
LFP assigns rewards to individual connections based on their respective contributions to solving a given task.
We demonstrate its effectiveness in achieving comparable performance to gradient descent on various models and datasets.
arXiv Detail & Related papers (2023-08-23T10:48:28Z) - ViT-Calibrator: Decision Stream Calibration for Vision Transformer [49.60474757318486]
We propose a new paradigm dubbed Decision Stream that boosts the performance of general Vision Transformers.
We shed light on the information propagation mechanism in the learning procedure by exploring the correlation between different tokens and the relevance coefficient of multiple dimensions.
arXiv Detail & Related papers (2023-04-10T02:40:24Z) - BiTAT: Neural Network Binarization with Task-dependent Aggregated
Transformation [116.26521375592759]
Quantization aims to transform high-precision weights and activations of a given neural network into low-precision weights/activations for reduced memory usage and computation.
Extreme quantization (1-bit weight/1-bit activations) of compactly-designed backbone architectures results in severe performance degeneration.
This paper proposes a novel Quantization-Aware Training (QAT) method that can effectively alleviate performance degeneration.
arXiv Detail & Related papers (2022-07-04T13:25:49Z) - ClusterQ: Semantic Feature Distribution Alignment for Data-Free
Quantization [111.12063632743013]
We propose a new and effective data-free quantization method termed ClusterQ.
To obtain high inter-class separability of semantic features, we cluster and align the feature distribution statistics.
We also incorporate the intra-class variance to solve class-wise mode collapse.
arXiv Detail & Related papers (2022-04-30T06:58:56Z) - SPIQ: Data-Free Per-Channel Static Input Quantization [37.82255888371488]
Methods for efficient inference have drawn growing attention in the machine learning community.
In this work, we argue that static input quantization can reach the accuracy levels of dynamic methods by means of a per-channel input quantization scheme.
We show through a thorough empirical evaluation on multiple computer vision problems that the proposed method, dubbed SPIQ, achieves accuracies rivalling dynamic approaches with static-level inference speed.
arXiv Detail & Related papers (2022-03-28T10:59:18Z) - Belief Propagation Reloaded: Learning BP-Layers for Labeling Problems [83.98774574197613]
We take one of the simplest inference methods, a truncated max-product Belief propagation, and add what is necessary to make it a proper component of a deep learning model.
This BP-Layer can be used as the final or an intermediate block in convolutional neural networks (CNNs).
The model is applicable to a range of dense prediction problems, is well-trainable and provides parameter-efficient and robust solutions in stereo, optical flow and semantic segmentation.
arXiv Detail & Related papers (2020-03-13T13:11:35Z)