LRP-QViT: Mixed-Precision Vision Transformer Quantization via Layer-wise Relevance Propagation
- URL: http://arxiv.org/abs/2401.11243v1
- Date: Sat, 20 Jan 2024 14:53:19 GMT
- Title: LRP-QViT: Mixed-Precision Vision Transformer Quantization via Layer-wise Relevance Propagation
- Authors: Navin Ranjan and Andreas Savakis
- Abstract summary: We introduce LRP-QViT, an explainability-based method for assigning mixed-precision bit allocations to different layers based on their importance during classification.
Our experimental findings demonstrate that both our fixed-bit and mixed-bit post-training quantization methods surpass existing approaches at 4-bit and 6-bit quantization.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Vision transformers (ViTs) have demonstrated remarkable performance across
various visual tasks. However, ViT models suffer from substantial computational
and memory requirements, making it challenging to deploy them on
resource-constrained platforms. Quantization is a popular approach for reducing
model size, but most studies mainly focus on equal bit-width quantization for
the entire network, resulting in sub-optimal solutions. While there are a few
works on mixed-precision quantization (MPQ) for ViTs, they typically rely on
search-space-based methods or employ mixed precision arbitrarily. In this
paper, we introduce LRP-QViT, an explainability-based method for assigning
mixed-precision bit allocations to different layers based on their importance
during classification. Specifically, to measure the contribution score of each
layer in predicting the target class, we employ the Layer-wise Relevance
Propagation (LRP) method. LRP assigns local relevance at the output layer and
propagates it through all layers, distributing the relevance until it reaches
the input layers. These relevance scores serve as indicators for computing the
layer contribution score. Additionally, we introduce clipped channel-wise
quantization to eliminate outliers in post-LayerNorm activations and alleviate
severe inter-channel variations. To validate and
assess our approach, we employ LRP-QViT across ViT, DeiT, and Swin transformer
models on various datasets. Our experimental findings demonstrate that both our
fixed-bit and mixed-bit post-training quantization methods surpass existing
approaches at 4-bit and 6-bit quantization.
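A minimal sketch of the two ideas above is given below, assuming a simple greedy allocation rule and percentile-based clipping: per-layer relevance scores are mapped to a mixed-precision bit assignment under an average-bit budget, and post-LayerNorm activations are quantized per channel after clipping outliers. The function names, the allocation heuristic, and the clipping threshold are illustrative assumptions and not the paper's exact procedure.

    # Illustrative sketch (PyTorch): relevance-driven bit allocation and clipped
    # channel-wise quantization. The greedy rule and percentile clip are assumptions.
    import torch

    def allocate_bits(relevance, avg_bit_budget=6.0, low=4, high=8):
        """Assign per-layer bit-widths from relevance (contribution) scores.
        Layers are demoted to `low` bits in order of increasing relevance
        until the average bit-width meets the budget (greedy heuristic)."""
        n = len(relevance)
        bits = [high] * n
        for i in sorted(range(n), key=lambda i: relevance[i]):  # least relevant first
            if sum(bits) / n <= avg_bit_budget:
                break
            bits[i] = low
        return bits

    def clipped_channelwise_quantize(x, n_bits=6, clip_pct=0.999):
        """Uniform per-channel fake-quantization of post-LayerNorm activations.
        Outliers are suppressed by clipping each channel at a high percentile
        of its absolute values before computing the per-channel scale.
        x: (tokens, channels) activation tensor."""
        qmax = 2 ** (n_bits - 1) - 1
        clip = torch.quantile(x.abs(), clip_pct, dim=0).clamp(min=1e-8)  # per channel
        x_clipped = torch.maximum(torch.minimum(x, clip), -clip)
        scale = clip / qmax
        q = torch.round(x_clipped / scale).clamp(-qmax - 1, qmax)
        return q * scale  # dequantized values with quantization error applied

    # Example: relevance scores for a 12-block ViT, targeting a ~6-bit average.
    scores = torch.rand(12).tolist()
    print(allocate_bits(scores, avg_bit_budget=6.0))
    acts = torch.randn(197, 768) * (5 * torch.rand(768))  # uneven channel ranges
    print(clipped_channelwise_quantize(acts, n_bits=6).shape)

In practice the contribution scores would come from propagating LRP relevance from the output logits back through the transformer blocks, and the resulting bit-widths would then configure each layer's quantizer.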
Related papers
- Toward Relative Positional Encoding in Spiking Transformers [52.62008099390541]
Spiking neural networks (SNNs) are bio-inspired networks that model how neurons in the brain communicate through discrete spikes.
In this paper, we introduce an approximate method for relative positional encoding (RPE) in Spiking Transformers.
arXiv Detail & Related papers (2025-01-28T06:42:37Z)
- Mix-QViT: Mixed-Precision Vision Transformer Quantization Driven by Layer Importance and Quantization Sensitivity [0.0]
Mix-QViT is an explainability-driven MPQ framework that allocates bit-widths to each layer based on two criteria.
For post-training quantization, we introduce a clipped channel-wise quantization method.
arXiv Detail & Related papers (2025-01-10T21:36:20Z)
- Quantized and Interpretable Learning Scheme for Deep Neural Networks in Classification Task [0.0]
We introduce an approach that combines saliency-guided training with quantization techniques to create an interpretable and resource-efficient model.
Our results demonstrate that the combined use of saliency-guided training and PACT-based quantization not only maintains classification performance but also produces models that are significantly more efficient and interpretable.
arXiv Detail & Related papers (2024-12-05T06:34:06Z)
- AdaLog: Post-Training Quantization for Vision Transformers with Adaptive Logarithm Quantizer [54.713778961605115]
Vision Transformer (ViT) has become one of the most prevalent backbone networks in the computer vision community.
We propose a novel non-uniform quantizer, dubbed the Adaptive Logarithm (AdaLog) quantizer.
arXiv Detail & Related papers (2024-07-17T18:38:48Z)
- QGait: Toward Accurate Quantization for Gait Recognition with Binarized Input [17.017127559393398]
We propose a differentiable soft quantizer, which better simulates the gradient of the round function during backpropagation.
This enables the network to learn from subtle input perturbations; a minimal sketch of this soft-rounding idea appears after this list.
We further refine the training strategy to ensure convergence while simulating quantization errors.
arXiv Detail & Related papers (2024-05-22T17:34:18Z)
- Instance-Aware Group Quantization for Vision Transformers [20.105148326987646]
Post-training quantization (PTQ) is an efficient model compression technique that quantizes a pretrained full-precision model.
PTQ methods for convolutional neural networks (CNNs) provide quantization results comparable to full-precision counterparts.
We introduce instance-aware group quantization for ViTs (IGQ-ViT).
arXiv Detail & Related papers (2024-04-01T05:12:30Z)
- ViT-Calibrator: Decision Stream Calibration for Vision Transformer [49.60474757318486]
We propose a new paradigm dubbed Decision Stream that boosts the performance of general Vision Transformers.
We shed light on the information propagation mechanism in the learning procedure by exploring the correlation between different tokens and the relevance coefficient of multiple dimensions.
arXiv Detail & Related papers (2023-04-10T02:40:24Z)
- BiTAT: Neural Network Binarization with Task-dependent Aggregated Transformation [116.26521375592759]
Quantization aims to transform high-precision weights and activations of a given neural network into low-precision weights/activations for reduced memory usage and computation.
Extreme quantization (1-bit weight/1-bit activations) of compactly-designed backbone architectures results in severe performance degeneration.
This paper proposes a novel Quantization-Aware Training (QAT) method that can effectively alleviate performance degeneration.
arXiv Detail & Related papers (2022-07-04T13:25:49Z)
- ClusterQ: Semantic Feature Distribution Alignment for Data-Free Quantization [111.12063632743013]
We propose a new and effective data-free quantization method termed ClusterQ.
To obtain high inter-class separability of semantic features, we cluster and align the feature distribution statistics.
We also incorporate the intra-class variance to solve class-wise mode collapse.
arXiv Detail & Related papers (2022-04-30T06:58:56Z)
- Belief Propagation Reloaded: Learning BP-Layers for Labeling Problems [83.98774574197613]
We take one of the simplest inference methods, truncated max-product belief propagation, and add what is necessary to make it a proper component of a deep learning model.
This BP-Layer can be used as a final or intermediate block in convolutional neural networks (CNNs).
The model is applicable to a range of dense prediction problems, is well-trainable and provides parameter-efficient and robust solutions in stereo, optical flow and semantic segmentation.
arXiv Detail & Related papers (2020-03-13T13:11:35Z)
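As referenced in the QGait entry above, a differentiable soft quantizer can be built around a temperature-controlled soft-rounding function that behaves like the identity at low sharpness and approaches hard rounding as the sharpness grows, so gradients of the rounding step remain informative during backpropagation. The sketch below is a generic illustration under that assumption; the tanh-based soft round and the min-max scaling are not QGait's exact formulation.

    # Illustrative sketch (PyTorch): a differentiable soft quantizer. The
    # tanh-based soft round and min-max scaling are assumptions for illustration.
    import math
    import torch

    def soft_round(x, alpha=5.0):
        """Differentiable surrogate for round(): close to the identity for small
        alpha and close to hard rounding for large alpha."""
        m = torch.floor(x) + 0.5
        r = x - m                      # fractional offset in [-0.5, 0.5)
        return m + 0.5 * torch.tanh(alpha * r) / math.tanh(alpha / 2)

    def soft_fake_quantize(x, n_bits=8, alpha=8.0):
        """Soft uniform fake-quantization to 2**n_bits levels; gradients flow
        through soft_round, so quantization error is visible to the optimizer."""
        qmax = 2 ** n_bits - 1
        lo, hi = x.min(), x.max()
        scale = (hi - lo).clamp(min=1e-8) / qmax
        q = soft_round((x - lo) / scale, alpha).clamp(0, qmax)
        return q * scale + lo

    x = torch.randn(4, 8, requires_grad=True)
    y = soft_fake_quantize(x, n_bits=4)
    y.sum().backward()
    print(x.grad.abs().mean())  # nonzero gradients, unlike a hard round

During training, the sharpness alpha would typically be annealed upward so that the soft quantizer converges toward true rounding while keeping gradients stable early on.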