MGRQ: Post-Training Quantization For Vision Transformer With Mixed Granularity Reconstruction
- URL: http://arxiv.org/abs/2406.09229v1
- Date: Thu, 13 Jun 2024 15:29:37 GMT
- Title: MGRQ: Post-Training Quantization For Vision Transformer With Mixed Granularity Reconstruction
- Authors: Lianwei Yang, Zhikai Li, Junrui Xiao, Haisong Gong, Qingyi Gu
- Abstract summary: Post-training quantization (PTQ) efficiently compresses vision models.
Efforts to significantly improve the performance of PTQ through reconstruction in the Vision Transformer (ViT) have shown limited efficacy.
We propose MGRQ (Mixed Granularity Reconstruction Quantization) as a solution to address this issue.
- Score: 3.7024647541541014
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Post-training quantization (PTQ) efficiently compresses vision models, but it is unfortunately accompanied by a certain degree of accuracy degradation. Reconstruction methods aim to enhance model performance by narrowing the gap between the quantized model and the full-precision model, often yielding promising results. However, efforts to significantly improve the performance of PTQ through reconstruction in the Vision Transformer (ViT) have shown limited efficacy. In this paper, we conduct a thorough analysis of the reasons for this limited effectiveness and propose MGRQ (Mixed Granularity Reconstruction Quantization) as a solution to address this issue. Unlike previous reconstruction schemes, MGRQ introduces a mixed granularity reconstruction approach. Specifically, MGRQ enhances the performance of PTQ by introducing Extra-Block Global Supervision and Intra-Block Local Supervision, building upon Optimized Block-wise Reconstruction. Extra-Block Global Supervision considers the relationship between block outputs and the model's output, aiding block-wise reconstruction through global supervision. Meanwhile, Intra-Block Local Supervision reduces generalization errors by aligning the distribution of outputs at each layer within a block. Subsequently, MGRQ is further optimized for reconstruction through Mixed Granularity Loss Fusion. Extensive experiments conducted on various ViT models illustrate the effectiveness of MGRQ. Notably, MGRQ demonstrates robust performance in low-bit quantization, thereby enhancing the practicality of the quantized model.
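The loss design described in the abstract lends itself to a compact illustration. Below is a minimal PyTorch sketch of how the three supervision signals might be fused; the distance metrics, the moment-matching stand-in for distribution alignment, and the fusion weights are assumptions for illustration, not the authors' exact formulation.

```python
import torch.nn.functional as F

def mixed_granularity_loss(q_block_out, fp_block_out,
                           q_model_out, fp_model_out,
                           q_layer_outs, fp_layer_outs,
                           alpha=1.0, beta=0.5, gamma=0.1):
    # Optimized Block-wise Reconstruction: match the quantized block's output
    # to its full-precision counterpart.
    block_loss = F.mse_loss(q_block_out, fp_block_out)

    # Extra-Block Global Supervision: tie block-wise reconstruction to the
    # model's final output, here via a KL divergence between predictions.
    global_loss = F.kl_div(F.log_softmax(q_model_out, dim=-1),
                           F.softmax(fp_model_out, dim=-1),
                           reduction="batchmean")

    # Intra-Block Local Supervision: align per-layer output distributions
    # inside the block (first/second moments as a stand-in for "distribution").
    local_loss = sum(F.mse_loss(q.mean(dim=0), fp.mean(dim=0)) +
                     F.mse_loss(q.var(dim=0), fp.var(dim=0))
                     for q, fp in zip(q_layer_outs, fp_layer_outs))

    # Mixed Granularity Loss Fusion: weighted combination of the three terms.
    return alpha * block_loss + beta * global_loss + gamma * local_loss
```

In this reading, block-wise reconstruction remains the primary objective, while the global and local terms act as auxiliary regularizers at coarser and finer granularity.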
Related papers
- Quantization Error Propagation: Revisiting Layer-Wise Post-Training Quantization [0.0]
Post-training quantization has emerged as a widely used technique for compressing large language models (LLMs) without retraining.
The accumulation of quantization errors across layers significantly degrades performance, particularly in low-bit regimes.
We propose Quantization Error Propagation (QEP), a lightweight and general framework that enhances layer-wise PTQ by explicitly propagating the quantization error.
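A toy NumPy sketch of the error-propagation idea: each layer is calibrated against activations produced by the already-quantized preceding layers, so accumulated error becomes visible and can be compensated. The least-squares refit below is one simple way to realize the compensation and is an assumption; QEP's exact procedure differs, and nonlinearities are omitted for brevity.

```python
import numpy as np

def round_to_grid(w, n_bits=4):
    # Uniform symmetric quantizer; an assumption, QEP is quantizer-agnostic.
    scale = np.abs(w).max() / (2 ** (n_bits - 1) - 1) + 1e-12
    return np.round(w / scale) * scale

def qep_quantize(weights, x_calib, n_bits=4):
    """weights: list of (d_in, d_out) matrices; x_calib: (n, d_0) batch."""
    x_fp = x_q = x_calib
    q_weights = []
    for w in weights:
        y_ref = x_fp @ w                      # full-precision target
        # Absorb the propagated input error: refit the weights against the
        # quantized input so this layer compensates upstream error, then round.
        w_fit, *_ = np.linalg.lstsq(x_q, y_ref, rcond=None)
        w_q = round_to_grid(w_fit, n_bits)
        q_weights.append(w_q)
        x_fp, x_q = y_ref, x_q @ w_q          # propagate both traces layer by layer
    return q_weights
```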
arXiv Detail & Related papers (2025-04-13T15:56:00Z)
- APHQ-ViT: Post-Training Quantization with Average Perturbation Hessian Based Reconstruction for Vision Transformers [71.2294205496784]
We propose APHQ-ViT, a novel PTQ approach based on importance estimation with the Average Perturbation Hessian (APH).
We show that APHQ-ViT using linear quantizers outperforms existing PTQ methods by substantial margins in 3-bit and 4-bit settings across different vision tasks.
arXiv Detail & Related papers (2025-04-03T11:48:56Z)
- Model Hemorrhage and the Robustness Limits of Large Language Models [119.46442117681147]
Large language models (LLMs) demonstrate strong performance across natural language processing tasks, yet undergo significant performance degradation when modified for deployment.
We define this phenomenon as model hemorrhage: performance decline caused by parameter alterations and architectural changes.
arXiv Detail & Related papers (2025-03-31T10:16:03Z)
- UniRestorer: Universal Image Restoration via Adaptively Estimating Image Degradation at Proper Granularity [79.90839080916913]
We present UniRestorer, which achieves improved restoration performance.
Specifically, we perform hierarchical clustering on degradation space, and train a multi-granularity mixture-of-experts (MoE) restoration model.
In contrast to existing degradation-agnostic and degradation-aware methods, UniRestorer can leverage degradation estimation to benefit degradation-specific restoration.
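A hedged sketch of what routing by estimated degradation at multiple granularities could look like; the clustering method, feature extractor, and expert assignment here are illustrative assumptions, not the paper's actual design.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_hierarchy(degradation_feats, granularities=(2, 4, 8)):
    """degradation_feats: (n, d) array of estimated degradation embeddings."""
    # One clustering per granularity level, from coarse to fine.
    return [KMeans(n_clusters=k, n_init=10).fit(degradation_feats)
            for k in granularities]

def route(feat, hierarchy):
    """Return coarse-to-fine expert indices for one image's degradation feature."""
    return [int(km.predict(feat[None])[0]) for km in hierarchy]
```

A restorer would then dispatch the image to the expert selected at whichever granularity the reliability of the degradation estimate supports.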
arXiv Detail & Related papers (2024-12-28T14:09:08Z)
- Progressive Fine-to-Coarse Reconstruction for Accurate Low-Bit Post-Training Quantization in Vision Transformers [13.316135182889296]
Post-Training Quantization (PTQ) has been widely adopted for compressing Vision Transformers (ViTs).
When quantized into low-bit representations, there is often a significant performance drop compared to their full-precision counterparts.
We propose a Progressive Fine-to-Coarse Reconstruction (PFCR) method for accurate PTQ, which significantly improves the performance of low-bit quantized vision transformers.
arXiv Detail & Related papers (2024-12-19T08:38:59Z)
- DR-BFR: Degradation Representation with Diffusion Models for Blind Face Restoration [7.521850476177286]
We equip diffusion models with the capability to decouple various degradations, as a degradation prompt, from low-quality (LQ) face images.
Our novel restoration scheme, named DR-BFR, guides the denoising of Latent Diffusion Models (LDM) by incorporating Degradation Representation (DR) and content features from LQ images.
DR-BFR significantly outperforms state-of-the-art methods quantitatively and qualitatively across various datasets.
arXiv Detail & Related papers (2024-11-15T15:24:42Z)
- SurgeryV2: Bridging the Gap Between Model Merging and Multi-Task Learning with Deep Representation Surgery [54.866490321241905]
Model merging-based multitask learning (MTL) offers a promising approach for performing MTL by merging multiple expert models.
In this paper, we examine the merged model's representation distribution and uncover a critical issue of "representation bias".
This bias arises from a significant distribution gap between the representations of the merged and expert models, leading to the suboptimal performance of the merged MTL model.
arXiv Detail & Related papers (2024-10-18T11:49:40Z)
- GLMHA A Guided Low-rank Multi-Head Self-Attention for Efficient Image Restoration and Spectral Reconstruction [36.23508672036131]
We propose an instance-Guided Low-rank Multi-Head self-attention (GLMHA) to replace the Channel-wise Self-Attention.
Unique to the proposed GLMHA is its ability to provide computational gain for both short and long input sequences.
Our results show up to a 7.7 GFLOPs reduction and 370K fewer parameters while closely retaining the original performance of the best-performing models.
arXiv Detail & Related papers (2024-10-01T04:07:48Z)
- DopQ-ViT: Towards Distribution-Friendly and Outlier-Aware Post-Training Quantization for Vision Transformers [2.0862654518798034]
We propose DopQ-ViT, a Distribution-Friendly and Outlier-Aware Post-Training Quantization method for Vision Transformers.
DopQ-ViT analyzes the inefficiencies of current quantizers and introduces a distribution-friendly Tan Quantizer called TanQ.
DopQ-ViT has been extensively validated and significantly improves the performance of quantized models.
arXiv Detail & Related papers (2024-08-06T16:40:04Z)
- Boosting Image Restoration via Priors from Pre-trained Models [54.83907596825985]
We learn an additional lightweight module called the Pre-Train-Guided Refinement Module (PTG-RM) to refine the restoration results of a target restoration network with off-the-shelf features (OSF) from pre-trained models.
PTG-RM effectively enhances restoration performance of various models across different tasks, including low-light enhancement, deraining, deblurring, and denoising.
arXiv Detail & Related papers (2024-03-11T15:11:57Z)
- Parameter Efficient Adaptation for Image Restoration with Heterogeneous Mixture-of-Experts [52.39959535724677]
We introduce an alternative solution to improve the generalization of image restoration models.
We propose AdaptIR, a Mixture-of-Experts (MoE) with multi-branch design to capture local, global, and channel representation bases.
Our AdaptIR achieves stable performance on single-degradation tasks and excels in hybrid-degradation tasks, fine-tuning only 0.6% of the parameters for 8 hours.
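A minimal sketch of a multi-branch adapter in the spirit of this summary, with local (depthwise conv), global (pooled), and channel (pointwise) branches mixed by a learned gate; the dimensions, branch designs, and gating rule are assumptions rather than AdaptIR's actual architecture.

```python
import torch
import torch.nn as nn

class MultiBranchAdapter(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.local = nn.Conv2d(dim, dim, 3, padding=1, groups=dim)  # local basis
        self.globl = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                   nn.Conv2d(dim, dim, 1))          # global basis
        self.chan = nn.Conv2d(dim, dim, 1)                          # channel basis
        self.gate = nn.Linear(dim, 3)                               # expert weights

    def forward(self, x):                     # x: (B, C, H, W)
        g = torch.softmax(self.gate(x.mean(dim=(2, 3))), dim=-1)    # (B, 3)
        branches = [self.local(x), self.globl(x).expand_as(x), self.chan(x)]
        out = sum(g[:, i, None, None, None] * b
                  for i, b in enumerate(branches))
        return x + out                        # residual adapter update
```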
arXiv Detail & Related papers (2023-12-12T14:27:59Z)
- PGDiff: Guiding Diffusion Models for Versatile Face Restoration via Partial Guidance [65.5618804029422]
Previous works have achieved noteworthy success by limiting the solution space using explicit degradation models.
We propose PGDiff by introducing partial guidance, a fresh perspective that is more adaptable to real-world degradations.
Our method not only outperforms existing diffusion-prior-based approaches but also competes favorably with task-specific models.
arXiv Detail & Related papers (2023-09-19T17:51:33Z)
- A Unified Conditional Framework for Diffusion-based Image Restoration [39.418415473235235]
We present a unified conditional framework based on diffusion models for image restoration.
We leverage a lightweight UNet to predict initial guidance and the diffusion model to learn the residual of the guidance.
To handle high-resolution images, we propose a simple yet effective inter-step patch-splitting strategy.
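The patch-splitting idea is easy to picture in code. A minimal sketch, assuming overlapping tiles whose predictions are averaged at every denoising step; the tile size, stride, and averaging rule are illustrative, not the paper's exact strategy.

```python
import torch

def denoise_by_patches(x, denoise_step, patch=64, stride=48):
    """x: (B, C, H, W) noisy image with H, W >= patch; denoise_step: tile denoiser."""
    _, _, h, w = x.shape
    out = torch.zeros_like(x)
    count = torch.zeros_like(x)
    # Tile origins at the given stride, plus the far border so every pixel is covered.
    ys = sorted({*range(0, h - patch + 1, stride), h - patch})
    xs = sorted({*range(0, w - patch + 1, stride), w - patch})
    for y in ys:
        for x0 in xs:
            tile = x[:, :, y:y + patch, x0:x0 + patch]
            out[:, :, y:y + patch, x0:x0 + patch] += denoise_step(tile)
            count[:, :, y:y + patch, x0:x0 + patch] += 1
    return out / count  # average overlapping predictions
```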
arXiv Detail & Related papers (2023-05-31T17:22:24Z)
- Implicit Diffusion Models for Continuous Super-Resolution [65.45848137914592]
This paper introduces an Implicit Diffusion Model (IDM) for high-fidelity continuous image super-resolution.
IDM integrates an implicit neural representation and a denoising diffusion model in a unified end-to-end framework.
The scaling factor regulates the resolution and accordingly modulates the proportion of the LR information and generated features in the final output.
arXiv Detail & Related papers (2023-03-29T07:02:20Z)
- GCVAE: Generalized-Controllable Variational AutoEncoder [0.0]
We present a framework to handle the trade-off between attaining extremely low reconstruction error and a high disentanglement score.
We prove that maximizing information in the reconstruction network is equivalent to information maximization during amortized inference.
arXiv Detail & Related papers (2022-06-09T02:29:30Z)
- Attentive Fine-Grained Structured Sparsity for Image Restoration [63.35887911506264]
N:M structured pruning has emerged as an effective and practical pruning approach for making models efficient under accuracy constraints.
We propose a novel pruning method that determines the pruning ratio for N:M structured sparsity at each layer.
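N:M structured sparsity itself is compact enough to show directly. A minimal PyTorch sketch that keeps, in every group of M consecutive weights, only the N largest-magnitude entries; the paper's layer-wise selection of the pruning ratio is omitted.

```python
import torch

def nm_prune(weight, n=2, m=4):
    """weight: 2-D tensor whose last dim is divisible by m."""
    out_dim, in_dim = weight.shape
    groups = weight.reshape(out_dim, in_dim // m, m)
    # Indices of the N largest-magnitude entries in each group of M.
    idx = groups.abs().topk(n, dim=-1).indices
    mask = torch.zeros_like(groups)
    mask.scatter_(-1, idx, 1.0)
    return (groups * mask).reshape(out_dim, in_dim)
```

For the common 2:4 pattern supported by recent GPUs, `nm_prune(weight, 2, 4)` zeroes half of the weights in a hardware-friendly layout.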
arXiv Detail & Related papers (2022-04-26T12:44:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.