Related papers: Mix-QSAM: Mixed-Precision Quantization of the Segment Anything Model

Mix-QSAM: Mixed-Precision Quantization of the Segment Anything Model

URL: http://arxiv.org/abs/2505.04861v1
Date: Thu, 08 May 2025 00:08:31 GMT
Title: Mix-QSAM: Mixed-Precision Quantization of the Segment Anything Model
Authors: Navin Ranjan, Andreas Savakis,
Abstract summary: Mix-QSAM is a mixed-precision Post-Training Quantization (PTQ) framework for the Segment Anything Model (SAM)<n>We introduce a layer-wise importance score, derived using Kullback-Leibler (KL) divergence, to quantify each layer's contribution to the model's output.<n>We also introduce cross-layer synergy, a novel metric based on causal mutual information, to capture dependencies between adjacent layers.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The Segment Anything Model (SAM) is a popular vision foundation model; however, its high computational and memory demands make deployment on resource-constrained devices challenging. While Post-Training Quantization (PTQ) is a practical approach for reducing computational overhead, existing PTQ methods rely on fixed bit-width quantization, leading to suboptimal accuracy and efficiency. To address this limitation, we propose Mix-QSAM, a mixed-precision PTQ framework for SAM. First, we introduce a layer-wise importance score, derived using Kullback-Leibler (KL) divergence, to quantify each layer's contribution to the model's output. Second, we introduce cross-layer synergy, a novel metric based on causal mutual information, to capture dependencies between adjacent layers. This ensures that highly interdependent layers maintain similar bit-widths, preventing abrupt precision mismatches that degrade feature propagation and numerical stability. Using these metrics, we formulate an Integer Quadratic Programming (IQP) problem to determine optimal bit-width allocation under model size and bit-operation constraints, assigning higher precision to critical layers while minimizing bit-width in less influential layers. Experimental results demonstrate that Mix-QSAM consistently outperforms existing PTQ methods on instance segmentation and object detection tasks, achieving up to 20% higher average precision under 6-bit and 4-bit mixed-precision settings, while maintaining computational efficiency.

Related papers

FLRQ: Faster LLM Quantization with Flexible Low-Rank Matrix Sketching [4.01326804806241]
We introduce Rank1-Sketch-based Flexible Rank Selection (R1-FLR) and Best Low-rank Approximation under Clipping (BLC)<n>R1-FLR applies the R1-Sketch with Gaussian projection for the fast low-rank approximation, enabling outlier-aware rank extraction for each layer.<n>BLC aims at minimizing the low-rank quantization error under the scaling and clipping strategy.
arXiv Detail & Related papers (2026-01-09T10:06:45Z)
Compute-Optimal Quantization-Aware Training [50.98555000360485]
Quantization-aware training (QAT) is a leading technique for improving the accuracy of quantized neural networks.<n>Previous work has shown that decomposing training into a full-precision (FP) phase followed by a QAT phase yields superior accuracy.<n>We investigate how different QAT durations impact final performance.
arXiv Detail & Related papers (2025-09-26T21:09:54Z)
IMPQ: Interaction-Aware Layerwise Mixed Precision Quantization for LLMs [4.655407920049974]
Mixed-precision quantization offers a compelling solution, but existing methods struggle when the average precision drops below four bits.<n>In this paper, we propose two innovations to address these limitations.<n>First, we frame the mixed-precision quantization problem as a cooperative game among layers and introduce Shapley-based Progressive Quantization Estimation (SPQE)<n>Second, we propose Interaction-aware Mixed-Precision Quantization (IMPQ) which translates these Shapley estimates into a binary quadratic optimization formulation.
arXiv Detail & Related papers (2025-09-18T21:59:40Z)
QuantVSR: Low-Bit Post-Training Quantization for Real-World Video Super-Resolution [53.13952833016505]
We propose a low-bit quantization model for real-world video super-resolution (VSR)<n>We use a calibration dataset to measure both spatial and temporal complexity for each layer.<n>We refine the FP and low-bit branches to achieve simultaneous optimization.
arXiv Detail & Related papers (2025-08-06T14:35:59Z)
MSQ: Memory-Efficient Bit Sparsification Quantization [11.510434574824213]
Mixed-precision quantization is widely favored, as it offers a superior balance between efficiency and accuracy.<n>We propose Memory-Efficient Bit Sparsification Quantization (MSQ), a novel approach that addresses these limitations.<n>MSQ achieves up to 8.00x reduction in trainable parameters and up to 86% reduction in training time compared to previous bit-level quantization.
arXiv Detail & Related papers (2025-07-30T03:21:29Z)
MPQ-DMv2: Flexible Residual Mixed Precision Quantization for Low-Bit Diffusion Models with Temporal Distillation [74.34220141721231]
We present MPQ-DMv2, an improved textbfMixed textbfPrecision textbfQuantization framework for extremely low-bit textbfDiffusion textbfModels.
arXiv Detail & Related papers (2025-07-06T08:16:50Z)
FineQ: Software-Hardware Co-Design for Low-Bit Fine-Grained Mixed-Precision Quantization of LLMs [13.951330786310262]
FineQ is a software- hardware co-design for low-bit fine-grained mixed-precision quantization of large language models.<n>It partitions the weights into finer-grained clusters and considers the distribution of outliers within these clusters.<n>It achieves higher model accuracy compared to the SOTA mixed-precision quantization algorithm at a close average bit-width.
arXiv Detail & Related papers (2025-04-28T12:47:23Z)
SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models [67.67135738642547]
Post-training quantization (PTQ) is a powerful compression technique investigated in large language models (LLMs) Existing PTQ methods are not ideal in terms of accuracy and efficiency, especially with below 4 bit-widths. This paper presents a Salience-Driven Mixed-Precision Quantization scheme for LLMs, namely SliM-LLM.
arXiv Detail & Related papers (2024-05-23T16:21:48Z)
Towards Accurate Post-training Quantization for Reparameterized Models [6.158896686945439]
Current Post-training Quantization (PTQ) methods often lead to significant accuracy degradation. This is primarily caused by channel-specific and sample-specific outliers. We propose RepAPQ, a novel framework that preserves the accuracy of quantized reparameterization models.
arXiv Detail & Related papers (2024-02-25T15:42:12Z)
CBQ: Cross-Block Quantization for Large Language Models [66.82132832702895]
Post-training quantization (PTQ) has played a key role in compressing large language models (LLMs) with ultra-low costs.<n>We propose CBQ, a cross-block reconstruction-based PTQ method for LLMs.<n> CBQ employs a cross-block dependency using a reconstruction scheme, establishing long-range dependencies across multiple blocks to minimize error accumulation.
arXiv Detail & Related papers (2023-12-13T07:56:27Z)
Mixed-Precision Quantization for Deep Vision Models with Integer Quadratic Programming [7.0146264551420066]
Quantization is a widely used technique to compress neural networks.<n>MPQ addresses this by assigning varied bit-widths to layers, optimizing the accuracy-efficiency trade-off.<n>We introduce CLADO, a practical sensitivity-based MPQ algorithm that captures crosslayer dependency of quantization error.
arXiv Detail & Related papers (2023-07-11T15:56:00Z)
Augmenting Hessians with Inter-Layer Dependencies for Mixed-Precision Post-Training Quantization [7.392278887917975]
We propose a mixed-precision post training quantization approach that assigns different numerical precisions to tensors in a network based on their specific needs. Our experiments demonstrate latency reductions compared to a 16-bit baseline of $25.48%$, $21.69%$, and $33.28%$ respectively.
arXiv Detail & Related papers (2023-06-08T02:18:58Z)
CSQ: Growing Mixed-Precision Quantization Scheme with Bi-level Continuous Sparsification [51.81850995661478]
Mixed-precision quantization has been widely applied on deep neural networks (DNNs) Previous attempts on bit-level regularization and pruning-based dynamic precision adjustment during training suffer from noisy gradients and unstable convergence. We propose Continuous Sparsification Quantization (CSQ), a bit-level training method to search for mixed-precision quantization schemes with improved stability.
arXiv Detail & Related papers (2022-12-06T05:44:21Z)
Collaborative Intelligent Reflecting Surface Networks with Multi-Agent Reinforcement Learning [63.83425382922157]
Intelligent reflecting surface (IRS) is envisioned to be widely applied in future wireless networks. In this paper, we investigate a multi-user communication system assisted by cooperative IRS devices with the capability of energy harvesting.
arXiv Detail & Related papers (2022-03-26T20:37:14Z)
Sharpness-aware Quantization for Deep Neural Networks [45.150346855368]
Sharpness-Aware Quantization (SAQ) is a novel method to explore the effect of Sharpness-Aware Minimization (SAM) on model compression. We show that SAQ improves the generalization performance of the quantized models, yielding the SOTA results in uniform quantization.
arXiv Detail & Related papers (2021-11-24T05:16:41Z)
Fully Quantized Image Super-Resolution Networks [81.75002888152159]
We propose a Fully Quantized image Super-Resolution framework (FQSR) to jointly optimize efficiency and accuracy. We apply our quantization scheme on multiple mainstream super-resolution architectures, including SRResNet, SRGAN and EDSR. Our FQSR using low bits quantization can achieve on par performance compared with the full-precision counterparts on five benchmark datasets.
arXiv Detail & Related papers (2020-11-29T03:53:49Z)
AQD: Towards Accurate Fully-Quantized Object Detection [94.06347866374927]
We propose an Accurate Quantized object Detection solution, termed AQD, to get rid of floating-point computation. Our AQD achieves comparable or even better performance compared with the full-precision counterpart under extremely low-bit schemes.
arXiv Detail & Related papers (2020-07-14T09:07:29Z)

This list is automatically generated from the titles and abstracts of the papers in this site.