Related papers: AHCPTQ: Accurate and Hardware-Compatible Post-Training Quantization for Segment Anything Model

AHCPTQ: Accurate and Hardware-Compatible Post-Training Quantization for Segment Anything Model

URL: http://arxiv.org/abs/2503.03088v1
Date: Wed, 05 Mar 2025 01:04:45 GMT
Title: AHCPTQ: Accurate and Hardware-Compatible Post-Training Quantization for Segment Anything Model
Authors: Wenlun Zhang, Shimpei Ando, Kentaro Yoshioka,
Abstract summary: The Segment Anything Model (SAM) has demonstrated strong versatility across various visual tasks.<n>Post-training quantization (PTQ) has emerged as an effective strategy for efficient deployment.<n>We propose AHCPTQ, an accurate and hardware-efficient PTQ method for SAM.
Score: 0.6827423171182154
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The Segment Anything Model (SAM) has demonstrated strong versatility across various visual tasks. However, its large storage requirements and high computational cost pose challenges for practical deployment. Post-training quantization (PTQ) has emerged as an effective strategy for efficient deployment, but we identify two key challenges in SAM that hinder the effectiveness of existing PTQ methods: the heavy-tailed and skewed distribution of post-GELU activations, and significant inter-channel variation in linear projection activations. To address these challenges, we propose AHCPTQ, an accurate and hardware-efficient PTQ method for SAM. AHCPTQ introduces hardware-compatible Hybrid Log-Uniform Quantization (HLUQ) to manage post-GELU activations, employing log2 quantization for dense small values and uniform quantization for sparse large values to enhance quantization resolution. Additionally, AHCPTQ incorporates Channel-Aware Grouping (CAG) to mitigate inter-channel variation by progressively clustering activation channels with similar distributions, enabling them to share quantization parameters and improving hardware efficiency. The combination of HLUQ and CAG not only enhances quantization effectiveness but also ensures compatibility with efficient hardware execution. For instance, under the W4A4 configuration on the SAM-L model, AHCPTQ achieves 36.6% mAP on instance segmentation with the DINO detector, while achieving a 7.89x speedup and 8.64x energy efficiency over its floating-point counterpart in FPGA implementation.

Related papers

SegQuant: A Semantics-Aware and Generalizable Quantization Framework for Diffusion Models [12.716956318428652]
SegQuant is a unified quantization framework that adaptively combines complementary techniques to enhance cross-model versatility.<n>SegQuant is broadly applicable beyond Transformer-based diffusion models, achieving strong performance while ensuring seamless compatibility with mainstream deployment tools.
arXiv Detail & Related papers (2025-07-20T04:00:53Z)
Mix-QSAM: Mixed-Precision Quantization of the Segment Anything Model [0.0]
Mix-QSAM is a mixed-precision Post-Training Quantization (PTQ) framework for the Segment Anything Model (SAM)<n>We introduce a layer-wise importance score, derived using Kullback-Leibler (KL) divergence, to quantify each layer's contribution to the model's output.<n>We also introduce cross-layer synergy, a novel metric based on causal mutual information, to capture dependencies between adjacent layers.
arXiv Detail & Related papers (2025-05-08T00:08:31Z)
Channel-Wise Mixed-Precision Quantization for Large Language Models [47.00361921910259]
Large Language Models (LLMs) have demonstrated remarkable success across a wide range of language tasks.<n>Weight-only quantization presents a promising solution to reduce the memory footprint of LLMs.<n>We introduce Channel-Wise Mixed-Precision Quantization (CMPQ), a novel mixed-precision quantization method.
arXiv Detail & Related papers (2024-10-16T21:34:41Z)
AdaLog: Post-Training Quantization for Vision Transformers with Adaptive Logarithm Quantizer [54.713778961605115]
Vision Transformer (ViT) has become one of the most prevailing fundamental backbone networks in the computer vision community. We propose a novel non-uniform quantizer, dubbed the Adaptive Logarithm AdaLog (AdaLog) quantizer.
arXiv Detail & Related papers (2024-07-17T18:38:48Z)
OutlierTune: Efficient Channel-Wise Quantization for Large Language Models [24.645237670811476]
OutlierTune is an efficient per-channel post-training quantization method for the activations of large language models. The proposed framework is easy to implement and hardware-efficient, introducing almost no additional computational overheads during the inference.
arXiv Detail & Related papers (2024-06-27T02:02:26Z)
PTQ4DiT: Post-training Quantization for Diffusion Transformers [52.902071948957186]
Post-training Quantization (PTQ) has emerged as a fast and data-efficient solution that can significantly reduce computation and memory footprint. We propose PTQ4DiT, a specifically designed PTQ method for DiTs. We demonstrate that our PTQ4DiT successfully quantizes DiTs to 8-bit precision while preserving comparable generation ability.
arXiv Detail & Related papers (2024-05-25T02:02:08Z)
SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models [63.118592279833656]
Post-training quantization (PTQ) is an effective technique for compressing large language models (LLMs)<n>We propose SliM-LLM, a salience-driven mixed-precision quantization framework that allocates bit-widths at the group-wise.<n> Experiments show that SliM-LLM achieves superior performance across various LLMs at low bit-widths.
arXiv Detail & Related papers (2024-05-23T16:21:48Z)
Towards Next-Level Post-Training Quantization of Hyper-Scale Transformers [10.566264033360282]
Post-training quantization (PTQ) has emerged as a promising solution for deploying hyper-scale models on edge devices such as mobile and TVs. In this paper, we propose a novel PTQ algorithm that balances accuracy and efficiency.
arXiv Detail & Related papers (2024-02-14T05:58:43Z)
CBQ: Cross-Block Quantization for Large Language Models [66.82132832702895]
Post-training quantization (PTQ) has played a key role in compressing large language models (LLMs) with ultra-low costs.<n>We propose CBQ, a cross-block reconstruction-based PTQ method for LLMs.<n> CBQ employs a cross-block dependency using a reconstruction scheme, establishing long-range dependencies across multiple blocks to minimize error accumulation.
arXiv Detail & Related papers (2023-12-13T07:56:27Z)
Enhancing Computation Efficiency in Large Language Models through Weight and Activation Quantization [12.655230451207956]
This paper focuses on post-training quantization (PTQ) in Large Language Models (LLMs) We present two innovative techniques: activation-quantization-aware scaling (AQAS) and sequence-length-aware calibration (SLAC) We demonstrate that our techniques significantly boost task accuracies to levels comparable with full-precision models.
arXiv Detail & Related papers (2023-11-09T06:19:51Z)
Distribution-Flexible Subset Quantization for Post-Quantizing Super-Resolution Networks [68.83451203841624]
This paper introduces Distribution-Flexible Subset Quantization (DFSQ), a post-training quantization method for super-resolution networks. DFSQ conducts channel-wise normalization of the activations and applies distribution-flexible subset quantization (SQ) It achieves comparable performance to full-precision counterparts on 6- and 8-bit quantization, and incurs only a 0.1 dB PSNR drop on 4-bit quantization.
arXiv Detail & Related papers (2023-05-10T04:19:11Z)
Parameter-Parallel Distributed Variational Quantum Algorithm [7.255056332088222]
Variational quantum algorithms (VQAs) have emerged as a promising near-term technique to explore practical quantum advantage on noisy devices. Here, we propose a parameter-parallel distributed variational quantum algorithm (PPD-VQA) to accelerate the training process by parameter-parallel training with multiple quantum processors. The achieved results suggest that the PPD-VQA could provide a practical solution for coordinating multiple quantum processors to handle large-scale real-word applications.
arXiv Detail & Related papers (2022-07-31T15:09:12Z)
Quantum circuit architecture search for variational quantum algorithms [88.71725630554758]
We propose a resource and runtime efficient scheme termed quantum architecture search (QAS) QAS automatically seeks a near-optimal ansatz to balance benefits and side-effects brought by adding more noisy quantum gates. We implement QAS on both the numerical simulator and real quantum hardware, via the IBM cloud, to accomplish data classification and quantum chemistry tasks.
arXiv Detail & Related papers (2020-10-20T12:06:27Z)
Q-GADMM: Quantized Group ADMM for Communication Efficient Decentralized Machine Learning [66.18202188565922]
We propose a communication-efficient decentralized machine learning (ML) algorithm, coined QGADMM (QGADMM) We develop a novel quantization method to adaptively adjust modelization levels and their probabilities, while proving the convergence of QGADMM for convex functions.
arXiv Detail & Related papers (2019-10-23T10:47:06Z)

This list is automatically generated from the titles and abstracts of the papers in this site.