CoDeQ: End-to-End Joint Model Compression with Dead-Zone Quantizer for High-Sparsity and Low-Precision Networks
- URL: http://arxiv.org/abs/2512.12981v1
- Date: Mon, 15 Dec 2025 04:53:32 GMT
- Title: CoDeQ: End-to-End Joint Model Compression with Dead-Zone Quantizer for High-Sparsity and Low-Precision Networks
- Authors: Jonathan Wenshøj, Tong Chen, Bob Pepin, Raghavendra Selvan
- Abstract summary: CoDeQ is a simple, fully differentiable method for joint pruning--quantization. We parameterize the dead-zone width and learn it via backpropagation, alongside the quantization parameters. On ImageNet with ResNet-18, CoDeQ reduces bit operations to 5% while maintaining close to full precision accuracy.
- Score: 9.784730417146335
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While joint pruning--quantization is theoretically superior to sequential application, current joint methods rely on auxiliary procedures outside the training loop for finding compression parameters. This reliance adds engineering complexity and hyperparameter tuning, while also lacking a direct data-driven gradient signal, which might result in sub-optimal compression. In this paper, we introduce CoDeQ, a simple, fully differentiable method for joint pruning--quantization. Our approach builds on a key observation: the dead-zone of a scalar quantizer is equivalent to magnitude pruning, and can be used to induce sparsity directly within the quantization operator. Concretely, we parameterize the dead-zone width and learn it via backpropagation, alongside the quantization parameters. This design provides explicit control of sparsity, regularized by a single global hyperparameter, while decoupling sparsity selection from bit-width selection. The result is a method for Compression with Dead-zone Quantizer (CoDeQ) that supports both fixed-precision and mixed-precision quantization (controlled by an optional second hyperparameter). It simultaneously determines the sparsity pattern and quantization parameters in a single end-to-end optimization. Consequently, CoDeQ does not require any auxiliary procedures, making the method architecture-agnostic and straightforward to implement. On ImageNet with ResNet-18, CoDeQ reduces bit operations to ~5% while maintaining close to full precision accuracy in both fixed and mixed-precision regimes.
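The core operator described in the abstract, a dead-zone quantizer whose dead-zone width is learned jointly with the quantization step, can be sketched as follows. This is a minimal, hedged illustration rather than the authors' implementation; the class name, the log-parameterization, and the straight-through-estimator details are assumptions made for illustration.

```python
# Minimal sketch (not the authors' code) of a dead-zone quantizer with a
# learnable step size and dead-zone width, trained with a straight-through
# estimator. Names and parameterization choices here are illustrative.
import torch
import torch.nn as nn

class DeadZoneQuantizer(nn.Module):
    def __init__(self, init_step=0.05, init_dead_zone=0.01, max_level=127):
        super().__init__()
        # Log-parameterization keeps both quantities positive during training.
        self.log_step = nn.Parameter(torch.tensor(float(init_step)).log())
        self.log_dead = nn.Parameter(torch.tensor(float(init_dead_zone)).log())
        self.max_level = max_level  # largest |level| for a fixed bit-width

    def forward(self, w: torch.Tensor) -> torch.Tensor:
        step = self.log_step.exp()
        dead = self.log_dead.exp()
        # Magnitudes inside the dead-zone map to quantization level 0, i.e. the
        # weight is pruned; this is the pruning-inside-quantization idea from
        # the abstract (dead-zone width acts as a magnitude-pruning threshold).
        shifted = (w.abs() - dead).clamp(min=0.0) / step
        levels = shifted.round().clamp(max=self.max_level)
        # Straight-through estimator: rounded levels in the forward pass,
        # gradients flow through the continuous values in the backward pass.
        levels = shifted + (levels - shifted).detach()
        return torch.sign(w) * levels * step
```

In a training loop, a module like this would wrap each weight tensor, and a regularizer on the dead-zone widths (the single global hyperparameter mentioned in the abstract) would push more weights into the zero bin, trading accuracy against sparsity end to end.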
Related papers
- pQuant: Towards Effective Low-Bit Language Models via Decoupled Linear Quantization-Aware Training [24.05577787968274]
pQuant is a method that decouples parameters by splitting linear layers into two specialized branches. We show pQuant achieves state-of-the-art performance in extremely low-bit quantization.
arXiv Detail & Related papers (2026-02-26T03:51:58Z) - MPQ-DMv2: Flexible Residual Mixed Precision Quantization for Low-Bit Diffusion Models with Temporal Distillation [74.34220141721231]
We present MPQ-DMv2, an improved Mixed Precision Quantization framework for extremely low-bit Diffusion Models.
arXiv Detail & Related papers (2025-07-06T08:16:50Z) - SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models [63.118592279833656]
Post-training quantization (PTQ) is an effective technique for compressing large language models (LLMs). We propose SliM-LLM, a salience-driven mixed-precision quantization framework that allocates bit-widths group-wise. Experiments show that SliM-LLM achieves superior performance across various LLMs at low bit-widths.
arXiv Detail & Related papers (2024-05-23T16:21:48Z) - Towards Accurate Post-training Quantization for Reparameterized Models [6.158896686945439]
Current Post-training Quantization (PTQ) methods often lead to significant accuracy degradation.
This is primarily caused by channel-specific and sample-specific outliers.
We propose RepAPQ, a novel framework that preserves the accuracy of quantized reparameterization models.
arXiv Detail & Related papers (2024-02-25T15:42:12Z) - Gradient-descent quantum process tomography by learning Kraus operators [63.69764116066747]
We perform quantum process tomography (QPT) for both discrete- and continuous-variable quantum systems.
We use a constrained gradient-descent (GD) approach on the so-called Stiefel manifold during optimization to obtain the Kraus operators.
The GD-QPT matches the performance of both compressed-sensing (CS) and projected least-squares (PLS) QPT in benchmarks with two-qubit random processes.
arXiv Detail & Related papers (2022-08-01T12:48:48Z) - OPQ: Compressing Deep Neural Networks with One-shot Pruning-Quantization [32.60139548889592]
We propose a novel One-shot Pruning-Quantization (OPQ) in this paper.
OPQ analytically solves the compression allocation with pre-trained weight parameters only.
We propose a unified channel-wise quantization method that enforces all channels of each layer to share a common codebook.
arXiv Detail & Related papers (2022-05-23T09:05:25Z) - Data post-processing for the one-way heterodyne protocol under composable finite-size security [62.997667081978825]
We study the performance of a practical continuous-variable (CV) quantum key distribution protocol.
We focus on the Gaussian-modulated coherent-state protocol with heterodyne detection in a high signal-to-noise ratio regime.
This allows us to study the performance for practical implementations of the protocol and optimize the parameters connected to the steps above.
arXiv Detail & Related papers (2022-05-20T12:37:09Z) - Automatic Mixed-Precision Quantization Search of BERT [62.65905462141319]
Pre-trained language models such as BERT have shown remarkable effectiveness in various natural language processing tasks.
These models usually contain millions of parameters, which prevents them from practical deployment on resource-constrained devices.
We propose an automatic mixed-precision quantization framework designed for BERT that can simultaneously conduct quantization and pruning in a subgroup-wise level.
arXiv Detail & Related papers (2021-12-30T06:32:47Z) - Composably secure data processing for Gaussian-modulated continuous variable quantum key distribution [58.720142291102135]
Continuous-variable quantum key distribution (QKD) employs the quadratures of a bosonic mode to establish a secret key between two remote parties.
We consider a protocol with homodyne detection in the general setting of composable finite-size security.
In particular, we analyze the high signal-to-noise regime which requires the use of high-rate (non-binary) low-density parity check codes.
arXiv Detail & Related papers (2021-03-30T18:02:55Z) - BSQ: Exploring Bit-Level Sparsity for Mixed-Precision Neural Network Quantization [32.770842274996774]
Mixed-precision quantization can potentially achieve the optimal tradeoff between performance and compression rate of deep neural networks.
Previous methods either examine only a small manually-designed search space or utilize a cumbersome neural architecture search to explore the vast search space.
This work proposes bit-level sparsity quantization (BSQ) to tackle mixed-precision quantization from a new angle of inducing bit-level sparsity.
arXiv Detail & Related papers (2021-02-20T22:37:41Z) - DyCo3D: Robust Instance Segmentation of 3D Point Clouds through Dynamic
Convolution [136.7261709896713]
We propose a data-driven approach that generates the appropriate convolution kernels to apply in response to the nature of the instances.
The proposed method achieves promising results on both ScanNetV2 and S3DIS.
It also improves inference speed by more than 25% over the current state-of-the-art.
arXiv Detail & Related papers (2020-11-26T14:56:57Z) - Automated Model Compression by Jointly Applied Pruning and Quantization [14.824593320721407]
In the traditional deep compression framework, iteratively performing network pruning and quantization can reduce the model size and computation cost.
We tackle this issue by integrating network pruning and quantization as a unified joint compression problem and then use AutoML to automatically solve it.
We propose automated model compression by jointly applied pruning and quantization (AJPQ).
arXiv Detail & Related papers (2020-11-12T07:06:29Z)