DynaQuant: Dynamic Mixed-Precision Quantization for Learned Image Compression
- URL: http://arxiv.org/abs/2511.07903v1
- Date: Wed, 12 Nov 2025 01:27:46 GMT
- Title: DynaQuant: Dynamic Mixed-Precision Quantization for Learned Image Compression
- Authors: Youneng Bao, Yulong Cheng, Yiping Liu, Yichen Yang, Peng Qin, Mu Li, Yongsheng Liang
- Abstract summary: DynaQuant is a novel framework for dynamic mixed-precision quantization. We introduce a data-driven, dynamic bit-width selector that learns to assign an optimal bit precision to each layer. Our fully dynamic approach offers substantial flexibility in balancing rate-distortion (R-D) performance and computational cost.
- Score: 13.90943929367355
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Prevailing quantization techniques in Learned Image Compression (LIC) typically employ a static, uniform bit-width across all layers, failing to adapt to the highly diverse data distributions and sensitivity characteristics inherent in LIC models. This leads to a suboptimal trade-off between performance and efficiency. In this paper, we introduce DynaQuant, a novel framework for dynamic mixed-precision quantization that operates on two complementary levels. First, we propose content-aware quantization, where learnable scaling and offset parameters dynamically adapt to the statistical variations of latent features. This fine-grained adaptation is trained end-to-end using a novel Distance-aware Gradient Modulator (DGM), which provides a more informative learning signal than the standard Straight-Through Estimator. Second, we introduce a data-driven, dynamic bit-width selector that learns to assign an optimal bit precision to each layer, dynamically reconfiguring the network's precision profile based on the input data. Our fully dynamic approach offers substantial flexibility in balancing rate-distortion (R-D) performance and computational cost. Experiments demonstrate that DynaQuant achieves R-D performance comparable to full-precision models while significantly reducing computational and storage requirements, thereby enabling the practical deployment of advanced LIC on diverse hardware platforms.
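The content-aware quantization described above builds on standard uniform affine fake-quantization, where a scale and offset map floats to a small integer grid and back. A minimal sketch of that primitive, with fixed `scale` and `offset` arguments standing in for the learnable, content-adaptive parameters the paper describes (the parameter values below are illustrative, not from the paper):

```python
def fake_quantize(x, scale, offset, bits):
    """Uniform affine fake-quantization of a list of floats:
    quantize to `bits`-bit integers, then dequantize back.
    In DynaQuant (per the abstract) `scale` and `offset` would be
    learnable and adapted to the latent statistics; here they are
    fixed arguments for illustration."""
    qmax = 2 ** bits - 1
    out = []
    for v in x:
        q = round(v / scale + offset)
        q = max(0, min(qmax, q))        # clip to the representable range
        out.append((q - offset) * scale)  # dequantize
    return out

x = [-0.7, -0.1, 0.0, 0.3, 0.9]
err8 = max(abs(a - b) for a, b in zip(x, fake_quantize(x, 0.01, 128, 8)))
err2 = max(abs(a - b) for a, b in zip(x, fake_quantize(x, 0.5, 2, 2)))
assert err8 < err2  # higher bit-width -> lower reconstruction error
```

Because rounding has zero gradient almost everywhere, training such a quantizer end-to-end needs a surrogate gradient; the paper's DGM replaces the usual Straight-Through Estimator for that role.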
Related papers
- LSGQuant: Layer-Sensitivity Guided Quantization for One-Step Diffusion Real-World Video Super-Resolution [52.627063566555194]
We introduce LSGQuant, a layer-sensitivity guided quantization approach for one-step diffusion-based real-world VSR. Our method incorporates a Dynamic Range Adaptive Quantizer (DRAQ) to fit video token activations. Our method performs nearly on par with the full-precision original model and significantly exceeds existing quantization techniques.
arXiv Detail & Related papers (2026-02-03T06:53:19Z) - Content Adaptive based Motion Alignment Framework for Learned Video Compression [72.13599533975413]
This paper proposes a content adaptive based motion alignment framework. We first introduce a two-stage flow-guided deformable warping mechanism that refines motion compensation with coarse-to-fine offset prediction and mask modulation. Second, we propose a multi-reference quality aware strategy that adjusts distortion weights based on reference quality, and applies it to hierarchical training to reduce error propagation. Third, we integrate a training-free module that downsamples frames by motion magnitude and resolution to obtain smooth motion estimation.
arXiv Detail & Related papers (2025-12-15T02:51:47Z) - FlexQuant: A Flexible and Efficient Dynamic Precision Switching Framework for LLM Quantization [19.12288373558071]
We propose FlexQuant, a dynamic precision-switching framework to optimize the trade-off between inference speed and accuracy. We show that FlexQuant achieves a 1.3x end-to-end speedup across diverse language tasks with negligible accuracy loss.
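Runtime precision switching of this kind reduces to a dispatch policy over pre-quantized model variants. A generic sketch (the variant table and budget-based policy below are hypothetical, not FlexQuant's actual mechanism):

```python
def choose_precision(latency_budget_ms, variants):
    """Pick the highest-precision variant whose measured latency fits
    the budget; fall back to the fastest variant when none fits.
    (A generic sketch of runtime precision switching; the latency
    numbers and selection rule are illustrative assumptions.)"""
    feasible = [v for v in variants if v["latency_ms"] <= latency_budget_ms]
    if feasible:
        return max(feasible, key=lambda v: v["bits"])
    return min(variants, key=lambda v: v["latency_ms"])

variants = [
    {"bits": 16, "latency_ms": 40},
    {"bits": 8,  "latency_ms": 22},
    {"bits": 4,  "latency_ms": 12},
]
assert choose_precision(25, variants)["bits"] == 8  # best that fits
assert choose_precision(5, variants)["bits"] == 4   # nothing fits: fastest
```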
arXiv Detail & Related papers (2025-05-21T07:42:53Z) - VRVVC: Variable-Rate NeRF-Based Volumetric Video Compression [59.14355576912495]
NeRF-based video has revolutionized visual media by delivering photorealistic Free-Viewpoint Video (FVV) experiences. The substantial data volumes pose significant challenges for storage and transmission. We propose VRVVC, a novel end-to-end joint variable-rate framework for video compression.
arXiv Detail & Related papers (2024-12-16T01:28:04Z) - Algorithm-Hardware Co-Design of Distribution-Aware Logarithmic-Posit Encodings for Efficient DNN Inference [4.093167352780157]
We introduce Logarithmic Posits (LP), an adaptive, hardware-friendly data type inspired by posits.
We also develop a novel genetic-algorithm based framework, LP Quantization (LPQ), to find optimal layer-wise LP parameters.
arXiv Detail & Related papers (2024-03-08T17:28:49Z) - DyCE: Dynamically Configurable Exiting for Deep Learning Compression and Real-time Scaling [1.8350044465969415]
DyCE can adjust the performance-complexity trade-off of a deep learning model at runtime without requiring re-initialization or redeployment on inference hardware. DyCE significantly reduces computational complexity by 23.5% for ResNet152 and 25.9% for ConvNextv2-tiny on ImageNet, with accuracy reductions of less than 0.5%.
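The runtime trade-off DyCE describes is the classic early-exit pattern: attach an exit head after each stage and stop at the first one whose confidence clears a tunable threshold. A minimal sketch with toy stages and heads (the stage/exit design here is hypothetical, not DyCE's actual architecture):

```python
def run_with_exits(stages, exit_heads, x, threshold):
    """Run `stages` in sequence; after each, the matching exit head
    returns (prediction, confidence). Return at the first exit whose
    confidence clears `threshold` -- a runtime knob trading compute
    for accuracy with no redeployment. (Generic early-exit sketch.)"""
    pred = None
    for i, (stage, head) in enumerate(zip(stages, exit_heads)):
        x = stage(x)
        pred, conf = head(x)
        if conf >= threshold:
            return pred, i            # exited after stage i
    return pred, len(stages) - 1      # fell through to the final exit

# Toy pipeline: confidence grows with depth.
stages = [lambda v: v * 2] * 3
heads = [lambda v, c=c: ("cat", c) for c in (0.5, 0.8, 0.99)]
assert run_with_exits(stages, heads, 1, threshold=0.7)[1] == 1  # exits early
assert run_with_exits(stages, heads, 1, threshold=0.9)[1] == 2  # runs deeper
```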
arXiv Detail & Related papers (2024-03-04T03:09:28Z) - Dynamic Model Switching for Improved Accuracy in Machine Learning [0.0]
We introduce an adaptive ensemble that intuitively transitions between CatBoost and XGBoost.
The user sets a benchmark, say 80% accuracy, prompting the system to dynamically shift to the new model only if it guarantees improved performance.
This dynamic model-switching mechanism aligns with the evolving nature of data in real-world scenarios.
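The switching rule just described (adopt the challenger only when it clears the user-set benchmark and beats the incumbent) can be sketched with generic callable models in place of CatBoost/XGBoost; the helper names and toy data below are illustrative assumptions:

```python
def accuracy(model, X, y):
    """Fraction of validation points the model classifies correctly."""
    return sum(model(x) == t for x, t in zip(X, y)) / len(y)

def maybe_switch(current, challenger, X_val, y_val, benchmark=0.80):
    """Return the challenger only if it clears the user-set benchmark
    AND improves on the incumbent's validation accuracy; otherwise
    keep the current model. (Hypothetical sketch of the paper's idea.)"""
    ch_acc = accuracy(challenger, X_val, y_val)
    if ch_acc >= benchmark and ch_acc > accuracy(current, X_val, y_val):
        return challenger
    return current

X_val = [0, 1, 2, 3, 4]
y_val = [0, 1, 0, 1, 0]
weak = lambda x: 0        # 60% accuracy on this validation set
strong = lambda x: x % 2  # 100% accuracy
assert maybe_switch(weak, strong, X_val, y_val) is strong  # upgrade
assert maybe_switch(strong, weak, X_val, y_val) is strong  # never downgrade
```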
arXiv Detail & Related papers (2024-01-31T00:13:02Z) - Tunable Convolutions with Parametric Multi-Loss Optimization [5.658123802733283]
The behavior of neural networks is irremediably determined by the specific loss and data used during training.
It is often desirable to tune the model at inference time based on external factors such as preferences of the user or dynamic characteristics of the data.
This is especially important to balance the perception-distortion trade-off of ill-posed image-to-image translation tasks.
arXiv Detail & Related papers (2023-04-03T11:36:10Z) - Data Quality-aware Mixed-precision Quantization via Hybrid Reinforcement Learning [22.31766292657812]
Mixed-precision quantization mostly predetermines the model bit-width settings before actual training.
We propose a novel Data Quality-aware Mixed-precision Quantization framework, dubbed DQMQ, to dynamically adapt quantization bit-widths to different data qualities.
arXiv Detail & Related papers (2023-02-09T06:14:00Z) - Modality-Agnostic Variational Compression of Implicit Neural Representations [96.35492043867104]
We introduce a modality-agnostic neural compression algorithm based on a functional view of data and parameterised as an Implicit Neural Representation (INR).
Bridging the gap between latent coding and sparsity, we obtain compact latent representations non-linearly mapped to a soft gating mechanism.
After obtaining a dataset of such latent representations, we directly optimise the rate/distortion trade-off in a modality-agnostic space using neural compression.
arXiv Detail & Related papers (2023-01-23T15:22:42Z) - DynaST: Dynamic Sparse Transformer for Exemplar-Guided Image Generation [56.514462874501675]
We propose a dynamic sparse attention based Transformer model to achieve fine-level matching with favorable efficiency.
The heart of our approach is a novel dynamic-attention unit, dedicated to adapting to the variation in the optimal number of tokens each position should attend to.
Experiments on three applications, pose-guided person image generation, edge-based face synthesis, and undistorted image style transfer, demonstrate that DynaST achieves superior performance in local details.
arXiv Detail & Related papers (2022-07-13T11:12:03Z) - Dynamic Network-Assisted D2D-Aided Coded Distributed Learning [59.29409589861241]
We propose a novel device-to-device (D2D)-aided coded federated learning method (D2D-CFL) for load balancing across devices.
We derive an optimal compression rate for achieving minimum processing time and establish its connection with the convergence time.
Our proposed method is beneficial for real-time collaborative applications, where the users continuously generate training data.
arXiv Detail & Related papers (2021-11-26T18:44:59Z) - Online Meta Adaptation for Variable-Rate Learned Image Compression [40.8361915315201]
This work addresses two major issues of end-to-end learned image compression (LIC) based on deep neural networks.
We introduce an online meta-learning (OML) setting for LIC, which combines ideas from meta learning and online learning in the conditional variational auto-encoder framework.
arXiv Detail & Related papers (2021-11-16T06:46:23Z) - End-to-End Facial Deep Learning Feature Compression with Teacher-Student Enhancement [57.18801093608717]
We propose a novel end-to-end feature compression scheme by leveraging the representation and learning capability of deep neural networks.
In particular, the extracted features are compactly coded in an end-to-end manner by optimizing the rate-distortion cost.
We verify the effectiveness of the proposed model with the facial feature, and experimental results reveal a better rate-accuracy trade-off.
arXiv Detail & Related papers (2020-02-10T10:08:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.