Robust Residual Finite Scalar Quantization for Neural Compression
- URL: http://arxiv.org/abs/2508.15860v2
- Date: Fri, 24 Oct 2025 09:26:03 GMT
- Title: Robust Residual Finite Scalar Quantization for Neural Compression
- Authors: Xiaoxu Zhu, Jiakui Li, Ken Zheng, Guiping Zhong, Huimeng Wang, Shiyin Kang, Dahua Lin,
- Abstract summary: Finite Scalar Quantization (FSQ) offers simplified training but suffers from residual magnitude decay in multi-stage settings.
We propose Robust Residual Finite Scalar Quantization (RFSQ), addressing this fundamental limitation through two novel conditioning strategies.
Our experiments across audio and image modalities demonstrate RFSQ's effectiveness and generalizability.
- Score: 46.574899938569125
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Finite Scalar Quantization (FSQ) offers simplified training but suffers from residual magnitude decay in multi-stage settings, where subsequent stages receive exponentially weaker signals. We propose Robust Residual Finite Scalar Quantization (RFSQ), addressing this fundamental limitation through two novel conditioning strategies: learnable scaling factors and invertible layer normalization. Our experiments across audio and image modalities demonstrate RFSQ's effectiveness and generalizability. In audio reconstruction at 24 bits/frame, RFSQ-LayerNorm achieves 3.646 DNSMOS, a 3.6% improvement over state-of-the-art RVQ (3.518). On ImageNet, RFSQ achieves 0.102 L1 loss and 0.100 perceptual loss, with LayerNorm providing 9.7% L1 improvement and 17.4% perceptual improvement over unconditioned variants. The LayerNorm strategy consistently outperforms alternatives by maintaining normalized input statistics across stages, effectively preventing exponential magnitude decay that limits naive residual approaches. RFSQ combines FSQ's simplicity with multi-stage quantization's representational power, establishing a new standard for neural compression across diverse modalities.
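The abstract describes a multi-stage residual pipeline in which each FSQ stage quantizes the residual left by the previous stage, with learnable scaling factors or an invertible layer normalization keeping later stages well conditioned. Below is a minimal PyTorch sketch of that idea under stated assumptions (a tanh-bounded FSQ with a straight-through estimator); module and variable names are illustrative, not the authors' implementation.

```python
import torch
import torch.nn as nn


def fsq(z: torch.Tensor, levels: int = 5) -> torch.Tensor:
    """Finite Scalar Quantization: bound each dimension, round to a fixed
    integer grid, and pass gradients through with a straight-through estimator."""
    half = (levels - 1) / 2
    bounded = torch.tanh(z) * half
    return bounded + (torch.round(bounded) - bounded).detach()


class ResidualFSQ(nn.Module):
    """Multi-stage residual FSQ with the two conditioning strategies named in
    the abstract. mode='none' reproduces the naive baseline whose residuals
    shrink in magnitude stage by stage."""

    def __init__(self, num_stages: int = 4, mode: str = "layernorm"):
        super().__init__()
        self.num_stages, self.mode = num_stages, mode
        # Strategy 1: learnable per-stage scaling factors.
        self.scales = nn.Parameter(torch.ones(num_stages))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        residual, recon = x, torch.zeros_like(x)
        for s in range(self.num_stages):
            if self.mode == "layernorm":
                # Strategy 2: invertible layer normalization -- normalize the
                # residual, quantize, then undo the normalization with the
                # stored statistics so every stage sees unit-scale inputs.
                mu = residual.mean(dim=-1, keepdim=True)
                sigma = residual.std(dim=-1, keepdim=True) + 1e-5
                q = fsq((residual - mu) / sigma) * sigma + mu
            elif self.mode == "scale":
                q = fsq(residual * self.scales[s]) / self.scales[s]
            else:  # naive residual FSQ
                q = fsq(residual)
            recon = recon + q
            residual = residual - q
        return recon
```

In the "layernorm" mode the quantizer input is re-standardized at every stage, which is the property the abstract credits with preventing the exponential decay of residual magnitudes.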
Related papers
- BiRQA: Bidirectional Robust Quality Assessment for Images [49.74447451098852]
Full-Reference image quality assessment (FR IQA) is important for image compression, restoration and generative modeling.
We present BiRQA, a compact FR IQA metric model that processes four fast complementary features within a bidirectional multiscale pyramid.
On five public FR IQA benchmarks, BiRQA outperforms or matches the previous state of the art (SOTA) while running 3x faster than previous SOTA models.
arXiv Detail & Related papers (2026-02-23T20:52:56Z) - Breaking the Blocks: Continuous Low-Rank Decomposed Scaling for Unified LLM Quantization and Adaptation [46.34608916687127]
Low-Rank Decomposed Scaling (LoRDS) is a unified framework that rethinks quantization granularity through low-rank decomposition.
By "breaking the blocks" of spatial constraints, LoRDS establishes a seamless efficiency lifecycle.
LoRDS consistently outperforms state-of-the-art baselines across various model families in both quantization and downstream fine-tuning tasks.
arXiv Detail & Related papers (2026-01-30T08:46:02Z) - iFSQ: Improving FSQ for Image Generation with 1 Line of Code [40.61338660155903]
We show how to replace the activation function in FSQ with a distribution-matching mapping to enforce a uniform prior.
This simple strategy requires just one line of code yet mathematically guarantees both optimal bin utilization and reconstruction precision.
We extend our analysis by adapting Representation Alignment (REPA) to AR models, yielding LlamaGen-REPA.
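A heavily hedged sketch of the change named in this summary: swap FSQ's bounding activation for a distribution-matching map so a roughly Gaussian latent spreads uniformly over the quantization bins. The Gaussian-CDF (erf) choice below is an assumption for illustration, not necessarily the paper's exact mapping.

```python
import torch


def fsq_tanh(z: torch.Tensor, levels: int = 5) -> torch.Tensor:
    """Standard FSQ: tanh bounding, round, straight-through gradient."""
    half = (levels - 1) / 2
    bounded = torch.tanh(z) * half
    return bounded + (torch.round(bounded) - bounded).detach()


def fsq_uniform(z: torch.Tensor, levels: int = 5) -> torch.Tensor:
    """Same quantizer, but the bounding map is a Gaussian CDF, so a roughly
    Gaussian latent lands uniformly across the bins (the 'one line' change)."""
    half = (levels - 1) / 2
    bounded = torch.erf(z / 2 ** 0.5) * half  # erf(z/sqrt(2)) = 2*Phi(z) - 1
    return bounded + (torch.round(bounded) - bounded).detach()
```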
arXiv Detail & Related papers (2026-01-23T19:00:35Z) - FLRQ: Faster LLM Quantization with Flexible Low-Rank Matrix Sketching [4.01326804806241]
We introduce Rank1-Sketch-based Flexible Rank Selection (R1-FLR) and Best Low-rank Approximation under Clipping (BLC).
R1-FLR applies the R1-Sketch with Gaussian projection for fast low-rank approximation, enabling outlier-aware rank extraction for each layer.
BLC aims at minimizing the low-rank quantization error under the scaling and clipping strategy.
arXiv Detail & Related papers (2026-01-09T10:06:45Z) - MPQ-DMv2: Flexible Residual Mixed Precision Quantization for Low-Bit Diffusion Models with Temporal Distillation [74.34220141721231]
We present MPQ-DMv2, an improved Mixed Precision Quantization framework for extremely low-bit Diffusion Models.
arXiv Detail & Related papers (2025-07-06T08:16:50Z) - FIMA-Q: Post-Training Quantization for Vision Transformers by Fisher Information Matrix Approximation [55.12070409045766]
Post-training quantization (PTQ) has stood out as a cost-effective and promising model compression paradigm in recent years.
Current PTQ methods for Vision Transformers (ViTs) still suffer from significant accuracy degradation, especially under low-bit quantization.
arXiv Detail & Related papers (2025-06-13T07:57:38Z) - Quantization Error Propagation: Revisiting Layer-Wise Post-Training Quantization [0.0]
Layer-wise PTQ is a promising technique for compressing large language models (LLMs).
Recent progress in this area is saturating, underscoring the need to revisit its core limitations and explore further improvements.
We propose Quantization Error Propagation (QEP), a general, lightweight, and scalable framework that enhances layer-wise PTQ by explicitly propagating quantization errors and compensating for accumulated errors.
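A hedged sketch of the error-propagation idea described above: quantize layers sequentially, but calibrate each layer on activations produced by the already-quantized prefix so it can compensate for error accumulated upstream. The quantize_layer helper and the calling convention are hypothetical placeholders, not the paper's API.

```python
import copy
import torch


@torch.no_grad()
def layerwise_ptq_with_propagation(layers, calib_inputs, quantize_layer):
    """layers: modules applied in sequence; calib_inputs: calibration tensors;
    quantize_layer(layer, inputs, targets) -> quantized layer (caller-supplied)."""
    fp_inputs = list(calib_inputs)                 # full-precision stream
    q_inputs = [x.clone() for x in calib_inputs]   # stream through quantized prefix
    quantized_layers = []
    for layer in layers:
        fp_outputs = [layer(x) for x in fp_inputs]
        # Calibrate on the *propagated* inputs so this layer compensates for
        # quantization error accumulated in earlier layers.
        q_layer = quantize_layer(copy.deepcopy(layer), q_inputs, fp_outputs)
        quantized_layers.append(q_layer)
        fp_inputs = fp_outputs
        q_inputs = [q_layer(x) for x in q_inputs]
    return quantized_layers
```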
arXiv Detail & Related papers (2025-04-13T15:56:00Z) - Accelerating Feedback-Based Quantum Algorithms through Time Rescaling [0.0]
We introduce TR-FQA and TR-FALQON, time-rescaled versions of FQA and FALQON, respectively.
The results show that TR-FALQON accelerates convergence to the optimal solution in the early layers of the circuit.
In the context of state preparation, TR-FQA demonstrates superior convergence, reducing the required circuit depth by several hundred layers.
arXiv Detail & Related papers (2025-04-02T00:05:01Z) - End-to-End Implicit Neural Representations for Classification [57.55927378696826]
Implicit neural representations (INRs) encode a signal in neural network parameters and show excellent results for signal reconstruction.
INR-based classification still significantly underperforms pixel-based methods like CNNs.
This work presents an end-to-end strategy for initializing SIRENs together with a learned learning-rate scheme.
arXiv Detail & Related papers (2025-03-23T16:02:23Z) - Stabilizing Quantization-Aware Training by Implicit-Regularization on Hessian Matrix [0.7261171488281837]
We find that a sharp loss landscape, which leads to a dramatic performance drop, is an essential factor causing instability.
We propose Feature-Perturbed Quantization (FPQ), which generalizes and applies feature distillation to the quantized model.
arXiv Detail & Related papers (2025-03-14T07:56:20Z) - Signal Collapse in One-Shot Pruning: When Sparse Models Fail to Distinguish Neural Representations [2.209921757303168]
We show that mitigating signal collapse, rather than optimizing weight selection, is key to improving the accuracy of pruned networks.
We propose REFLOW, which addresses signal collapse without updating trainable weights.
We restore ResNeXt101 accuracy from under 4.1% to 78.9% on ImageNet with only 20% of the weights retained.
arXiv Detail & Related papers (2025-02-18T15:47:33Z) - CBQ: Cross-Block Quantization for Large Language Models [66.82132832702895]
Post-training quantization (PTQ) has played a key role in compressing large language models (LLMs) with ultra-low costs.
We propose CBQ, a cross-block reconstruction-based PTQ method for LLMs.
CBQ employs a cross-block reconstruction scheme, establishing long-range dependencies across multiple blocks to minimize error accumulation.
arXiv Detail & Related papers (2023-12-13T07:56:27Z) - Distribution-Flexible Subset Quantization for Post-Quantizing Super-Resolution Networks [68.83451203841624]
This paper introduces Distribution-Flexible Subset Quantization (DFSQ), a post-training quantization method for super-resolution networks.
DFSQ conducts channel-wise normalization of the activations and applies distribution-flexible subset quantization (SQ); a rough sketch of these two steps follows this entry.
It achieves comparable performance to full-precision counterparts on 6- and 8-bit quantization, and incurs only a 0.1 dB PSNR drop on 4-bit quantization.
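A hedged sketch of the two steps named in the DFSQ summary above: channel-wise normalization of the activations followed by quantization onto a small, non-uniform subset of levels via nearest-level lookup. The example level set and the choice not to fold the statistics back are assumptions for illustration, not the paper's learned configuration.

```python
import torch


def subset_quantize(x: torch.Tensor, levels: torch.Tensor) -> torch.Tensor:
    """Snap every element of x to the nearest value in the 1-D `levels` set."""
    idx = (x.unsqueeze(-1) - levels).abs().argmin(dim=-1)
    return levels[idx]


def dfsq_activation(x: torch.Tensor, levels: torch.Tensor) -> torch.Tensor:
    """x: (N, C, H, W). Normalize each channel, then quantize onto the subset."""
    mu = x.mean(dim=(0, 2, 3), keepdim=True)
    sigma = x.std(dim=(0, 2, 3), keepdim=True) + 1e-5
    return subset_quantize((x - mu) / sigma, levels)


# Illustrative non-uniform level set (placeholder, not the paper's subset):
# levels = torch.tensor([-2.0, -0.75, -0.25, 0.0, 0.25, 0.75, 2.0])
```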
arXiv Detail & Related papers (2023-05-10T04:19:11Z) - Effective Invertible Arbitrary Image Rescaling [77.46732646918936]
Invertible Neural Networks (INN) are able to increase upscaling accuracy significantly by optimizing the downscaling and upscaling cycle jointly.
In this work, a simple and effective invertible arbitrary rescaling network (IARN) is proposed to achieve arbitrary image rescaling by training only one model.
It is shown to achieve state-of-the-art (SOTA) performance in bidirectional arbitrary rescaling without compromising perceptual quality in LR outputs.
arXiv Detail & Related papers (2022-09-26T22:22:30Z) - CADyQ: Content-Aware Dynamic Quantization for Image Super-Resolution [55.50793823060282]
We propose a novel Content-Aware Dynamic Quantization (CADyQ) method for image super-resolution (SR) networks.
CADyQ allocates optimal bits to local regions and layers adaptively based on the local contents of an input image (a rough sketch of this bit-allocation idea follows this entry).
The pipeline has been tested on various SR networks and evaluated on several standard benchmarks.
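A hedged sketch of the content-aware bit allocation described in the CADyQ entry above: estimate local content complexity per patch (here a mean-gradient proxy) and assign more bits to harder patches. The proxy, thresholds, and candidate bit-widths are illustrative placeholders, not the paper's learned policy.

```python
import torch
import torch.nn.functional as F


def allocate_bits(image: torch.Tensor, patch: int = 32,
                  bit_choices=(4, 6, 8), thresholds=(0.05, 0.15)) -> torch.Tensor:
    """image: (1, C, H, W) in [0, 1]. Returns a per-patch bit-width map."""
    gray = image.mean(dim=1, keepdim=True)
    gx = gray[..., :, 1:] - gray[..., :, :-1]          # horizontal gradients
    gy = gray[..., 1:, :] - gray[..., :-1, :]          # vertical gradients
    grad = F.pad(gx.abs(), (0, 1)) + F.pad(gy.abs(), (0, 0, 0, 1))
    complexity = F.avg_pool2d(grad, patch)             # mean |gradient| per patch
    bits = torch.full_like(complexity, bit_choices[0])
    bits[complexity > thresholds[0]] = bit_choices[1]  # mid-complexity patches
    bits[complexity > thresholds[1]] = bit_choices[2]  # high-complexity patches
    return bits
```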
arXiv Detail & Related papers (2022-07-21T07:50:50Z) - Sharpness-aware Quantization for Deep Neural Networks [45.150346855368]
Sharpness-Aware Quantization (SAQ) is a novel method to explore the effect of Sharpness-Aware Minimization (SAM) on model compression.
We show that SAQ improves the generalization performance of the quantized models, yielding SOTA results in uniform quantization.
arXiv Detail & Related papers (2021-11-24T05:16:41Z) - Fully Quantized Image Super-Resolution Networks [81.75002888152159]
We propose a Fully Quantized image Super-Resolution framework (FQSR) to jointly optimize efficiency and accuracy.
We apply our quantization scheme to multiple mainstream super-resolution architectures, including SRResNet, SRGAN and EDSR.
Using low-bit quantization, our FQSR achieves on-par performance with full-precision counterparts on five benchmark datasets.
arXiv Detail & Related papers (2020-11-29T03:53:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.