Related papers: GABFusion: Rethinking Feature Fusion for Low-Bit Quantization of Multi-Task Networks

GABFusion: Rethinking Feature Fusion for Low-Bit Quantization of Multi-Task Networks

URL: http://arxiv.org/abs/2511.05898v1
Date: Sat, 08 Nov 2025 07:45:21 GMT
Title: GABFusion: Rethinking Feature Fusion for Low-Bit Quantization of Multi-Task Networks
Authors: Zhaoyang Wang, Dong Wang,
Abstract summary: We propose Gradient-Aware Balanced Feature Fusion (GABFusion), which balances gradient magnitudes and fuses task-specific features in a quantization-friendly manner.<n>Our strategy consistently enhances a variety of QAT methods across different network architectures and bit-widths.<n> Notably, the proposed framework is modular, easy to integrate, and compatible with any existing QAT technique-enhancing the performance of quantized models.
Score: 7.087257323517682
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Despite the effectiveness of quantization-aware training (QAT) in compressing deep neural networks, its performance on multi-task architectures often degrades significantly due to task-specific feature discrepancies and gradient conflicts. To address these challenges, we propose Gradient-Aware Balanced Feature Fusion (GABFusion), which dynamically balances gradient magnitudes and fuses task-specific features in a quantization-friendly manner. We further introduce Attention Distribution Alignment (ADA), a feature-level distillation strategy tailored for quantized models. Our method demonstrates strong generalization across network architectures and QAT algorithms, with theoretical guarantees on gradient bias reduction. Extensive experiments demonstrate that our strategy consistently enhances a variety of QAT methods across different network architectures and bit-widths. On PASCAL VOC and COCO datasets, the proposed approach achieves average mAP improvements of approximately 3.3% and 1.6%, respectively. When applied to YOLOv5 under 4-bit quantization, our method narrows the accuracy gap with the full-precision model to only 1.7% on VOC, showcasing its effectiveness in preserving performance under low-bit constraints. Notably, the proposed framework is modular, easy to integrate, and compatible with any existing QAT technique-enhancing the performance of quantized models without requiring modifications to the original network architecture.

Related papers

Adaptive Distribution-aware Quantization for Mixed-Precision Neural Networks [12.36496914117844]
Quantization-Aware Training (QAT) is a critical technique for deploying deep neural networks on resource-constrained devices.<n>We propose Adaptive Distribution-aware Quantization (ADQ), a mixed-precision quantization framework that employs a differentiated strategy.
arXiv Detail & Related papers (2025-10-22T16:48:29Z)
Progressive Element-wise Gradient Estimation for Neural Network Quantization [2.1413624861650358]
Quantization-Aware Training (QAT) methods rely on the Straight-Through Estimator (STE) to address the non-differentiability of discretization functions.<n>We propose Progressive Element-wise Gradient Estimation (PEGE) to address discretization errors between continuous and quantized values.<n>PEGE consistently outperforms existing backpropagation methods and enables low-precision models to match or even outperform the accuracy of their full-precision counterparts.
arXiv Detail & Related papers (2025-08-27T15:59:36Z)
Precision Neural Network Quantization via Learnable Adaptive Modules [27.323901068182234]
Quantization Aware Training (QAT) is a neural network quantization technique that compresses model size and improves operational efficiency.<n>We propose an effective learnable adaptive neural network quantization method, called Adaptive Step Size Quantization (ASQ)
arXiv Detail & Related papers (2025-04-24T05:46:25Z)
Automatic Joint Structured Pruning and Quantization for Efficient Neural Network Training and Compression [44.35542987414442]
Structured pruning and quantization are fundamental techniques used to reduce the size of deep neural networks (DNNs)<n>Applying these techniques jointly via co-optimization has the potential to produce smaller, high-quality models.<n>We present the framework GETA, which automatically and efficiently performs joint structured pruning and quantization-aware training on any DNNs.
arXiv Detail & Related papers (2025-02-23T16:28:18Z)
GAQAT: gradient-adaptive quantization-aware training for domain generalization [54.31450550793485]
We propose a novel Gradient-Adaptive Quantization-Aware Training (GAQAT) framework for DG.<n>Our approach begins by identifying the scale-gradient conflict problem in low-precision quantization.<n>Extensive experiments validate the effectiveness of the proposed GAQAT framework.
arXiv Detail & Related papers (2024-12-07T06:07:21Z)
QuEST: Low-bit Diffusion Model Quantization via Efficient Selective Finetuning [52.157939524815866]
In this paper, we identify imbalanced activation distributions as a primary source of quantization difficulty.<n>We propose to adjust these distributions through weight finetuning to be more quantization-friendly.<n>Our method demonstrates its efficacy across three high-resolution image generation tasks.
arXiv Detail & Related papers (2024-02-06T03:39:44Z)
Quantize Once, Train Fast: Allreduce-Compatible Compression with Provable Guarantees [53.950234267704]
We introduce Global-QSGD, an All-reduce gradient-compatible quantization method.<n>We show that it accelerates distributed training by up to 3.51% over baseline quantization methods.
arXiv Detail & Related papers (2023-05-29T21:32:15Z)
Fully Quantized Image Super-Resolution Networks [81.75002888152159]
We propose a Fully Quantized image Super-Resolution framework (FQSR) to jointly optimize efficiency and accuracy. We apply our quantization scheme on multiple mainstream super-resolution architectures, including SRResNet, SRGAN and EDSR. Our FQSR using low bits quantization can achieve on par performance compared with the full-precision counterparts on five benchmark datasets.
arXiv Detail & Related papers (2020-11-29T03:53:49Z)
A Statistical Framework for Low-bitwidth Training of Deep Neural Networks [70.77754244060384]
Fully quantized training (FQT) uses low-bitwidth hardware by quantizing the activations, weights, and gradients of a neural network model. One major challenge with FQT is the lack of theoretical understanding, in particular of how gradient quantization impacts convergence properties.
arXiv Detail & Related papers (2020-10-27T13:57:33Z)
Once Quantization-Aware Training: High Performance Extremely Low-bit Architecture Search [112.05977301976613]
We propose to combine Network Architecture Search methods with quantization to enjoy the merits of the two sides. We first propose the joint training of architecture and quantization with a shared step size to acquire a large number of quantized models. Then a bit-inheritance scheme is introduced to transfer the quantized models to the lower bit, which further reduces the time cost and improves the quantization accuracy.
arXiv Detail & Related papers (2020-10-09T03:52:16Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.