Neural Networks with Quantization Constraints
- URL: http://arxiv.org/abs/2210.15623v1
- Date: Thu, 27 Oct 2022 17:12:48 GMT
- Title: Neural Networks with Quantization Constraints
- Authors: Ignacio Hounie, Juan Elenter, Alejandro Ribeiro
- Abstract summary: We present a constrained learning approach to quantization aware training.
We show that the resulting problem is strongly dual and does away with gradient estimations.
We demonstrate that the proposed approach exhibits competitive performance in image classification tasks.
- Score: 111.42313650830248
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Enabling low precision implementations of deep learning models, without
considerable performance degradation, is necessary in resource- and
latency-constrained settings. Moreover, exploiting the differences in
sensitivity to quantization across layers can allow mixed precision
implementations to achieve a considerably better computation-performance
trade-off. However,
backpropagating through the quantization operation requires introducing
gradient approximations, and choosing which layers to quantize is challenging
for modern architectures due to the large search space. In this work, we
present a constrained learning approach to quantization aware training. We
formulate low precision supervised learning as a constrained optimization
problem, and show that despite its non-convexity, the resulting problem is
strongly dual and does away with gradient estimations. Furthermore, we show
that dual variables indicate the sensitivity of the objective with respect to
constraint perturbations. We demonstrate that the proposed approach exhibits
competitive performance in image classification tasks, and leverage the
sensitivity result to apply layer selective quantization based on the value of
dual variables, leading to considerable performance improvements.
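As a concrete illustration of the constrained formulation, the sketch below implements a generic primal-dual (dual ascent) loop in which each layer's quantization error is constrained and a per-layer dual variable is updated alongside the model. The constraint form, the quantizer, and the step sizes are illustrative assumptions, not the authors' exact algorithm.

```python
# Generic primal-dual sketch of quantization-constrained training.
# The per-layer constraint (quantization error <= eps), the uniform quantizer,
# and the step sizes are illustrative assumptions, not the paper's algorithm.
import torch
import torch.nn as nn

def uniform_quantize(w, bits=4):
    """Uniform symmetric quantizer; round() contributes zero gradient, so no
    surrogate gradient is needed when differentiating the constraint below."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.detach().abs().max() / qmax + 1e-12
    return torch.clamp(torch.round(w / scale), -qmax - 1, qmax) * scale

def constrained_qat(model, loader, bits=4, eps=1e-3, lr=1e-2, lr_dual=1e-1, epochs=1):
    weights = [p for p in model.parameters() if p.dim() > 1]
    lambdas = torch.zeros(len(weights))            # one dual variable per layer
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    ce = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            # Constraint slack g_i(w) = ||w_i - Q(w_i)||^2 - eps  (want <= 0).
            slacks = [(w - uniform_quantize(w, bits)).pow(2).mean() - eps
                      for w in weights]
            # Primal step: descend the Lagrangian f(w) + sum_i lambda_i * g_i(w).
            lagrangian = ce(model(x), y) + sum(l * s for l, s in zip(lambdas, slacks))
            opt.zero_grad()
            lagrangian.backward()
            opt.step()
            # Dual step: gradient ascent on lambda, projected onto lambda >= 0.
            with torch.no_grad():
                for i, s in enumerate(slacks):
                    lambdas[i] = (lambdas[i] + lr_dual * s).clamp(min=0.0)
    return lambdas   # large lambda_i: objective is sensitive to quantizing layer i

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
loader = [(torch.randn(8, 32), torch.randint(0, 10, (8,))) for _ in range(10)]
duals = constrained_qat(model, loader)
print(duals)   # larger entries flag layers that are harder to quantize
```

Consistent with the sensitivity interpretation in the abstract, layers whose dual variables end up large are the ones whose constraints are hardest to satisfy, making them natural candidates to keep at higher precision.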
Related papers
- GAQAT: gradient-adaptive quantization-aware training for domain generalization [54.31450550793485]
We propose a novel Gradient-Adaptive Quantization-Aware Training (GAQAT) framework for domain generalization (DG).
Our approach begins by identifying the scale-gradient conflict problem in low-precision quantization.
Extensive experiments validate the effectiveness of the proposed GAQAT framework.
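The summary does not detail GAQAT's mechanics; as hedged background, the sketch below shows a standard learnable-scale fake quantizer (in the style of LSQ, not GAQAT itself), where both the weights and the quantization scale receive gradients from the task loss, which is the setting in which a scale-gradient conflict can arise.

```python
# Illustrative learnable-scale fake quantizer (LSQ-style), not GAQAT itself.
# Both the weights and the quantization scale receive gradients from the task
# loss, which is where a scale-gradient conflict can appear at low precision.
import torch

class FakeQuantize(torch.autograd.Function):
    @staticmethod
    def forward(ctx, w, scale, qmin, qmax):
        q = torch.clamp(torch.round(w / scale), qmin, qmax)
        ctx.save_for_backward(w, scale, q)
        ctx.qmin, ctx.qmax = qmin, qmax
        return q * scale

    @staticmethod
    def backward(ctx, grad_out):
        w, scale, q = ctx.saved_tensors
        inside = (w / scale >= ctx.qmin) & (w / scale <= ctx.qmax)
        grad_w = grad_out * inside                      # straight-through for weights
        grad_scale = (grad_out * torch.where(inside, q - w / scale, q)).sum()
        return grad_w, grad_scale, None, None

w = torch.randn(64, 64, requires_grad=True)
scale = torch.tensor(0.05, requires_grad=True)
loss = FakeQuantize.apply(w, scale, -8, 7).pow(2).mean()   # stand-in task loss
loss.backward()                                            # gradients reach w AND scale
```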
arXiv Detail & Related papers (2024-12-07T06:07:21Z) - Quantized and Interpretable Learning Scheme for Deep Neural Networks in Classification Task [0.0]
We introduce an approach that combines saliency-guided training with quantization techniques to create an interpretable and resource-efficient model.
Our results demonstrate that the combined use of saliency-guided training and PACT-based quantization not only maintains classification performance but also produces models that are significantly more efficient and interpretable.
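PACT itself is a known activation-quantization technique with a learnable clipping level; the sketch below shows a generic PACT-style quantizer only and does not reproduce the saliency-guided training component of this paper.

```python
# PACT-style activation quantization sketch: activations are clipped to a
# learnable level alpha and then uniformly quantized with a straight-through
# estimator. The saliency-guided part of the paper is not reproduced here.
import torch
import torch.nn as nn

class PACTQuant(nn.Module):
    def __init__(self, bits=4, init_alpha=6.0):
        super().__init__()
        self.alpha = nn.Parameter(torch.tensor(init_alpha))   # learnable clip level
        self.levels = 2 ** bits - 1

    def forward(self, x):
        x = torch.clamp(x, min=0.0)                      # ReLU-style lower clip
        x = torch.where(x < self.alpha, x, self.alpha)   # upper clip; alpha gets gradient
        scale = self.alpha / self.levels
        q = torch.round(x / scale) * scale               # uniform quantization
        return x + (q - x).detach()                      # straight-through estimator

act = PACTQuant(bits=4)
y = act(torch.randn(8, 16))   # quantized activations in [0, alpha]
```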
arXiv Detail & Related papers (2024-12-05T06:34:06Z) - Saliency Assisted Quantization for Neural Networks [0.0]
This paper tackles the inherent black-box nature of deep learning models by providing real-time explanations during the training phase.
We employ established quantization techniques to address resource constraints.
To assess the effectiveness of our approach, we explore how quantization influences the interpretability and accuracy of Convolutional Neural Networks.
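Assuming the real-time explanations are gradient-based saliency maps (the paper's exact method may differ), a minimal way to produce them during training looks like the sketch below.

```python
# Gradient-based saliency sketch, assuming "explanation" means the magnitude of
# d(loss)/d(input). The paper's exact explanation method and how it interacts
# with quantization are not reproduced here.
import torch
import torch.nn as nn

def saliency_map(model, x, y, loss_fn=nn.CrossEntropyLoss()):
    x = x.clone().requires_grad_(True)
    loss = loss_fn(model(x), y)
    grad, = torch.autograd.grad(loss, x)
    return grad.abs().amax(dim=1)        # per-pixel importance, max over channels

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
x, y = torch.randn(4, 3, 32, 32), torch.randint(0, 10, (4,))
sal = saliency_map(model, x, y)          # shape (4, 32, 32), usable during training
```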
arXiv Detail & Related papers (2024-11-07T05:16:26Z) - Q-VLM: Post-training Quantization for Large Vision-Language Models [73.19871905102545]
We propose a post-training quantization framework of large vision-language models (LVLMs) for efficient multi-modal inference.
We mine the cross-layer dependency that significantly influences the discretization errors of the entire vision-language model, and embed this dependency into the optimal quantization strategy.
Experimental results demonstrate that our method compresses memory by 2.78x and increases generation speed by 1.44x on the 13B LLaVA model without performance degradation.
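The blurb does not spell out how the cross-layer dependency is exploited; the sketch below illustrates the general idea of choosing quantization parameters by minimizing the reconstruction error of a multi-layer block jointly rather than layer by layer. The grid search and error proxy are assumptions, not Q-VLM's actual procedure.

```python
# Block-wise post-training quantization sketch: choose per-layer weight scales
# by minimizing the output error of a whole block (capturing cross-layer
# effects) instead of each layer in isolation. Not Q-VLM's actual algorithm.
import copy
import itertools
import torch
import torch.nn as nn

def quantize_weight(w, scale, bits=4):
    qmax = 2 ** (bits - 1) - 1
    return torch.clamp(torch.round(w / scale), -qmax - 1, qmax) * scale

@torch.no_grad()
def search_block_scales(block, calib_x, candidates=(0.5, 1.0, 2.0), bits=4):
    ref = block(calib_x)                                   # full-precision reference
    layers = [m for m in block.modules() if isinstance(m, nn.Linear)]
    best, best_err = None, float("inf")
    for mults in itertools.product(candidates, repeat=len(layers)):
        q_block = copy.deepcopy(block)
        q_layers = [m for m in q_block.modules() if isinstance(m, nn.Linear)]
        for layer, mult in zip(q_layers, mults):
            base = layer.weight.abs().max() / (2 ** (bits - 1) - 1)
            layer.weight.data = quantize_weight(layer.weight.data, base * mult, bits)
        err = (q_block(calib_x) - ref).pow(2).mean().item()
        if err < best_err:
            best, best_err = mults, err
    return best, best_err

block = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 32))
scales, err = search_block_scales(block, torch.randn(16, 32))
```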
arXiv Detail & Related papers (2024-10-10T17:02:48Z) - QGait: Toward Accurate Quantization for Gait Recognition with Binarized Input [17.017127559393398]
We propose a differentiable soft quantizer, which better simulates the gradient of the round function during backpropagation.
This enables the network to learn from subtle input perturbations.
We further refine the training strategy to ensure convergence while simulating quantization errors.
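A minimal sketch of a differentiable soft quantizer: the hard round is replaced by a smooth, temperature-controlled surrogate so that small input perturbations still produce gradients. The tanh-based construction below is a common choice and an assumption; it is not necessarily QGait's exact quantizer.

```python
# Soft quantizer sketch: a smooth surrogate for round() built from a shifted
# tanh step. As the temperature goes to zero it approaches hard rounding, yet
# it stays differentiable, so subtle input perturbations still yield gradients.
# Illustrative construction; not necessarily QGait's exact quantizer.
import torch

def soft_round(x, temperature=0.1):
    frac = x - torch.floor(x)
    step = 0.5 * (1 + torch.tanh((frac - 0.5) / temperature))
    return torch.floor(x) + step

def soft_quantize(x, bits=8, temperature=0.1):
    levels = 2 ** bits - 1
    scale = x.detach().abs().max() / levels
    return soft_round(x / scale, temperature) * scale

x = torch.linspace(-1.0, 1.0, 11, requires_grad=True)
y = soft_quantize(x, bits=4)
y.sum().backward()            # nonzero gradients, unlike a hard round()
```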
arXiv Detail & Related papers (2024-05-22T17:34:18Z) - Augmenting Hessians with Inter-Layer Dependencies for Mixed-Precision
Post-Training Quantization [7.392278887917975]
We propose a mixed-precision post-training quantization approach that assigns different numerical precisions to tensors in a network based on their specific needs.
Our experiments demonstrate latency reductions of 25.48%, 21.69%, and 33.28%, respectively, compared to a 16-bit baseline.
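As a rough illustration of sensitivity-driven mixed precision, the sketch below scores each layer with a diagonal-Hessian proxy (mean squared gradient) and gives more bits to the most sensitive layers; the proxy, the bit-width menu, and the budget rule are assumptions, and the paper's inter-layer Hessian augmentation is not reproduced.

```python
# Sensitivity-driven mixed precision sketch: rank layers by a cheap
# second-order proxy (mean squared gradient, a diagonal-Hessian stand-in) and
# assign more bits to the most sensitive layers. Illustrative only; the paper's
# inter-layer Hessian augmentation is not reproduced.
import torch
import torch.nn as nn

def layer_sensitivities(model, x, y, loss_fn=nn.CrossEntropyLoss()):
    loss_fn(model(x), y).backward()
    sens = {name: p.grad.pow(2).mean().item()
            for name, p in model.named_parameters()
            if p.dim() > 1 and p.grad is not None}        # weight tensors only
    model.zero_grad()
    return sens

def assign_bits(sens, high_bits=8, low_bits=4, high_fraction=0.5):
    ranked = sorted(sens, key=sens.get, reverse=True)     # most sensitive first
    n_high = int(len(ranked) * high_fraction)
    return {n: (high_bits if i < n_high else low_bits) for i, n in enumerate(ranked)}

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
sens = layer_sensitivities(model, torch.randn(8, 32), torch.randint(0, 10, (8,)))
print(assign_bits(sens))      # e.g. {'2.weight': 8, '0.weight': 4}
```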
arXiv Detail & Related papers (2023-06-08T02:18:58Z) - DAQ: Distribution-Aware Quantization for Deep Image Super-Resolution
Networks [49.191062785007006]
Quantizing deep convolutional neural networks for image super-resolution substantially reduces their computational costs.
Existing works either suffer from a severe performance drop at ultra-low precision (bit-widths of 4 or lower), or require a heavy fine-tuning process to recover performance.
We propose a novel distribution-aware quantization scheme (DAQ) which facilitates accurate training-free quantization in ultra-low precision.
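The generic idea behind distribution-aware, training-free quantization can be sketched as deriving per-channel quantization ranges from observed feature statistics; the mean plus or minus k standard deviations rule below is an illustrative assumption, not DAQ's actual formulation.

```python
# Distribution-aware quantization sketch: ranges are derived per channel from
# the observed feature distribution (mean +/- k * std) instead of a fixed
# global range, with no fine-tuning. The k*std rule is an illustrative
# assumption, not DAQ's exact scheme.
import torch

def distribution_aware_quantize(x, bits=4, k=3.0):
    # x: feature map of shape (N, C, H, W); one quantization range per channel.
    mean = x.mean(dim=(0, 2, 3), keepdim=True)
    std = x.std(dim=(0, 2, 3), keepdim=True)
    lo, hi = mean - k * std, mean + k * std
    scale = (hi - lo).clamp(min=1e-8) / (2 ** bits - 1)
    clipped = torch.minimum(torch.maximum(x, lo), hi)
    q = torch.round((clipped - lo) / scale)
    return q * scale + lo

x = torch.randn(2, 8, 16, 16) * torch.rand(1, 8, 1, 1)   # channels with different spreads
xq = distribution_aware_quantize(x, bits=4)
```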
arXiv Detail & Related papers (2020-12-21T10:19:42Z) - Fully Quantized Image Super-Resolution Networks [81.75002888152159]
We propose a Fully Quantized image Super-Resolution framework (FQSR) to jointly optimize efficiency and accuracy.
We apply our quantization scheme to multiple mainstream super-resolution architectures, including SRResNet, SRGAN, and EDSR.
With low-bit quantization, our FQSR achieves performance on par with full-precision counterparts on five benchmark datasets.
arXiv Detail & Related papers (2020-11-29T03:53:49Z) - Optimal Gradient Quantization Condition for Communication-Efficient
Distributed Training [99.42912552638168]
Communication of gradients is costly for training deep neural networks with multiple devices in computer vision applications.
In this work, we derive the optimal condition for both binary and multi-level gradient quantization under any gradient distribution.
Based on the optimal condition, we develop two novel quantization schemes: biased BinGrad and unbiased ORQ for binary and multi-level gradient quantization, respectively.
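An unbiased multi-level gradient quantizer can be illustrated with QSGD-style stochastic rounding onto uniformly spaced levels, as sketched below; the uniform level placement is an assumption, whereas the paper derives the optimal levels for the actual gradient distribution.

```python
# Unbiased multi-level gradient quantization sketch (QSGD-style stochastic
# rounding onto s uniform levels): E[quantize(g)] == g, so distributed SGD
# stays unbiased while each coordinate needs only a few bits plus one shared
# norm. Uniform levels are an illustrative choice; the paper derives optimal
# levels for the actual gradient distribution.
import torch

def quantize_gradient(g, s=4):
    norm = g.abs().max().clamp(min=1e-12)
    scaled = g.abs() / norm * s                 # values in [0, s]
    lower = torch.floor(scaled)
    prob = scaled - lower                       # probability of rounding up
    level = lower + torch.bernoulli(prob)       # stochastic rounding -> unbiased
    return torch.sign(g) * level * norm / s

g = torch.randn(10_000)
est = torch.stack([quantize_gradient(g) for _ in range(200)]).mean(0)
print((est - g).abs().mean())                   # small average error: quantizer is unbiased
```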
arXiv Detail & Related papers (2020-02-25T18:28:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.