Neural Networks with Quantization Constraints
- URL: http://arxiv.org/abs/2210.15623v1
- Date: Thu, 27 Oct 2022 17:12:48 GMT
- Title: Neural Networks with Quantization Constraints
- Authors: Ignacio Hounie, Juan Elenter, Alejandro Ribeiro
- Abstract summary: We present a constrained learning approach to quantization training.
We show that the resulting problem is strongly dual and does away with gradient estimations.
We demonstrate that the proposed approach exhibits competitive performance in image classification tasks.
- Score: 111.42313650830248
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Enabling low precision implementations of deep learning models, without
considerable performance degradation, is necessary in resource and latency
constrained settings. Moreover, exploiting the differences in sensitivity to
quantization across layers can allow mixed precision implementations to achieve
a considerably better computation performance trade-off. However,
backpropagating through the quantization operation requires introducing
gradient approximations, and choosing which layers to quantize is challenging
for modern architectures due to the large search space. In this work, we
present a constrained learning approach to quantization aware training. We
formulate low precision supervised learning as a constrained optimization
problem, and show that despite its non-convexity, the resulting problem is
strongly dual and does away with gradient estimations. Furthermore, we show
that dual variables indicate the sensitivity of the objective with respect to
constraint perturbations. We demonstrate that the proposed approach exhibits
competitive performance in image classification tasks, and leverage the
sensitivity result to apply layer selective quantization based on the value of
dual variables, leading to considerable performance improvements.
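As a hedged illustration (the exact per-layer constraint functions and tolerances are assumptions here, not taken from the paper), the constrained formulation described in the abstract can be sketched as:

```latex
% Sketch of a constrained quantization-aware learning problem; the precise
% constraints used in the paper may differ.
\begin{aligned}
  P^\star(\epsilon) \;=\; \min_{\theta}\;\;
    & \mathbb{E}_{(x,y)}\!\left[\ell\!\left(f_{\theta}(x),\,y\right)\right] \\
  \text{s.t.}\;\;
    & \mathbb{E}_{(x,y)}\!\left[\ell\!\left(f_{q_l(\theta)}(x),\,y\right)\right]
      - \mathbb{E}_{(x,y)}\!\left[\ell\!\left(f_{\theta}(x),\,y\right)\right]
      \;\le\; \epsilon_l, \qquad l = 1,\dots,L,
\end{aligned}
```

where $q_l(\theta)$ denotes quantizing the parameters of layer $l$ and $\epsilon_l$ bounds the allowed degradation. Solving the Lagrangian dual $\max_{\lambda \ge 0} \min_{\theta}\, \mathbb{E}[\ell(f_{\theta})] + \sum_l \lambda_l \big(\mathbb{E}[\ell(f_{q_l(\theta)})] - \mathbb{E}[\ell(f_{\theta})] - \epsilon_l\big)$ yields dual variables $\lambda_l^\star$ that act as sensitivities of $P^\star$ to perturbations of $\epsilon_l$, which is the property the abstract leverages for layer-selective quantization.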
Related papers
- Saliency Assisted Quantization for Neural Networks [0.0]
This paper tackles the inherent black-box nature of deep learning models by providing real-time explanations during the training phase.
We employ established quantization techniques to address resource constraints.
To assess the effectiveness of our approach, we explore how quantization influences the interpretability and accuracy of Convolutional Neural Networks.
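A minimal sketch of one established explanation technique (input-gradient saliency) that such a setup could pair with quantized training; the paper's actual explanation method is not specified in this summary, so the details below are assumptions:

```python
import torch
import torch.nn as nn

# Simple input-gradient saliency map for a small classifier; illustrative only,
# the explanation technique used in the paper above may differ.
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10),
)
x = torch.randn(1, 3, 32, 32, requires_grad=True)
score = model(x)[0].max()                    # score of the top class
score.backward()
saliency = x.grad.abs().max(dim=1).values    # (1, 32, 32) per-pixel importance
print(saliency.shape)
```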
arXiv Detail & Related papers (2024-11-07T05:16:26Z)
- Q-VLM: Post-training Quantization for Large Vision-Language Models [73.19871905102545]
We propose a post-training quantization framework for large vision-language models (LVLMs) to enable efficient multi-modal inference.
We mine the cross-layer dependency that significantly influences discretization errors of the entire vision-language model, and embed this dependency into the optimal quantization strategy.
Experimental results demonstrate that our method compresses memory by 2.78x and increases generation speed by 1.44x for the 13B LLaVA model without performance degradation.
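A toy illustration of the cross-layer effect mentioned above (not the Q-VLM method itself): the error of a quantized two-layer block need not equal the sum of the per-layer errors, which is why per-layer criteria can be misleading.

```python
import torch

# Compare per-layer quantization errors with the error of the composed block.
def quantize(w, bits=4):
    s = w.abs().max() / (2 ** (bits - 1) - 1)
    return (w / s).round().clamp(-(2 ** (bits - 1)), 2 ** (bits - 1) - 1) * s

torch.manual_seed(0)
x = torch.randn(256, 64)
w1, w2 = torch.randn(64, 64) / 8, torch.randn(64, 64) / 8

h = torch.relu(x @ w1)
err_layer1 = ((torch.relu(x @ quantize(w1)) - h) ** 2).mean()
err_layer2 = ((h @ quantize(w2) - h @ w2) ** 2).mean()
err_block = ((torch.relu(x @ quantize(w1)) @ quantize(w2) - h @ w2) ** 2).mean()
print(err_layer1 + err_layer2, err_block)   # the block error differs from the sum
```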
arXiv Detail & Related papers (2024-10-10T17:02:48Z)
- QGait: Toward Accurate Quantization for Gait Recognition with Binarized Input [17.017127559393398]
We propose a differentiable soft quantizer, which better simulates the gradient of the round function during backpropagation.
This enables the network to learn from subtle input perturbations.
We further refine the training strategy to ensure convergence while simulating quantization errors.
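A common differentiable surrogate for the round function gives a sense of what such a soft quantizer looks like; the exact quantizer used in QGait may differ:

```python
import math
import torch

# "Soft" rounding: approaches torch.round() as alpha grows, but keeps a useful
# gradient everywhere, unlike the true round whose gradient is zero.
def soft_round(x, alpha=5.0):
    m = torch.floor(x) + 0.5
    r = x - m
    return m + 0.5 * torch.tanh(alpha * r) / math.tanh(alpha / 2)

x = torch.linspace(-2.0, 2.0, 9, requires_grad=True)
soft_round(x).sum().backward()
print(x.grad)   # nonzero everywhere; torch.round would give a zero gradient
```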
arXiv Detail & Related papers (2024-05-22T17:34:18Z)
- Gradient constrained sharpness-aware prompt learning for vision-language models [99.74832984957025]
This paper targets a novel trade-off problem in generalizable prompt learning for vision-language models (VLMs).
By analyzing the loss landscapes of the state-of-the-art method and a vanilla Sharpness-aware Minimization (SAM) based method, we conclude that the trade-off performance correlates with both loss value and loss sharpness.
We propose a novel SAM-based method for prompt learning, denoted Gradient Constrained Sharpness-aware Context Optimization (GCSCoOp).
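For reference, a plain SAM step (without the gradient constraint proposed above) can be sketched as follows; the extra constraint of GCSCoOp is not reproduced here:

```python
import torch
import torch.nn as nn

# Vanilla SAM step (Foret et al.): perturb the weights toward the worst-case
# nearby point, then update with the gradient taken at the perturbed point.
def sam_step(model, loss_fn, x, y, opt, rho=0.05):
    loss = loss_fn(model(x), y)
    params = [p for p in model.parameters() if p.requires_grad]
    grads = torch.autograd.grad(loss, params)
    scale = rho / (torch.norm(torch.stack([g.norm() for g in grads])) + 1e-12)
    eps = [g * scale for g in grads]
    with torch.no_grad():
        for p, e in zip(params, eps):
            p.add_(e)                      # ascend to the sharpest nearby point
    opt.zero_grad()
    loss_fn(model(x), y).backward()        # gradient at the perturbed weights
    with torch.no_grad():
        for p, e in zip(params, eps):
            p.sub_(e)                      # restore the original weights
    opt.step()                             # apply the sharpness-aware update

model = nn.Linear(8, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
sam_step(model, nn.CrossEntropyLoss(), torch.randn(16, 8), torch.randint(0, 2, (16,)), opt)
```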
arXiv Detail & Related papers (2023-09-14T17:13:54Z)
- Augmenting Hessians with Inter-Layer Dependencies for Mixed-Precision Post-Training Quantization [7.392278887917975]
We propose a mixed-precision post training quantization approach that assigns different numerical precisions to tensors in a network based on their specific needs.
Our experiments demonstrate latency reductions of 25.48%, 21.69%, and 33.28% compared to a 16-bit baseline.
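A minimal sketch of sensitivity-driven bit-width assignment; the curvature proxy and greedy rule below are placeholders, not the paper's Hessian-based, inter-layer-aware criterion:

```python
import torch

# Rank layers by a simple sensitivity proxy (quantization error times a crude
# curvature estimate) and give the most sensitive layers more bits.
def quant_mse(w, bits):
    s = w.abs().max() / (2 ** (bits - 1) - 1)
    wq = (w / s).round().clamp(-(2 ** (bits - 1)), 2 ** (bits - 1) - 1) * s
    return ((w - wq) ** 2).mean().item()

torch.manual_seed(0)
layers = {f"layer{i}": torch.randn(512) * 0.05 * (i + 1) for i in range(4)}
curvature = {n: float((w ** 2).sum()) for n, w in layers.items()}   # crude proxy only

sensitivity = {n: curvature[n] * quant_mse(w, bits=4) for n, w in layers.items()}
ranked = sorted(sensitivity, key=sensitivity.get, reverse=True)
bits = {n: (8 if i < len(ranked) // 2 else 4) for i, n in enumerate(ranked)}
print(bits)   # most sensitive layers keep 8 bits, the rest drop to 4
```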
arXiv Detail & Related papers (2023-06-08T02:18:58Z)
- DAQ: Distribution-Aware Quantization for Deep Image Super-Resolution Networks [49.191062785007006]
Quantizing deep convolutional neural networks for image super-resolution substantially reduces their computational costs.
Existing works either suffer a severe performance drop at ultra-low precision (4 bits or fewer) or require a heavy fine-tuning process to recover the performance.
We propose a novel distribution-aware quantization scheme (DAQ) which facilitates accurate training-free quantization in ultra-low precision.
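A minimal sketch of the general idea of distribution-aware, statistics-based quantization ranges; the actual DAQ scheme is more elaborate, and the names and constants below are illustrative only:

```python
import torch

# Derive each channel's quantization range from that channel's own statistics
# (mu +/- k*sigma) instead of a single global min/max.
def distribution_aware_quantize(x, bits=4, k=3.0):
    flat = x.flatten(1)                        # (C, N) per-channel view
    mu = flat.mean(dim=1, keepdim=True)
    sigma = flat.std(dim=1, keepdim=True)
    lo, hi = mu - k * sigma, mu + k * sigma
    levels = 2 ** bits - 1
    scale = (hi - lo).clamp(min=1e-8) / levels
    q = ((flat - lo) / scale).round().clamp(0, levels)
    return (q * scale + lo).reshape(x.shape)

x = torch.randn(8, 32, 32)                     # one feature map with 8 channels
x_q = distribution_aware_quantize(x, bits=4)
print((x - x_q).abs().mean())                  # per-channel ranges keep the error small
```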
arXiv Detail & Related papers (2020-12-21T10:19:42Z)
- Fully Quantized Image Super-Resolution Networks [81.75002888152159]
We propose a Fully Quantized image Super-Resolution framework (FQSR) to jointly optimize efficiency and accuracy.
We apply our quantization scheme on multiple mainstream super-resolution architectures, including SRResNet, SRGAN and EDSR.
Our FQSR with low-bit quantization achieves performance on par with full-precision counterparts on five benchmark datasets.
arXiv Detail & Related papers (2020-11-29T03:53:49Z)
- Optimal Gradient Quantization Condition for Communication-Efficient Distributed Training [99.42912552638168]
Communication of gradients is costly for training deep neural networks with multiple devices in computer vision applications.
In this work, we deduce the optimal condition for both binary and multi-level gradient quantization under any gradient distribution.
Based on the optimal condition, we develop two novel quantization schemes: biased BinGrad and unbiased ORQ for binary and multi-level gradient quantization respectively.
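For context, a generic unbiased multi-level (QSGD-style) stochastic gradient quantizer looks like this; ORQ additionally optimizes where the quantization levels are placed, which is not shown here:

```python
import torch

# Map a gradient onto a few uniform levels, rounding stochastically so that
# the quantized gradient equals the original gradient in expectation.
def stochastic_quantize(g, levels=4):
    s = g.abs().max().clamp(min=1e-12)
    normalized = (g / s + 1) / 2 * (levels - 1)       # map to [0, levels-1]
    lower = normalized.floor()
    q = lower + torch.bernoulli(normalized - lower)   # round up with prob = fraction
    return (q / (levels - 1) * 2 - 1) * s             # map back to [-max|g|, max|g|]

g = torch.randn(10_000)
avg = torch.stack([stochastic_quantize(g) for _ in range(200)]).mean(dim=0)
print((avg - g).abs().mean())   # small: the quantizer is unbiased in expectation
```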
arXiv Detail & Related papers (2020-02-25T18:28:39Z)