Hyperspherical Loss-Aware Ternary Quantization
- URL: http://arxiv.org/abs/2212.12649v1
- Date: Sat, 24 Dec 2022 04:27:01 GMT
- Title: Hyperspherical Loss-Aware Ternary Quantization
- Authors: Dan Liu, Xue Liu
- Abstract summary: We push full precision weights toward ternary values through a regularization term and introduce a re-scaling factor that simulates the derivatives of the Sigmoid function to obtain more accurate gradients. Experimental results show that the method significantly improves the accuracy of ternary quantization in both image classification and object detection tasks.
- Score: 12.90416661059601
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most existing works use projection functions for ternary quantization in discrete space. Scaling factors and thresholds are used in some cases to improve model accuracy. However, the gradients used for optimization are inaccurate and result in a notable accuracy gap between the full precision and ternary models. To obtain more accurate gradients, some works gradually increase the discrete portion of the full precision weights in the forward propagation pass, e.g., using a temperature-based Sigmoid function. Instead of directly performing ternary quantization in discrete space, we push full precision weights close to ternary ones through a regularization term prior to ternary quantization. In addition, inspired by the temperature-based method, we introduce a re-scaling factor to obtain more accurate gradients by simulating the derivatives of the Sigmoid function. The experimental results show that our method can significantly improve the accuracy of ternary quantization in both image classification and object detection tasks.
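The paper page does not include reference code; the following PyTorch-style sketch only illustrates the two ideas described in the abstract: a regularization term that pulls full precision weights toward the nearest ternary value in {-alpha, 0, +alpha}, and a straight-through backward pass re-scaled with the derivative of a temperature-based Sigmoid. All names and hyperparameters (ternarize, ternary_regularizer, TernarySTE, threshold, alpha, temperature) are illustrative assumptions, not the authors' implementation.

```python
import torch

def ternarize(w, threshold, alpha):
    """Hard projection of weights onto {-alpha, 0, +alpha} via a magnitude threshold."""
    q = torch.zeros_like(w)
    q[w > threshold] = alpha
    q[w < -threshold] = -alpha
    return q

def ternary_regularizer(w, threshold, alpha):
    """Penalty pulling full precision weights toward their ternary targets
    (assumed quadratic form; the paper's exact regularizer may differ)."""
    return ((w - ternarize(w, threshold, alpha)) ** 2).sum()

class TernarySTE(torch.autograd.Function):
    """Ternary forward pass with a re-scaled straight-through backward pass."""

    @staticmethod
    def forward(ctx, w, threshold, alpha, temperature):
        ctx.save_for_backward(w)
        ctx.temperature = temperature
        return ternarize(w, threshold, alpha)

    @staticmethod
    def backward(ctx, grad_out):
        (w,) = ctx.saved_tensors
        t = ctx.temperature
        # Instead of passing the gradient through unchanged, re-scale it with the
        # derivative of a temperature-based Sigmoid: t * sigma(t*w) * (1 - sigma(t*w)).
        s = torch.sigmoid(t * w)
        return grad_out * t * s * (1.0 - s), None, None, None
```

In training, one would add a weighted ternary_regularizer term to the task loss and quantize with TernarySTE.apply(w, threshold, alpha, temperature); the specific threshold, alpha, and temperature values are placeholders.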
Related papers
- Neural Precision Polarization: Simplifying Neural Network Inference with Dual-Level Precision [0.4124847249415279]
A floating-point model can be trained in the cloud and then downloaded to an edge device.
Network weights and activations are directly quantized to the precision level the edge device requires, such as NF4 or INT8.
We show that neural precision polarization enables MAC efficiency of approximately 464 TOPS per watt while maintaining reliability.
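For context, direct quantization to a fixed integer format typically follows the pattern below; this is a generic sketch of symmetric per-tensor INT8 quantization, not the cited paper's specific scheme, and all names are illustrative.

```python
import torch

def quantize_int8(x):
    """Symmetric per-tensor INT8 quantization: scale chosen so the max magnitude maps to 127."""
    scale = x.abs().max().clamp(min=1e-8) / 127.0
    q = torch.clamp(torch.round(x / scale), -128, 127).to(torch.int8)
    return q, scale

def dequantize_int8(q, scale):
    """Recover an approximate float tensor from the INT8 representation."""
    return q.float() * scale

w = torch.randn(4, 4)
q, s = quantize_int8(w)
print((w - dequantize_int8(q, s)).abs().max())  # per-tensor quantization error
```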
arXiv Detail & Related papers (2024-11-06T16:02:55Z)
- Diagonalisation SGD: Fast & Convergent SGD for Non-Differentiable Models via Reparameterisation and Smoothing [1.6114012813668932]
We introduce a simple framework to define non-differentiable functions piecewise and present a systematic approach to obtain smoothings.
Our main contribution is a novel variant of SGD, Diagonalisation Gradient Descent, which progressively enhances the accuracy of the smoothed approximation.
Our approach is simple, fast, and stable, and attains orders-of-magnitude reductions in work-normalised variance.
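As a generic illustration of the smoothing idea (not the cited paper's construction), a piecewise-defined non-differentiable function such as the Heaviside step can be replaced by a sigmoid whose temperature controls how closely it approximates the hard function:

```python
import torch

def smoothed_step(x, temperature=10.0):
    """Sigmoid smoothing of the Heaviside step 1[x > 0]; a larger temperature
    gives a closer (but steeper) approximation to the hard step."""
    return torch.sigmoid(temperature * x)

x = torch.linspace(-1.0, 1.0, 5, requires_grad=True)
smoothed_step(x).sum().backward()
print(x.grad)  # well-defined gradients everywhere, unlike the hard step
```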
arXiv Detail & Related papers (2024-02-19T00:43:22Z)
- Neural Gradient Learning and Optimization for Oriented Point Normal Estimation [53.611206368815125]
We propose a deep learning approach to learn gradient vectors with consistent orientation from 3D point clouds for normal estimation.
We learn an angular distance field based on local plane geometry to refine the coarse gradient vectors.
Our method efficiently performs global gradient approximation while achieving better accuracy and generalization ability in local feature description.
arXiv Detail & Related papers (2023-09-17T08:35:11Z)
- Sampling from Gaussian Process Posteriors using Stochastic Gradient Descent [43.097493761380186]
Stochastic gradient algorithms are an efficient method of approximately solving linear systems.
We show that gradient descent produces accurate predictions, even in cases where it does not converge quickly to the optimum.
Experimentally, gradient descent achieves state-of-the-art performance on sufficiently large-scale or ill-conditioned regression tasks.
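A minimal sketch of the underlying idea: solving a linear system A x = b by running stochastic gradient descent on the least-squares objective. This is a generic illustration under an assumed toy setup, not the paper's GP-specific posterior sampler.

```python
import torch

# Assumed toy setup: a consistent, well-posed system A x = b.
torch.manual_seed(0)
A = torch.randn(200, 50)
x_true = torch.randn(50)
b = A @ x_true

x = torch.zeros(50, requires_grad=True)
opt = torch.optim.SGD([x], lr=1e-2)
for step in range(2000):
    idx = torch.randint(0, A.shape[0], (32,))   # mini-batch of rows
    loss = ((A[idx] @ x - b[idx]) ** 2).mean()  # stochastic least-squares loss
    opt.zero_grad()
    loss.backward()
    opt.step()

print(torch.norm(x.detach() - x_true))  # approximation error; should be small
```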
arXiv Detail & Related papers (2023-06-20T15:07:37Z)
- Stochastic Marginal Likelihood Gradients using Neural Tangent Kernels [78.6096486885658]
We introduce lower bounds to the linearized Laplace approximation of the marginal likelihood.
These bounds are amenable to gradient-based optimization and allow trading off estimation accuracy against computational complexity.
arXiv Detail & Related papers (2023-06-06T19:02:57Z)
- Sharp Calibrated Gaussian Processes [58.94710279601622]
State-of-the-art approaches for designing calibrated models rely on inflating the Gaussian process posterior variance.
We present a calibration approach that generates predictive quantiles using a computation inspired by the vanilla Gaussian process posterior variance.
Our approach is shown to yield a calibrated model under reasonable assumptions.
arXiv Detail & Related papers (2023-02-23T12:17:36Z)
- Bosonic field digitization for quantum computers [62.997667081978825]
We address the representation of lattice bosonic fields in a discretized field amplitude basis.
We develop methods to predict error scaling and present efficient qubit implementation strategies.
arXiv Detail & Related papers (2021-08-24T15:30:04Z)
- FFD: Fast Feature Detector [22.51804239092462]
We show that robust and accurate keypoints exist in the specific scale-space domain.
It is proved that setting the scale-space pyramid's smoothness ratio and blurring to 2 and 0.627, respectively, facilitates the detection of reliable keypoints.
arXiv Detail & Related papers (2020-12-01T21:56:35Z)
- Path Sample-Analytic Gradient Estimators for Stochastic Binary Networks [78.76880041670904]
In neural networks with binary activations and/or binary weights, training by gradient descent is complicated.
We propose a new method for this estimation problem combining sampling and analytic approximation steps.
We experimentally show higher accuracy in gradient estimation and demonstrate a more stable and better performing training in deep convolutional models.
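For reference, the common baseline that such estimators improve upon is the straight-through estimator, sketched below for a sign activation. This is a generic illustration of the baseline; the cited paper's sample-analytic estimator is different.

```python
import torch

class SignSTE(torch.autograd.Function):
    """Sign activation with a straight-through (clipped identity) gradient."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        # Pass the gradient through unchanged inside [-1, 1], zero it outside.
        return grad_out * (x.abs() <= 1).float()
```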
arXiv Detail & Related papers (2020-06-04T21:51:21Z)
- SLEIPNIR: Deterministic and Provably Accurate Feature Expansion for Gaussian Process Regression with Derivatives [86.01677297601624]
We propose a novel approach for scaling GP regression with derivatives based on quadrature Fourier features.
We prove deterministic, non-asymptotic, and exponentially fast decaying error bounds that apply to both the approximated kernel and the approximated posterior.
arXiv Detail & Related papers (2020-03-05T14:33:20Z)