Exact Backpropagation in Binary Weighted Networks with Group Weight Transformations
- URL: http://arxiv.org/abs/2107.01400v1
- Date: Sat, 3 Jul 2021 10:29:34 GMT
- Title: Exact Backpropagation in Binary Weighted Networks with Group Weight Transformations
- Authors: Yaniv Shulman
- Abstract summary: Quantization-based model compression serves as a high-performing and fast approach to inference.
Models that constrain the weights to binary values enable an efficient implementation of the ubiquitous dot product.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Quantization-based model compression serves as a high-performing and
fast approach to inference that yields highly compressed models compared to their
full-precision floating point counterparts. The most extreme quantization is a
1-bit representation of parameters such that they have only two possible
values, typically -1 (or 0) and +1. Models that constrain the weights to binary
values enable an efficient implementation of the ubiquitous dot product using
additions only, without requiring floating point multiplications, which is
beneficial for resource-constrained inference. The main contribution of this
work is the introduction of a method to smooth the combinatorial problem of
determining a binary vector of weights to minimize the expected loss for a
given objective by means of empirical risk minimization with backpropagation.
This is achieved by approximating a multivariate binary state over the weights
utilizing a deterministic and differentiable transformation of real-valued
continuous parameters. The proposed method adds little overhead in training,
can be readily applied without any substantial modifications to the original
architecture, does not introduce additional saturating non-linearities or
auxiliary losses, and does not prohibit applying other methods for binarizing
the activations. It is demonstrated that, contrary to common assertions made in
the literature, binary weighted networks can train well with the same standard
optimization techniques and similar hyperparameter settings as their
full-precision counterparts, namely momentum SGD with large learning rates and
$L_2$ regularization. The source code is publicly available at
https://bitbucket.org/YanivShu/binary_weighted_networks_public
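The abstract describes the method only at a high level: real-valued parameters pass through a deterministic, differentiable transformation so that the effective weights approach a binary state while gradients stay exact, and at inference the hard {-1, +1} weights reduce the dot product to additions and subtractions. The sketch below is a minimal, hypothetical PyTorch instantiation of that idea, assuming an element-wise tanh surrogate with a temperature parameter; it is not the paper's actual group weight transformation, which is defined in the repository linked above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftBinaryLinear(nn.Module):
    """Linear layer whose effective weights are driven towards {-1, +1} by a
    deterministic, differentiable transformation of real-valued parameters,
    so gradients flow through the transformation exactly (no straight-through
    estimator). The element-wise tanh surrogate and the temperature are
    illustrative assumptions, not the paper's group weight transformation."""

    def __init__(self, in_features, out_features, temperature=1.0):
        super().__init__()
        self.theta = nn.Parameter(torch.empty(out_features, in_features).uniform_(-1, 1))
        self.temperature = temperature  # could be annealed towards 0 during training

    def soft_binary_weight(self):
        # smooth surrogate for sign(theta); saturates towards +/-1 as temperature -> 0
        return torch.tanh(self.theta / self.temperature)

    def forward(self, x):
        return F.linear(x, self.soft_binary_weight())

    @torch.no_grad()
    def hard_binary_weight(self):
        # weights actually deployed at inference time
        return torch.sign(self.theta)

# With {-1, +1} weights the dot product needs additions and subtractions only:
layer = SoftBinaryLinear(16, 4)
x = torch.randn(16)
b = layer.hard_binary_weight()[0]               # one row of binary weights
dot_by_additions = x[b > 0].sum() - x[b < 0].sum()
assert torch.allclose(dot_by_additions, x @ b, atol=1e-5)
```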
Related papers
- NUPES: Non-Uniform Post-Training Quantization via Power Exponent Search [7.971065005161565]
Quantization is a technique to convert floating point representations to low bit-width fixed point representations.
We show how to learn new quantized weights over the entire quantized space.
We show the ability of the method to achieve state-of-the-art compression rates in both data-free and data-driven configurations.
arXiv Detail & Related papers (2023-08-10T14:19:58Z)
- SqueezeLLM: Dense-and-Sparse Quantization [80.32162537942138]
The main bottleneck for generative inference with LLMs is memory bandwidth, rather than compute, for single-batch inference.
We introduce SqueezeLLM, a post-training quantization framework that enables lossless compression to ultra-low precisions of up to 3-bit.
Our framework incorporates two novel ideas: (i) sensitivity-based non-uniform quantization, which searches for the optimal bit precision assignment based on second-order information; and (ii) the Dense-and-Sparse decomposition that stores outliers and sensitive weight values in an efficient sparse format (a toy sketch of this decomposition is given after this list).
arXiv Detail & Related papers (2023-06-13T08:57:54Z)
- AdaBin: Improving Binary Neural Networks with Adaptive Binary Sets [27.022212653067367]
This paper studies Binary Neural Networks (BNNs), in which weights and activations are both binarized to 1-bit values.
We present a simple yet effective approach called AdaBin to adaptively obtain the optimal binary sets.
Experimental results on benchmark models and datasets demonstrate that the proposed AdaBin is able to achieve state-of-the-art performance.
arXiv Detail & Related papers (2022-08-17T05:43:33Z)
- Monarch: Expressive Structured Matrices for Efficient and Accurate Training [64.6871423399431]
Large neural networks excel in many domains, but they are expensive to train and fine-tune.
A popular approach to reduce their compute or memory requirements is to replace dense weight matrices with structured ones.
We propose a class of matrices (Monarch) that is hardware-efficient.
arXiv Detail & Related papers (2022-04-01T17:37:29Z)
- Bias-Variance Tradeoffs in Single-Sample Binary Gradient Estimators [100.58924375509659]
The straight-through (ST) estimator gained popularity due to its simplicity and efficiency.
Several techniques were proposed to improve over ST while keeping the same low computational complexity.
We conduct a theoretical analysis of the bias and variance of these methods in order to understand the tradeoffs and verify originally claimed properties (a minimal sketch of the ST estimator is given after this list).
arXiv Detail & Related papers (2021-10-07T15:16:07Z)
- Robust Implicit Networks via Non-Euclidean Contractions [63.91638306025768]
Implicit neural networks show improved accuracy and a significant reduction in memory consumption.
However, they can suffer from ill-posedness and convergence instability.
This paper provides a new framework to design well-posed and robust implicit neural networks.
arXiv Detail & Related papers (2021-06-06T18:05:02Z)
- Binarized Weight Error Networks With a Transition Regularization Term [4.56877715768796]
This paper proposes a novel binarized weight network (BT) for a resource-efficient neural structure.
The proposed model estimates a binary representation of weights by taking into account the approximation error with an additional term.
A novel regularization term is introduced that is suitable for all threshold-based binary precision networks.
arXiv Detail & Related papers (2021-05-09T10:11:26Z)
- Preprint: Norm Loss: An efficient yet effective regularization method for deep neural networks [7.214681039134488]
We propose a weight soft-regularization method based on the oblique manifold.
We evaluate our method on the popular CIFAR-10, CIFAR-100 and ImageNet 2012 datasets.
arXiv Detail & Related papers (2021-03-11T10:24:49Z)
- QuantNet: Learning to Quantize by Learning within Fully Differentiable Framework [32.465949985191635]
This paper proposes a meta-based quantizer named QuantNet, which utilizes a differentiable sub-network to directly binarize the full-precision weights.
Our method not only solves the problem of gradient mismatching, but also reduces the impact of discretization errors caused by the binarizing operation at deployment.
arXiv Detail & Related papers (2020-09-10T01:41:05Z)
- Understanding Implicit Regularization in Over-Parameterized Single Index Model [55.41685740015095]
We design regularization-free algorithms for the high-dimensional single index model.
We provide theoretical guarantees for the induced implicit regularization phenomenon.
arXiv Detail & Related papers (2020-07-16T13:27:47Z)
- Multi-Objective Matrix Normalization for Fine-grained Visual Recognition [153.49014114484424]
Bilinear pooling achieves great success in fine-grained visual recognition (FGVC).
Recent methods have shown that the matrix power normalization can stabilize the second-order information in bilinear features.
We propose an efficient Multi-Objective Matrix Normalization (MOMN) method that can simultaneously normalize a bilinear representation.
arXiv Detail & Related papers (2020-03-30T08:40:35Z)
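As referenced in the SqueezeLLM entry above, a toy version of a Dense-and-Sparse decomposition can be sketched as follows. This is a simplified illustration, not SqueezeLLM's implementation: outliers are selected purely by magnitude rather than by the second-order sensitivity information the paper uses, the outlier fraction is arbitrary, and the dense remainder is left unquantized.

```python
import torch

def dense_and_sparse_decompose(weight: torch.Tensor, outlier_fraction: float = 0.005):
    """Split a weight matrix into a dense part (outliers zeroed out, to be
    quantized) and a sparse part holding the largest-magnitude entries at
    full precision. Magnitude-based selection is a stand-in for the
    sensitivity-based selection used by SqueezeLLM."""
    k = max(1, int(outlier_fraction * weight.numel()))
    flat_idx = weight.abs().flatten().topk(k).indices
    mask = torch.zeros(weight.numel(), dtype=torch.bool)
    mask[flat_idx] = True
    mask = mask.view_as(weight)
    sparse_part = (weight * mask).to_sparse()   # outliers kept in sparse COO format
    dense_part = weight * ~mask                 # remainder, candidate for low-bit quantization
    return dense_part, sparse_part

W = torch.randn(512, 512)
dense, sparse = dense_and_sparse_decompose(W)
x = torch.randn(512, 1)
# The two parts together reproduce the original matrix-vector product.
y = dense @ x + torch.sparse.mm(sparse, x)
assert torch.allclose(y, W @ x, atol=1e-4)
```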
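And, as referenced in the bias-variance entry above, a minimal sketch of a clipped straight-through (ST) sign estimator: the forward pass binarizes, while the backward pass passes the gradient through as if the binarization were the (clipped) identity. This surrogate-gradient baseline is what the bias-variance analysis studies and what the main paper above deliberately avoids; the clipping rule shown is one common variant, not necessarily the exact estimators analysed in that entry.

```python
import torch

class SignSTE(torch.autograd.Function):
    """Straight-through sign: binarize on the forward pass, pass the incoming
    gradient straight through on the backward pass, zeroed where |x| > 1."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        return grad_output * (x.abs() <= 1).to(grad_output.dtype)

w = torch.randn(8, requires_grad=True)
loss = SignSTE.apply(w).sum()
loss.backward()
print(w.grad)  # 1 where |w| <= 1, 0 elsewhere -- the true gradient of sign is 0 almost everywhere
```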