Understanding Neural Network Binarization with Forward and Backward
Proximal Quantizers
- URL: http://arxiv.org/abs/2402.17710v1
- Date: Tue, 27 Feb 2024 17:43:51 GMT
- Title: Understanding Neural Network Binarization with Forward and Backward
Proximal Quantizers
- Authors: Yiwei Lu, Yaoliang Yu, Xinlin Li, Vahid Partovi Nia
- Abstract summary: In neural network binarization, BinaryConnect (BC) and its variants are considered the standard.
We aim at shedding some light on these training tricks from the optimization perspective.
- Score: 26.27829662433536
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In neural network binarization, BinaryConnect (BC) and its variants are
considered the standard. These methods apply the sign function in their forward
pass and their respective gradients are backpropagated to update the weights.
However, the derivative of the sign function is zero whenever defined, which
consequently freezes training. Therefore, implementations of BC (e.g., BNN)
usually replace the derivative of sign in the backward computation with
identity or other approximate gradient alternatives. Although such practice
works well empirically, it is largely a heuristic or "training trick." We aim
at shedding some light on these training tricks from the optimization
perspective. Building from existing theory on ProxConnect (PC, a generalization
of BC), we (1) equip PC with different forward-backward quantizers and obtain
ProxConnect++ (PC++) that includes existing binarization techniques as special
cases; (2) derive a principled way to synthesize forward-backward quantizers
with automatic theoretical guarantees; (3) illustrate our theory by proposing
an enhanced binarization algorithm BNN++; (4) conduct image classification
experiments on CNNs and vision transformers, and empirically verify that BNN++
generally achieves competitive results on binarizing these models.
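As a concrete illustration of the forward-backward split described above, the following is a minimal PyTorch sketch of a sign forward quantizer paired with a clipped-identity backward quantizer (the common straight-through heuristic). It is a generic BinaryConnect/BNN-style example, not the exact PC++ or BNN++ construction from the paper.

```python
import torch


class SignWithIdentityBackward(torch.autograd.Function):
    """Forward quantizer: sign. Backward quantizer: clipped identity.

    A hedged sketch of the standard straight-through practice, not the
    paper's proximal quantizer pair.
    """

    @staticmethod
    def forward(ctx, w):
        ctx.save_for_backward(w)
        return torch.sign(w)

    @staticmethod
    def backward(ctx, grad_output):
        (w,) = ctx.saved_tensors
        # The true derivative of sign is zero wherever it is defined, which
        # would freeze training; instead the gradient is passed through,
        # clipped to the region |w| <= 1.
        return grad_output * (w.abs() <= 1).to(grad_output.dtype)


w = torch.randn(4, requires_grad=True)
loss = SignWithIdentityBackward.apply(w).sum()
loss.backward()
print(w.grad)  # non-zero, unlike the exact gradient of sign
```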
Related papers
- BiPer: Binary Neural Networks using a Periodic Function [17.461853355858022]
Quantized neural networks employ reduced precision representations for both weights and activations.
Binary Neural Networks (BNNs) are the extreme quantization case, representing values with just one bit.
In contrast to current BNN approaches, we propose to employ a binary periodic (BiPer) function during binarization.
arXiv Detail & Related papers (2024-04-01T17:52:17Z)
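A minimal sketch of the periodic binarization idea mentioned in the BiPer entry above, assuming a square wave of the form sign(sin(omega * w)); the frequency omega and the exact forward/backward treatment are illustrative assumptions, not the paper's formulation.

```python
import torch


def periodic_binarize(w: torch.Tensor, omega: float = 1.0) -> torch.Tensor:
    # Square wave obtained by taking the sign of a sine, so the binarized
    # value alternates between -1 and +1 periodically in w.  Purely
    # illustrative; BiPer's actual design may differ.
    return torch.sign(torch.sin(omega * w))
```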
- Projected Stochastic Gradient Descent with Quantum Annealed Binary Gradients [51.82488018573326]
We present QP-SBGD, a novel layer-wise optimiser tailored towards training neural networks with binary weights.
BNNs reduce the computational requirements and energy consumption of deep learning models with minimal loss in accuracy.
Our algorithm is implemented layer-wise, making it suitable to train larger networks on resource-limited quantum hardware.
arXiv Detail & Related papers (2023-10-23T17:32:38Z)
- Compacting Binary Neural Networks by Sparse Kernel Selection [58.84313343190488]
This paper is motivated by a previously revealed phenomenon that the binary kernels in successful BNNs are nearly power-law distributed.
We develop the Permutation Straight-Through Estimator (PSTE) that is able to not only optimize the selection process end-to-end but also maintain the non-repetitive occupancy of selected codewords.
Experiments verify that our method reduces both the model size and bit-wise computational costs, and achieves accuracy improvements compared with state-of-the-art BNNs under comparable budgets.
arXiv Detail & Related papers (2023-03-25T13:53:02Z)
- Network Binarization via Contrastive Learning [16.274341164897827]
We establish a novel contrastive learning framework while training Binary Neural Networks (BNNs).
Mutual information (MI) is introduced as the metric to measure the information shared between binary and full-precision (FP) activations.
Results show that our method can be implemented as a pile-up module on existing state-of-the-art binarization methods.
arXiv Detail & Related papers (2022-07-06T21:04:53Z)
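To make the contrastive idea in the entry above concrete, here is a generic InfoNCE-style loss that treats a full-precision activation and its binarized counterpart from the same sample as a positive pair and other samples in the batch as negatives. This is a common surrogate for mutual information and only a hedged sketch, not the paper's actual objective.

```python
import torch
import torch.nn.functional as F


def binary_fp_contrastive_loss(fp_act, bin_act, temperature=0.1):
    # Normalize per-sample feature vectors, score all cross pairs in the
    # batch, and treat the matching (FP, binary) pair as the positive class.
    fp = F.normalize(fp_act.flatten(1), dim=1)   # (N, D)
    bn = F.normalize(bin_act.flatten(1), dim=1)  # (N, D)
    logits = fp @ bn.t() / temperature           # (N, N) similarity matrix
    labels = torch.arange(fp.size(0), device=fp.device)
    return F.cross_entropy(logits, labels)
```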
- Bimodal Distributed Binarized Neural Networks [3.0778860202909657]
Binarization techniques, however, suffer from non-negligible performance degradation compared to their full-precision counterparts.
We propose a Bi-Modal Distributed binarization method (methodname) that imposes a bi-modal distribution of the network weights by kurtosis regularization.
arXiv Detail & Related papers (2022-04-05T06:07:05Z)
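A hedged sketch of a kurtosis-based regularizer in the spirit of the entry above: it penalizes the deviation of a weight tensor's empirical kurtosis from a target value, which encourages weights to spread into two modes rather than concentrate near zero. The target value and penalty form are assumptions, not the paper's recipe.

```python
import torch


def kurtosis_regularizer(w: torch.Tensor, target: float = 1.8,
                         eps: float = 1e-8) -> torch.Tensor:
    # Empirical (non-excess) kurtosis of the flattened weights, penalized
    # for deviating from an assumed target value.
    w = w.flatten()
    mu = w.mean()
    sigma = w.std(unbiased=False) + eps
    kurt = ((w - mu) / sigma).pow(4).mean()
    return (kurt - target).pow(2)
```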
- A Bop and Beyond: A Second Order Optimizer for Binarized Neural Networks [0.0]
The optimization of Binary Neural Networks (BNNs) relies on approximating the real-valued weights with their binarized representations.
In this paper, we take an approach parallel to Adam, which also uses the second raw moment estimate to normalize the first raw moment before comparing it with the threshold.
We present two versions of the proposed optimizer: a biased one and a bias-corrected one, each with its own applications.
arXiv Detail & Related papers (2021-04-11T22:20:09Z)
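A hypothetical sketch of a Bop-style flip rule with an Adam-like second-moment normalization, as hinted at in the entry above: the flip condition, moving-average constants, and threshold below are illustrative assumptions rather than the paper's exact algorithm.

```python
import torch


def second_order_flip_step(binary_w, grad, m, v,
                           beta1=0.99, beta2=0.999,
                           threshold=1e-6, eps=1e-10):
    # Track exponential moving averages of the gradient (m) and its square
    # (v), normalize m by sqrt(v), and flip a binary weight whenever the
    # normalized signal is both large enough and aligned with the current
    # weight sign.  All constants are assumptions for illustration.
    m.mul_(beta1).add_(grad, alpha=1 - beta1)
    v.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)
    signal = m / (v.sqrt() + eps)
    flip = (signal.abs() > threshold) & (torch.sign(signal) == torch.sign(binary_w))
    binary_w[flip] = -binary_w[flip]
    return binary_w, m, v
```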
- SiMaN: Sign-to-Magnitude Network Binarization [165.5630656849309]
We show that our weight binarization provides an analytical solution by encoding high-magnitude weights into +1s, and 0s otherwise.
We prove that the learned weights of binarized networks roughly follow a Laplacian distribution that does not allow entropy maximization.
Our method, dubbed sign-to-magnitude network binarization (SiMaN), is evaluated on CIFAR-10 and ImageNet.
arXiv Detail & Related papers (2021-02-16T07:03:51Z)
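A hedged sketch of the sign-to-magnitude style encoding described in the entry above: the highest-magnitude weights map to +1 and the rest to 0. The fixed keep ratio used here is an assumption; the paper derives the selection analytically.

```python
import torch


def magnitude_binarize(w: torch.Tensor, keep_ratio: float = 0.5) -> torch.Tensor:
    # Encode the top `keep_ratio` fraction of weights (by magnitude) as +1
    # and everything else as 0.  The ratio is an illustrative assumption.
    k = max(1, int(keep_ratio * w.numel()))
    threshold = w.abs().flatten().topk(k).values.min()
    return (w.abs() >= threshold).to(w.dtype)
```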
- Training Binary Neural Networks through Learning with Noisy Supervision [76.26677550127656]
This paper formalizes the binarization operations over neural networks from a learning perspective.
Experimental results on benchmark datasets indicate that the proposed binarization technique attains consistent improvements over baselines.
arXiv Detail & Related papers (2020-10-10T01:59:39Z)
- Rotated Binary Neural Network [138.89237044931937]
The Binary Neural Network (BNN) is a predominant approach to reducing the complexity of deep neural networks.
One of the major impediments is the large quantization error between the full-precision weight vector and its binary vector.
We introduce a Rotated Binary Neural Network (RBNN) which considers the angle alignment between the full-precision weight vector and its binarized version.
arXiv Detail & Related papers (2020-09-28T04:22:26Z)
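To make the angle-alignment idea in the RBNN entry above concrete, here is a small helper that measures the angle between a full-precision weight vector and its sign vector; this only measures the quantization error that rotation-based methods aim to shrink, and does not show the learned rotation itself.

```python
import torch


def binarization_angle(w: torch.Tensor) -> torch.Tensor:
    # Angle (in radians) between w and sign(w); a larger angle indicates a
    # larger quantization error for the binarized weight vector.
    b = torch.sign(w)
    cos = torch.dot(w.flatten(), b.flatten()) / (w.norm() * b.norm() + 1e-12)
    return torch.acos(cos.clamp(-1.0, 1.0))
```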
- Optimization and Generalization Analysis of Transduction through Gradient Boosting and Application to Multi-scale Graph Neural Networks [60.22494363676747]
It is known that current graph neural networks (GNNs) are difficult to make deep because of the problem known as over-smoothing.
Multi-scale GNNs are a promising approach for mitigating the over-smoothing problem.
We derive the optimization and generalization guarantees of transductive learning algorithms that include multi-scale GNNs.
arXiv Detail & Related papers (2020-06-15T17:06:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.