Binarizing by Classification: Is soft function really necessary?
- URL: http://arxiv.org/abs/2205.07433v3
- Date: Sun, 16 Jul 2023 07:22:19 GMT
- Title: Binarizing by Classification: Is soft function really necessary?
- Authors: Yefei He, Luoming Zhang, Weijia Wu, Hong Zhou
- Abstract summary: We propose to tackle network binarization as a binary classification problem.
We also treat binarization as a lightweighting approach for pose estimation models.
The proposed method enables binary networks to achieve an mAP of up to $60.6$ for the first time.
- Score: 4.329951775163721
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Binary neural networks leverage the $\mathrm{Sign}$ function to binarize weights and activations; because $\mathrm{Sign}$ is non-differentiable, training requires gradient estimators, which inevitably introduce gradient errors during backpropagation. Although many hand-designed soft functions have been proposed as gradient estimators to better approximate gradients, their mechanism remains unclear, and large performance gaps persist between binary models and their full-precision counterparts. To address these issues and reduce gradient error, we propose to tackle network binarization as a binary classification problem and use a multi-layer perceptron (MLP) as the classifier in the forward pass and as the gradient estimator in the backward pass. Because an MLP can in theory fit any continuous function, it can be learned adaptively to binarize networks and backpropagate gradients without any prior knowledge of soft functions. From this perspective, we further show empirically that even a simple linear function can outperform previous, more complex soft functions. Extensive experiments demonstrate that the proposed method yields surprisingly strong performance on both image classification and human pose estimation tasks. Specifically, we achieve $65.7\%$ top-1 accuracy with ResNet-34 on the ImageNet dataset, an absolute improvement of $2.6\%$. Moreover, we treat binarization as a lightweighting approach for pose estimation models and propose the well-designed binary pose estimation networks SBPN and BHRNet. When evaluated on the challenging Microsoft COCO keypoint dataset, the proposed method enables binary networks to achieve an mAP of up to $60.6$ for the first time. Experiments on real platforms demonstrate that BNNs achieve a better balance between performance and computational complexity, especially when computational resources are extremely limited.
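To make the mechanism concrete, below is a minimal PyTorch sketch of the binarization-as-classification idea: a hard sign in the forward pass, with a small learnable MLP supplying the gradient in the backward pass via a straight-through re-parameterization. The module names (ElementwiseMLP, BinaryLinear), the per-element MLP design, and the layer layout are illustrative assumptions, not the authors' exact architecture.

```python
# Minimal sketch (assumed implementation): hard sign in the forward pass,
# a learnable MLP as the gradient estimator in the backward pass.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ElementwiseMLP(nn.Module):
    """Tiny MLP applied to each scalar entry; a hypothetical stand-in for the
    learned gradient estimator described in the abstract."""

    def __init__(self, hidden: int = 16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(1, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x.reshape(-1, 1)).reshape(x.shape)


def binarize(x: torch.Tensor, estimator: nn.Module) -> torch.Tensor:
    # Forward value is sign(x); the gradient w.r.t. x is the Jacobian of the
    # learned estimator (straight-through re-parameterization), so no
    # hand-designed soft function is needed.
    g = estimator(x)
    return torch.sign(x).detach() + g - g.detach()


class BinaryLinear(nn.Module):
    """Linear layer with binarized weights and activations."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.w_estimator = ElementwiseMLP()
        self.a_estimator = ElementwiseMLP()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w_bin = binarize(self.weight, self.w_estimator)
        x_bin = binarize(x, self.a_estimator)
        return F.linear(x_bin, w_bin)


# Usage: gradients reach both the layer weight and the estimator MLPs.
layer = BinaryLinear(8, 4)
layer(torch.randn(2, 8)).sum().backward()
```

Swapping ElementwiseMLP for a single nn.Linear(1, 1) corresponds to the paper's observation that even a simple linear function can serve as the estimator.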
Related papers
- Training Multi-Layer Binary Neural Networks With Local Binary Error Signals [3.7740044597960316]
We introduce a multi-layer training algorithm for Binary Neural Networks (BNNs) that does not require the computation of back-propagated full-precision gradients.
The proposed algorithm is based on local binary error signals and binary weight updates, employing integer-valued hidden weights that serve as a synaptic metaplasticity mechanism.
Experimental results on BMLPs, fully trained in a binary-native and gradient-free manner on multi-class image classification benchmarks, demonstrate an accuracy improvement of up to +13.36%; an illustrative sketch of such a local update follows this entry.
arXiv Detail & Related papers (2024-11-28T09:12:04Z)
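As a rough, hedged illustration of the local, gradient-free training described above: integer-valued hidden weights act as the synaptic state, the binary weights used for inference are their sign, and the update is driven by a binary error signal. The specific rule below (an outer product of binary error and binary input, with clipping) is an assumption for illustration and may differ from the paper's algorithm.

```python
# Illustrative sketch (not the paper's exact rule): integer hidden weights H
# hold the synaptic state; the binary weights used for inference are sign(H).
import torch


def local_binary_update(H: torch.Tensor, x_bin: torch.Tensor,
                        err_bin: torch.Tensor, clip: int = 8):
    # H: (out, in) integer-valued hidden weights; x_bin, err_bin in {-1, +1}.
    delta = torch.outer(err_bin, x_bin)        # local, gradient-free update direction
    H = torch.clamp(H + delta, -clip, clip)    # bounded integer state (metaplasticity)
    return H, torch.sign(H)                    # new state, binary weights


H = torch.zeros(4, 8, dtype=torch.long)
x_bin = torch.sign(torch.randn(8)).long()
err_bin = torch.sign(torch.randn(4)).long()
H, W_bin = local_binary_update(H, x_bin, err_bin)
```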
- Efficient and Flexible Neural Network Training through Layer-wise Feedback Propagation [49.44309457870649]
We present Layer-wise Feedback Propagation (LFP), a novel training principle for neural network-like predictors.
LFP decomposes a reward to individual neurons based on their respective contributions to solving a given task.
Our method then implements a greedy approach reinforcing helpful parts of the network and weakening harmful ones.
arXiv Detail & Related papers (2023-08-23T10:48:28Z)
- Scaling Forward Gradient With Local Losses [117.22685584919756]
Forward learning is a biologically plausible alternative to backprop for learning deep neural networks.
We show that it is possible to substantially reduce the variance of the forward gradient by applying perturbations to activations rather than weights.
Our approach matches backprop on MNIST and CIFAR-10 and significantly outperforms previously proposed backprop-free algorithms on ImageNet; a minimal sketch of an activity-perturbed forward gradient follows this entry.
arXiv Detail & Related papers (2022-10-07T03:52:27Z)
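A minimal sketch of an activity-perturbed forward gradient for a single linear layer, assuming an MSE loss: the loss's directional derivative along a random tangent on the activations is computed with forward-mode AD, and the gradient is estimated from it without global backpropagation. This illustrates the general idea only, not the paper's full local-losses method.

```python
# Sketch: forward-gradient estimate with perturbations applied to activations.
import torch


def forward_gradient_step(W: torch.Tensor, x: torch.Tensor, y: torch.Tensor):
    # Single linear layer z = x @ W with an (assumed) MSE loss.
    def loss_from_z(z):
        return ((z - y) ** 2).mean()

    z = x @ W
    u = torch.randn_like(z)                       # random tangent on the activations
    _, dir_deriv = torch.autograd.functional.jvp(loss_from_z, (z,), (u,))
    dz_hat = dir_deriv * u                        # unbiased estimate of dL/dz
    dW_hat = x.t() @ dz_hat                       # local chain rule, no global backprop
    return dW_hat


W = torch.randn(8, 4)
dW = forward_gradient_step(W, torch.randn(16, 8), torch.randn(16, 4))
```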
- AdaBin: Improving Binary Neural Networks with Adaptive Binary Sets [27.022212653067367]
This paper studies Binary Neural Networks (BNNs), in which weights and activations are both binarized to 1-bit values.
We present a simple yet effective approach called AdaBin to adaptively obtain the optimal binary sets.
Experimental results on benchmark models and datasets demonstrate that the proposed AdaBin achieves state-of-the-art performance; a minimal sketch of a learnable binary set follows this entry.
arXiv Detail & Related papers (2022-08-17T05:43:33Z)
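A minimal sketch of the adaptive-binary-set idea: instead of the fixed set {-1, +1}, a learnable center and scale define the set {center - scale, center + scale}, with a straight-through estimator for the sign. The parameterization below is an illustrative assumption, not necessarily AdaBin's exact formulation.

```python
# Sketch: binarize into a learnable set {center - scale, center + scale}
# rather than the fixed {-1, +1}; straight-through estimator for the sign.
import torch
import torch.nn as nn


class AdaptiveBinarizer(nn.Module):
    def __init__(self):
        super().__init__()
        self.center = nn.Parameter(torch.zeros(1))
        self.scale = nn.Parameter(torch.ones(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        shifted = x - self.center
        s = torch.sign(shifted)
        s = (s - shifted).detach() + shifted       # straight-through: ds/dx ~= 1
        return self.center + self.scale * s        # values in {c - d, c + d}


binarizer = AdaptiveBinarizer()
w = torch.randn(4, 8, requires_grad=True)
binarizer(w).sum().backward()                      # grads reach w, center, scale
```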
- Adaptive Self-supervision Algorithms for Physics-informed Neural Networks [59.822151945132525]
Physics-informed neural networks (PINNs) incorporate physical knowledge from the problem domain as a soft constraint on the loss function.
We study the impact of the location of the collocation points on the trainability of these models.
We propose a novel adaptive collocation scheme that progressively allocates more collocation points to regions where the model makes larger errors; a minimal sketch of such a scheme follows this entry.
arXiv Detail & Related papers (2022-07-08T18:17:06Z)
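A minimal sketch of residual-driven adaptive collocation: draw a large pool of candidate points, score them by the magnitude of the PDE residual under the current model, and resample in proportion to that score. The function names (residual_fn, domain_sampler) and the residual-proportional sampling are assumptions; the paper's exact scheme may differ.

```python
# Sketch: allocate more collocation points where the PDE residual is large.
import torch


def adaptive_collocation(residual_fn, domain_sampler, n_points: int,
                         n_candidates: int = 10_000) -> torch.Tensor:
    candidates = domain_sampler(n_candidates)          # (n_candidates, dim)
    with torch.no_grad():
        scores = residual_fn(candidates).abs().reshape(-1)
    probs = scores / scores.sum()                      # residual-proportional weights
    idx = torch.multinomial(probs, n_points, replacement=False)
    return candidates[idx]


# Example with a dummy residual and a unit-square domain.
points = adaptive_collocation(
    residual_fn=lambda p: (p ** 2).sum(dim=1) - 1.0,
    domain_sampler=lambda n: torch.rand(n, 2),
    n_points=256,
)
```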
- QuantNet: Learning to Quantize by Learning within Fully Differentiable Framework [32.465949985191635]
This paper proposes a meta-based quantizer named QuantNet, which utilizes a differentiable sub-network to directly binarize the full-precision weights.
Our method not only solves the problem of gradient mismatch but also reduces the impact of the discretization errors caused by the binarization operation at deployment.
arXiv Detail & Related papers (2020-09-10T01:41:05Z)
- Path Sample-Analytic Gradient Estimators for Stochastic Binary Networks [78.76880041670904]
In neural networks with binary activations and/or binary weights, training by gradient descent is complicated.
We propose a new method for this estimation problem combining sampling and analytic approximation steps.
We experimentally show higher accuracy in gradient estimation and demonstrate more stable and better-performing training of deep convolutional models.
arXiv Detail & Related papers (2020-06-04T21:51:21Z)
- Widening and Squeezing: Towards Accurate and Efficient QNNs [125.172220129257]
Quantized neural networks (QNNs) are very attractive to industry because of their extremely cheap computation and storage overhead, but their performance is still worse than that of networks with full-precision parameters.
Most existing methods aim to enhance the performance of QNNs, especially binary neural networks, by exploiting more effective training techniques.
We address this problem by projecting features in the original full-precision networks to high-dimensional quantization features.
arXiv Detail & Related papers (2020-02-03T04:11:13Z)
- Towards Better Understanding of Adaptive Gradient Algorithms in Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper, we analyze a variant of an optimistic adaptive gradient algorithm for nonconcave min-max problems.
Our experiments empirically compare adaptive and non-adaptive gradient algorithms for GAN training; a minimal sketch of the basic optimistic update follows this entry.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)
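For context, the plain optimistic gradient step extrapolates with the previous gradient; the adaptive (Adagrad-style) variant analyzed in the paper builds on this idea, which the sketch below shows in its simplest form.

```python
# Sketch: one plain optimistic gradient step,
#   theta <- theta - lr * (2 * g_t - g_{t-1}).
import torch


def optimistic_step(theta: torch.Tensor, grad: torch.Tensor,
                    prev_grad: torch.Tensor, lr: float) -> torch.Tensor:
    return theta - lr * (2.0 * grad - prev_grad)
```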
This list is automatically generated from the titles and abstracts of the papers on this site.