Binarizing by Classification: Is soft function really necessary?
- URL: http://arxiv.org/abs/2205.07433v3
- Date: Sun, 16 Jul 2023 07:22:19 GMT
- Title: Binarizing by Classification: Is soft function really necessary?
- Authors: Yefei He, Luoming Zhang, Weijia Wu, Hong Zhou
- Abstract summary: We propose to tackle network binarization as a binary classification problem.
We also take binarization as a lightweighting approach for pose estimation models.
The proposed method enables binary networks to achieve a mAP of up to $60.6$ for the first time.
- Score: 4.329951775163721
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Binary neural networks leverage the $\mathrm{Sign}$ function to binarize weights
and activations, which requires gradient estimators to overcome its
non-differentiability and inevitably introduces gradient errors during
backpropagation. Although many hand-designed soft functions have been proposed
as gradient estimators to better approximate gradients, their mechanism is
unclear and large performance gaps remain between binary models and their
full-precision counterparts. To address these issues and reduce gradient error,
we propose to treat network binarization as a binary classification problem
and to use a multi-layer perceptron (MLP) as the classifier in the forward pass
and as the gradient estimator in the backward pass. Benefiting from the MLP's
theoretical capability to fit any continuous function, it can be adaptively
learned to binarize networks and backpropagate gradients without any prior
knowledge of soft functions. From this perspective, we further show empirically
that even a simple linear function can outperform previous complex soft
functions. Extensive experiments demonstrate that the proposed method performs
surprisingly well on both image classification and human pose estimation
tasks. Specifically, we achieve $65.7\%$ top-1 accuracy with ResNet-34 on the
ImageNet dataset, an absolute improvement of $2.6\%$. Moreover, we take
binarization as a lightweighting approach for pose estimation models and
propose the well-designed binary pose estimation networks SBPN and BHRNet. When
evaluated on the challenging Microsoft COCO keypoint dataset, the proposed
method enables binary networks to achieve a mAP of up to $60.6$ for the first
time. Experiments conducted on real platforms demonstrate that binary networks
achieve a better balance between performance and computational complexity,
especially when computational resources are extremely low.
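To make the classification view concrete, the following is a minimal PyTorch sketch of the general idea, not the authors' released implementation: a tiny element-wise MLP acts as the classifier in the forward pass (its sign gives the binary value) and, because the MLP is differentiable, it serves as the gradient estimator in the backward pass. The class name MLPBinarizer, the hidden width, and the straight-through-style wiring are illustrative assumptions.

    # Sketch: an MLP-based binarizer; the forward pass emits {-1, +1} values,
    # while gradients flow through the learned MLP rather than a hand-designed
    # soft function.
    import torch
    import torch.nn as nn

    class MLPBinarizer(nn.Module):  # hypothetical name, for illustration only
        def __init__(self, hidden_dim: int = 16):
            super().__init__()
            # Tiny element-wise MLP mapping each scalar activation to a logit.
            self.mlp = nn.Sequential(
                nn.Linear(1, hidden_dim),
                nn.Tanh(),
                nn.Linear(hidden_dim, 1),
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            logits = self.mlp(x.reshape(-1, 1)).reshape(x.shape)
            hard = torch.sign(logits)  # binary values used in the forward pass
            # Straight-through-style trick: the output equals `hard`, but the
            # backward pass differentiates through the learned `logits`.
            return hard.detach() + logits - logits.detach()

    # Usage: binarize activations before a binary convolution / linear layer.
    binarizer = MLPBinarizer()
    x = torch.randn(4, 8, requires_grad=True)
    y = binarizer(x)       # values in {-1, +1} (0 only if a logit is exactly 0)
    y.sum().backward()     # gradients reach x via the MLP, not a fixed soft function

Because the MLP's parameters are trained jointly with the network, the shape of the gradient estimator is learned rather than hand-designed, which is the sense in which no prior knowledge of soft functions is needed.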
Related papers
- Implicit Stochastic Gradient Descent for Training Physics-informed Neural Networks [51.92362217307946]
Physics-informed neural networks (PINNs) have been shown to be effective in solving forward and inverse differential equation problems.
However, PINNs are prone to training failures when the target functions to be approximated exhibit high-frequency or multi-scale features.
In this paper, we propose to employ the implicit stochastic gradient descent (ISGD) method to train PINNs in order to improve the stability of the training process.
arXiv Detail & Related papers (2023-03-03T08:17:47Z) - Provably Efficient Neural Offline Reinforcement Learning via Perturbed Rewards [33.88533898709351]
VIPeR amalgamates the randomized value function idea with the pessimism principle.
It implicitly obtains pessimism by simply perturbing the offline data multiple times.
It is both provably and computationally efficient in general Markov decision processes (MDPs) with neural network function approximation.
arXiv Detail & Related papers (2023-02-24T17:52:12Z) - Scaling Forward Gradient With Local Losses [117.22685584919756]
Forward learning is a biologically plausible alternative to backprop for learning deep neural networks.
We show that it is possible to substantially reduce the variance of the forward gradient by applying perturbations to activations rather than weights.
Our approach matches backprop on MNIST and CIFAR-10 and significantly outperforms previously proposed backprop-free algorithms on ImageNet.
arXiv Detail & Related papers (2022-10-07T03:52:27Z) - AdaBin: Improving Binary Neural Networks with Adaptive Binary Sets [27.022212653067367]
This paper studies Binary Neural Networks (BNNs), in which both weights and activations are binarized into 1-bit values.
We present a simple yet effective approach called AdaBin to adaptively obtain the optimal binary sets.
Experimental results on benchmark models and datasets demonstrate that the proposed AdaBin is able to achieve state-of-the-art performance.
arXiv Detail & Related papers (2022-08-17T05:43:33Z) - Adaptive Self-supervision Algorithms for Physics-informed Neural Networks [59.822151945132525]
Physics-informed neural networks (PINNs) incorporate physical knowledge from the problem domain as a soft constraint on the loss function.
We study the impact of the location of the collocation points on the trainability of these models.
We propose a novel adaptive collocation scheme which progressively allocates more collocation points to areas where the model is making higher errors.
arXiv Detail & Related papers (2022-07-08T18:17:06Z) - Network Binarization via Contrastive Learning [16.274341164897827]
We establish a novel contrastive learning framework while training Binary Neural Networks (BNNs).
Mutual information (MI) is introduced as the metric to measure the information shared between binary and full-precision (FP) activations.
Results show that our method can be implemented as a pile-up module on existing state-of-the-art binarization methods.
arXiv Detail & Related papers (2022-07-06T21:04:53Z) - QuantNet: Learning to Quantize by Learning within Fully Differentiable
Framework [32.465949985191635]
This paper proposes a meta-based quantizer named QuantNet, which utilizes a differentiable sub-network to directly binarize the full-precision weights.
Our method not only solves the problem of gradient mismatching, but also reduces the impact of the discretization errors caused by the binarization operation during deployment.
arXiv Detail & Related papers (2020-09-10T01:41:05Z) - Path Sample-Analytic Gradient Estimators for Stochastic Binary Networks [78.76880041670904]
In neural networks with binary activations and/or binary weights, training by gradient descent is complicated.
We propose a new method for this estimation problem combining sampling and analytic approximation steps.
We experimentally show higher accuracy in gradient estimation and demonstrate a more stable and better performing training in deep convolutional models.
arXiv Detail & Related papers (2020-06-04T21:51:21Z) - Widening and Squeezing: Towards Accurate and Efficient QNNs [125.172220129257]
Quantization neural networks (QNNs) are very attractive to industry because of their extremely cheap computation and storage overhead, but their performance is still worse than that of networks with full-precision parameters.
Most existing methods aim to enhance the performance of QNNs, especially binary neural networks, by exploiting more effective training techniques.
We address this problem by projecting features in original full-precision networks to high-dimensional quantization features.
arXiv Detail & Related papers (2020-02-03T04:11:13Z) - Towards Better Understanding of Adaptive Gradient Algorithms in Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper, we analyze a variant of the Optimistic Adagrad (OAdagrad) algorithm for nonconvex-nonconcave min-max problems.
Our experiments show that the advantage of adaptive over non-adaptive gradient algorithms in GAN training can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)