BinaryDuo: Reducing Gradient Mismatch in Binary Activation Network by Coupling Binary Activations
- URL: http://arxiv.org/abs/2002.06517v1
- Date: Sun, 16 Feb 2020 06:18:53 GMT
- Title: BinaryDuo: Reducing Gradient Mismatch in Binary Activation Network by Coupling Binary Activations
- Authors: Hyungjun Kim, Kyungsu Kim, Jinseok Kim, Jae-Joon Kim
- Abstract summary: We propose a new training scheme for binary activation networks called BinaryDuo in which two binary activations are coupled into a ternary activation during training.
Experimental results show that BinaryDuo outperforms state-of-the-art BNNs on various benchmarks with the same number of parameters and computing cost.
- Score: 16.92918746295432
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Binary Neural Networks (BNNs) have been garnering interest thanks to their
compute cost reduction and memory savings. However, BNNs suffer from
performance degradation mainly due to the gradient mismatch caused by
binarizing activations. Previous works tried to address the gradient mismatch
problem by reducing the discrepancy between the activation function used in the
forward pass and its differentiable approximation used in the backward pass,
which is an indirect measure. In this work, we use the gradient of the smoothed
loss function to better estimate the gradient mismatch in quantized neural
networks. Analysis using the gradient mismatch estimator indicates that using
higher precision for activation is more effective than modifying the
differentiable approximation of the activation function. Based on this
observation, we propose a new training scheme for binary activation networks
called BinaryDuo in which two binary activations are coupled into a ternary
activation during training. Experimental results show that BinaryDuo
outperforms state-of-the-art BNNs on various benchmarks with the same number of
parameters and computing cost.
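To make the coupling idea concrete, below is a minimal PyTorch sketch, not the paper's exact recipe: a ternary activation taking values in {0, 1, 2} stands in for two coupled binary activations during training, with a clipped straight-through estimator in the backward pass, and is decoupled back into two binary activations afterwards. The framework choice, the threshold values, the clipping range, and the decouple_to_binary helper are illustrative assumptions.

```python
import torch


class CoupledTernaryActivation(torch.autograd.Function):
    """Ternary activation in {0, 1, 2}, i.e. the sum of two coupled
    binary activations, with a straight-through estimator (STE)."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        # Illustrative thresholds at 0.0 and 1.0 (assumed, not the paper's exact values).
        return (x > 0.0).float() + (x > 1.0).float()

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Clipped STE: pass the gradient only inside an assumed active range.
        mask = ((x > -1.0) & (x < 2.0)).float()
        return grad_output * mask


def decouple_to_binary(x):
    """After training, split the ternary activation back into two binary
    activations so inference uses binary compute only (illustrative helper)."""
    b1 = (x > 0.0).float()
    b2 = (x > 1.0).float()
    return b1, b2


if __name__ == "__main__":
    x = torch.randn(4, requires_grad=True)
    y = CoupledTernaryActivation.apply(x)
    y.sum().backward()
    print(y, x.grad)
```

The intuition, per the abstract, is that the higher-precision (ternary) activation during training reduces the gradient mismatch, while decoupling afterwards preserves binary-only compute at inference.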
Related papers
- BiPer: Binary Neural Networks using a Periodic Function [17.461853355858022]
Quantized neural networks employ reduced precision representations for both weights and activations.
Binary Neural Networks (BNNs) are the extreme quantization case, representing values with just one bit.
In contrast to current BNN approaches, we propose to employ a binary periodic (BiPer) function during binarization.
arXiv Detail & Related papers (2024-04-01T17:52:17Z)
- Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z)
- Implicit Stochastic Gradient Descent for Training Physics-informed Neural Networks [51.92362217307946]
Physics-informed neural networks (PINNs) have been shown to be effective in solving forward and inverse differential equation problems.
However, PINNs are prone to training failures when the target functions to be approximated exhibit high-frequency or multi-scale features.
In this paper, we propose to employ the implicit stochastic gradient descent (ISGD) method to train PINNs and improve the stability of the training process.
arXiv Detail & Related papers (2023-03-03T08:17:47Z)
- Improved Algorithms for Neural Active Learning [74.89097665112621]
We improve the theoretical and empirical performance of neural-network (NN)-based active learning algorithms for the non-parametric streaming setting.
We introduce two regret metrics, defined by minimizing the population loss, that are more suitable for active learning than the one used in state-of-the-art (SOTA) related work.
arXiv Detail & Related papers (2022-10-02T05:03:38Z)
- Recurrent Bilinear Optimization for Binary Neural Networks [58.972212365275595]
Existing BNNs neglect the intrinsic bilinear relationship between real-valued weights and scale factors.
Our work is the first attempt to optimize BNNs from this bilinear perspective.
We obtain robust RBONNs, which outperform state-of-the-art BNNs on various models and datasets.
arXiv Detail & Related papers (2022-09-04T06:45:33Z)
- Network Binarization via Contrastive Learning [16.274341164897827]
We establish a novel contrastive learning framework for training Binary Neural Networks (BNNs).
Mutual information (MI) is introduced as the metric to measure the information shared between binary and full-precision (FP) activations.
Results show that our method can be implemented as a pile-up module on existing state-of-the-art binarization methods.
arXiv Detail & Related papers (2022-07-06T21:04:53Z)
- ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training [68.63354877166756]
ActNN is a memory-efficient training framework that stores randomly quantized activations for backpropagation.
ActNN reduces the activation memory footprint by 12x and enables training with a 6.6x to 14x larger batch size.
arXiv Detail & Related papers (2021-04-29T05:50:54Z)
- Learning Frequency Domain Approximation for Binary Neural Networks [68.79904499480025]
We propose to estimate the gradient of the sign function in the Fourier frequency domain using a combination of sine functions for training BNNs (see the sketch after this list).
Experiments on several benchmark datasets and neural architectures show that binary networks learned with our method achieve state-of-the-art accuracy.
arXiv Detail & Related papers (2021-03-01T08:25:26Z)
- BiSNN: Training Spiking Neural Networks with Binary Weights via Bayesian Learning [37.376989855065545]
Spiking Neural Networks (SNNs) are biologically inspired, dynamic, event-driven models that enhance energy efficiency.
An SNN model is introduced that combines the benefits of temporally sparse binary activations and of binary weights.
Experiments evaluate the performance loss with respect to full-precision implementations.
arXiv Detail & Related papers (2020-12-15T14:06:36Z)
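The sketch below illustrates the idea referenced in the "Learning Frequency Domain Approximation for Binary Neural Networks" entry above: the forward pass uses the sign function, while the backward pass uses the derivative of a truncated sine (Fourier) series of the square wave as a surrogate gradient. PyTorch, the truncation order num_terms, and the base frequency omega are assumptions for illustration; the paper's exact decomposition and training details are not reproduced here.

```python
import math

import torch


class FourierSignSTE(torch.autograd.Function):
    """Sign activation whose backward pass uses the derivative of a
    truncated sine (Fourier) series of the square wave as a surrogate
    gradient. Truncation order and base frequency are illustrative."""

    num_terms = 4      # assumed number of sine terms
    omega = math.pi    # assumed base frequency

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        grad = torch.zeros_like(x)
        # d/dx [(4/pi) * sum_i sin((2i+1)*w*x) / (2i+1)] = (4w/pi) * sum_i cos((2i+1)*w*x)
        for i in range(FourierSignSTE.num_terms):
            k = 2 * i + 1
            grad = grad + torch.cos(k * FourierSignSTE.omega * x)
        grad = grad * (4.0 * FourierSignSTE.omega / math.pi)
        return grad_output * grad


if __name__ == "__main__":
    x = torch.linspace(-1.0, 1.0, 5, requires_grad=True)
    y = FourierSignSTE.apply(x)
    y.sum().backward()
    print(x.grad)
```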
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented above and is not responsible for any consequences of its use.