Learning Frequency Domain Approximation for Binary Neural Networks
- URL: http://arxiv.org/abs/2103.00841v1
- Date: Mon, 1 Mar 2021 08:25:26 GMT
- Title: Learning Frequency Domain Approximation for Binary Neural Networks
- Authors: Yixing Xu, Kai Han, Chang Xu, Yehui Tang, Chunjing Xu, Yunhe Wang
- Abstract summary: We propose to estimate the gradient of the sign function in the Fourier frequency domain using a combination of sine functions for training BNNs.
Experiments on several benchmark datasets and neural architectures show that binary networks learned using our method achieve state-of-the-art accuracy.
- Score: 68.79904499480025
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Binary neural networks (BNNs) represent the original full-precision weights and
activations as 1-bit values using the sign function. Since the gradient of the
conventional sign function is almost zero everywhere and therefore cannot be used for
back-propagation, several methods have been proposed to alleviate the
optimization difficulty by using approximate gradients. However, those
approximations corrupt the main direction of the de facto gradient. To this end, we
propose to estimate the gradient of the sign function in the Fourier frequency
domain using a combination of sine functions for training BNNs, namely
frequency domain approximation (FDA). The proposed approach preserves the
low-frequency information of the original sign function, which accounts for most of
the overall energy, while high-frequency coefficients are ignored to avoid
heavy computational overhead. In addition, we embed a noise adaptation
module into the training phase to compensate for the approximation error.
Experiments on several benchmark datasets and neural architectures show
that binary networks learned with our method achieve state-of-the-art
accuracy.
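The abstract describes the FDA mechanism only at a high level: the sign function, viewed as a square wave, has the Fourier series sign(x) ≈ (4/π) Σ_k sin((2k+1)ωx)/(2k+1), and truncating this series yields a smooth surrogate whose derivative can stand in for the zero-almost-everywhere gradient of sign during back-propagation. The sketch below is a minimal illustration of that idea, not the authors' implementation; the chosen period, the number of retained sine terms, and the omission of the paper's noise adaptation module are simplifying assumptions made here.

```python
import math
import torch

class FDASign(torch.autograd.Function):
    """Forward: exact sign(x). Backward: derivative of a truncated sine series
    approximating the square wave (illustrative sketch of the FDA idea)."""

    @staticmethod
    def forward(ctx, x, n_terms, period):
        ctx.save_for_backward(x)
        ctx.n_terms = n_terms
        ctx.omega = 2.0 * math.pi / period  # base angular frequency of the square wave
        return torch.sign(x)                # exact 1-bit values in the forward pass

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # d/dx [(4/pi) * sum_k sin((2k+1)*omega*x) / (2k+1)]
        #   = (4*omega/pi) * sum_k cos((2k+1)*omega*x)
        surrogate = torch.zeros_like(x)
        for k in range(ctx.n_terms):
            surrogate = surrogate + torch.cos((2 * k + 1) * ctx.omega * x)
        surrogate = surrogate * (4.0 * ctx.omega / math.pi)
        return grad_output * surrogate, None, None  # no grads for n_terms, period

# Toy usage: binarize with sign() while back-propagating through the surrogate.
x = torch.randn(8, requires_grad=True)
y = FDASign.apply(x, 4, 2.0)  # 4 sine terms, period 2 -- illustrative choices
y.sum().backward()
print(x.grad)
```

Keeping only a few low-frequency terms mirrors the abstract's point that most of the square wave's energy sits in the low-frequency coefficients, so dropping the high-frequency ones keeps the surrogate cheap without discarding the dominant gradient direction.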
Related papers
- Spectral-Refiner: Fine-Tuning of Accurate Spatiotemporal Neural Operator for Turbulent Flows [6.961408873053586]
We propose a new Spatiotemporal Fourier Neural Operator (SFNO) that learns maps between Bochner spaces, and a new learning framework to address these issues.
This new paradigm leverages wisdom from traditional numerical PDE theory and techniques to refine the pipeline of commonly adopted end-to-end neural operator training and evaluations.
Numerical experiments on commonly used benchmarks for the 2D NSE demonstrate significant improvements in both computational efficiency and accuracy, compared to end-to-end evaluation and traditional numerical PDE solvers.
arXiv Detail & Related papers (2024-05-27T14:33:06Z) - BiPer: Binary Neural Networks using a Periodic Function [17.461853355858022]
Quantized neural networks employ reduced precision representations for both weights and activations.
Binary Neural Networks (BNNs) are the extreme quantization case, representing values with just one bit.
In contrast to current BNN approaches, we propose to employ a binary periodic (BiPer) function during binarization.
arXiv Detail & Related papers (2024-04-01T17:52:17Z) - Guaranteed Approximation Bounds for Mixed-Precision Neural Operators [83.64404557466528]
We build on the intuition that neural operator learning inherently induces an approximation error.
We show that our approach reduces GPU memory usage by up to 50% and improves throughput by 58% with little or no reduction in accuracy.
arXiv Detail & Related papers (2023-07-27T17:42:06Z) - Globally Optimal Training of Neural Networks with Threshold Activation
Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z) - Implicit Stochastic Gradient Descent for Training Physics-informed
Neural Networks [51.92362217307946]
Physics-informed neural networks (PINNs) have been shown to be effective in solving forward and inverse differential equation problems.
However, PINNs can fail to train when the target functions to be approximated exhibit high-frequency or multi-scale features.
In this paper, we propose to employ the implicit stochastic gradient descent (ISGD) method to train PINNs, improving the stability of the training process.
arXiv Detail & Related papers (2023-03-03T08:17:47Z) - Implicit Bias in Leaky ReLU Networks Trained on High-Dimensional Data [63.34506218832164]
In this work, we investigate the implicit bias of gradient flow and gradient descent in two-layer fully-connected neural networks with ReLU activations.
For gradient flow, we leverage recent work on the implicit bias for homogeneous neural networks to show that, asymptotically, gradient flow produces a neural network with rank at most two.
For gradient descent, provided the random initialization variance is small enough, we show that a single step of gradient descent suffices to drastically reduce the rank of the network, and that the rank remains small throughout training.
arXiv Detail & Related papers (2022-10-13T15:09:54Z) - Scaling Forward Gradient With Local Losses [117.22685584919756]
Forward learning is a biologically plausible alternative to backprop for learning deep neural networks.
We show that it is possible to substantially reduce the variance of the forward gradient by applying perturbations to activations rather than weights.
Our approach matches backprop on MNIST and CIFAR-10 and significantly outperforms previously proposed backprop-free algorithms on ImageNet.
arXiv Detail & Related papers (2022-10-07T03:52:27Z) - Overcoming the Spectral Bias of Neural Value Approximation [17.546011419043644]
Value approximation using deep neural networks is often the primary module that provides learning signals to the rest of the algorithm.
Recent works in neural kernel regression suggest the presence of a spectral bias, where fitting high-frequency components of the value function requires exponentially more gradient update steps than the low-frequency ones.
We re-examine off-policy reinforcement learning through the lens of kernel regression and propose to overcome such bias via a composite neural kernel.
arXiv Detail & Related papers (2022-06-09T17:59:57Z) - Convergence rates for gradient descent in the training of
overparameterized artificial neural networks with biases [3.198144010381572]
In recent years, artificial neural networks have developed into a powerful tool for dealing with a multitude of problems for which classical solution approaches reach their limits.
It is still unclear why randomly initialized gradient descent algorithms are able to train such networks successfully.
arXiv Detail & Related papers (2021-02-23T18:17:47Z) - BinaryDuo: Reducing Gradient Mismatch in Binary Activation Network by
Coupling Binary Activations [16.92918746295432]
We propose a new training scheme for binary activation networks called BinaryDuo in which two binary activations are coupled into a ternary activation during training.
Experimental results show that BinaryDuo outperforms state-of-the-art BNNs on various benchmarks with the same amount of parameters and computing cost.
arXiv Detail & Related papers (2020-02-16T06:18:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.