Leveraging Continuously Differentiable Activation Functions for Learning
in Quantized Noisy Environments
- URL: http://arxiv.org/abs/2402.02593v1
- Date: Sun, 4 Feb 2024 20:01:22 GMT
- Authors: Vivswan Shah and Nathan Youngblood
- Abstract summary: Real-world analog systems intrinsically suffer from noise that can impede model convergence and accuracy on a variety of deep learning models.
We demonstrate that differentiable activations like GELU and SiLU enable robust propagation of gradients, which helps to mitigate analog quantization error.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Real-world analog systems intrinsically suffer from noise that can impede
model convergence and accuracy on a variety of deep learning models. We
demonstrate that differentiable activations like GELU and SiLU enable robust
propagation of gradients, which helps to mitigate the quantization error that
is ubiquitous in analog systems. We analyze and train
convolutional, linear, and transformer networks in the presence of quantized
noise, and show that continuously differentiable
activation functions are significantly more noise resilient than conventional
rectified activations. In the case of ReLU, the error in gradients near zero is
100x higher than that of GELU. Our findings provide guidance for
selecting appropriate activations to realize performant and reliable hardware
implementations across several machine learning domains such as computer
vision, signal processing, and beyond.
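The gap in gradient error between ReLU and GELU near zero can be illustrated with a small numerical experiment. The following is a minimal sketch, not the authors' experimental setup: it assumes a uniform quantizer, additive Gaussian noise, the exact (erf-based) GELU, and a ReLU derivative of 0 at the origin. The helper names and the values of `step` and `noise_std` are hypothetical choices for illustration only.

```python
import math
import numpy as np

def relu_grad(x):
    # Derivative of ReLU; the value at exactly 0 is taken to be 0 (assumed convention).
    return (x > 0).astype(float)

def gelu_grad(x):
    # Derivative of exact GELU, x * Phi(x), which is Phi(x) + x * phi(x).
    erf = np.vectorize(math.erf)
    cdf = 0.5 * (1.0 + erf(x / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * x**2) / math.sqrt(2.0 * math.pi)
    return cdf + x * pdf

def quantize(x, step):
    # Uniform quantizer: snap each value to the nearest multiple of `step`.
    return np.round(x / step) * step

def mean_grad_error(grad_fn, x, step, noise_std, rng):
    # Mean absolute difference between the gradient evaluated at the
    # noisy, quantized input and the gradient at the clean input.
    noisy = quantize(x + rng.normal(0.0, noise_std, size=x.shape), step)
    return float(np.mean(np.abs(grad_fn(noisy) - grad_fn(x))))

rng = np.random.default_rng(0)
x = np.linspace(-0.05, 0.05, 1001)  # inputs clustered near zero
relu_err = mean_grad_error(relu_grad, x, step=0.02, noise_std=0.01, rng=rng)
gelu_err = mean_grad_error(gelu_grad, x, step=0.02, noise_std=0.01, rng=rng)
print(f"ReLU gradient error: {relu_err:.4f}")
print(f"GELU gradient error: {gelu_err:.4f}")
```

Because the ReLU derivative jumps from 0 to 1 at the origin, a small perturbation that moves an input across zero changes the gradient by a full unit, whereas the GELU derivative varies smoothly and changes only in proportion to the perturbation. The exact ratio depends on the assumed step size and noise level, so this sketch should not be read as reproducing the 100x figure quoted in the abstract.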
Related papers
- Hierarchical Neural Operator Transformer with Learnable Frequency-aware Loss Prior for Arbitrary-scale Super-resolution [13.298472586395276]
We present an arbitrary-scale super-resolution (SR) method to enhance the resolution of scientific data.
We conduct extensive experiments on diverse datasets from different domains.
arXiv Detail & Related papers (2024-05-20T17:39:29Z)
- Learning noise-induced transitions by multi-scaling reservoir computing [2.9170682727903863]
We develop a machine learning model based on reservoir computing, a type of recurrent neural network, to learn noise-induced transitions.
The trained model generates accurate statistics of transition time and the number of transitions.
It is also aware of the asymmetry of the double-well potential, the rotational dynamics caused by non-detailed balance, and transitions in multi-stable systems.
arXiv Detail & Related papers (2023-09-11T12:26:36Z) - Value function estimation using conditional diffusion models for control [62.27184818047923]
We propose a simple algorithm called Diffused Value Function (DVF)
It learns a joint multi-step model of the environment-robot interaction dynamics using a diffusion model.
We show how DVF can be used to efficiently capture the state visitation measure for multiple controllers.
arXiv Detail & Related papers (2023-06-09T18:40:55Z) - Degradation-Noise-Aware Deep Unfolding Transformer for Hyperspectral
Image Denoising [9.119226249676501]
Hyperspectral images (HSIs) are often quite noisy because of narrow band spectral filtering.
To reduce the noise in HSI data cubes, both model-driven and learning-based denoising algorithms have been proposed.
This paper proposes a Degradation-Noise-Aware Unfolding Network (DNA-Net) that addresses these issues.
arXiv Detail & Related papers (2023-05-06T13:28:20Z) - Latent Class-Conditional Noise Model [54.56899309997246]
We introduce a Latent Class-Conditional Noise model (LCCN) to parameterize the noise transition under a Bayesian framework.
We then deduce a dynamic label regression method for LCCN, whose Gibbs sampler allows us to efficiently infer the latent true labels.
Our approach safeguards the stable update of the noise transition, avoiding the arbitrary tuning from a mini-batch of samples used in previous work.
arXiv Detail & Related papers (2023-02-19T15:24:37Z) - Improve Noise Tolerance of Robust Loss via Noise-Awareness [60.34670515595074]
We propose a meta-learning method capable of adaptively learning a hyperparameter prediction function, called Noise-Aware-Robust-Loss-Adjuster (NARL-Adjuster for brevity).
We integrate four SOTA robust loss functions with our algorithm, and comprehensive experiments substantiate the general availability and effectiveness of the proposed method in both its noise tolerance and performance.
arXiv Detail & Related papers (2023-01-18T04:54:58Z) - Decision Forest Based EMG Signal Classification with Low Volume Dataset
Augmented with Random Variance Gaussian Noise [51.76329821186873]
We produce a model that can classify six different hand gestures with a limited number of samples that generalizes well to a wider audience.
We appeal to a set of more elementary methods such as the use of random bounds on a signal, but desire to show the power these methods can carry in an online setting.
arXiv Detail & Related papers (2022-06-29T23:22:18Z) - Learning Frequency Domain Approximation for Binary Neural Networks [68.79904499480025]
We propose to estimate the gradient of sign function in the Fourier frequency domain using the combination of sine functions for training BNNs.
The experiments on several benchmark datasets and neural architectures illustrate that the binary network learned using our method achieves the state-of-the-art accuracy.
arXiv Detail & Related papers (2021-03-01T08:25:26Z) - Behavioral Model Inference of Black-box Software using Deep Neural
Networks [1.6593369275241105]
Many software engineering tasks, such as testing, and anomaly detection can benefit from the ability to infer a behavioral model of the software.
Most existing inference approaches assume access to code to collect execution sequences.
We show how this approach can be used to accurately detect state changes, and how the inferred models can be successfully applied to transfer-learning scenarios.
arXiv Detail & Related papers (2021-01-13T09:23:37Z) - Towards Efficient Processing and Learning with Spikes: New Approaches
for Multi-Spike Learning [59.249322621035056]
We propose two new multi-spike learning rules which demonstrate better performance over other baselines on various tasks.
In the feature detection task, we re-examine the ability of unsupervised STDP with its limitations being presented.
Our proposed learning rules can reliably solve the task over a wide range of conditions without specific constraints being applied.
arXiv Detail & Related papers (2020-05-02T06:41:20Z) - DDSP: Differentiable Digital Signal Processing [13.448630251745163]
We introduce the Differentiable Digital Signal Processing (DDSP) library, which enables direct integration of classic signal processing elements with deep learning methods.
We achieve high-fidelity generation without the need for large autoregressive models or adversarial losses.
DDSP enables an interpretable and modular approach to generative modeling, without sacrificing the benefits of deep learning.
arXiv Detail & Related papers (2020-01-14T06:49:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.