Gradient Mask: Lateral Inhibition Mechanism Improves Performance in
Artificial Neural Networks
- URL: http://arxiv.org/abs/2208.06918v1
- Date: Sun, 14 Aug 2022 20:55:50 GMT
- Title: Gradient Mask: Lateral Inhibition Mechanism Improves Performance in
Artificial Neural Networks
- Authors: Lei Jiang and Yongqing Liu and Shihai Xiao and Yansong Chua
- Abstract summary: We propose Gradient Mask, which effectively filters out noise gradients in the process of backpropagation.
This allows the learned feature information to be more intensively stored in the network.
We show analytically how lateral inhibition in artificial neural networks improves the quality of propagated gradients.
- Score: 5.591477512580285
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Lateral inhibitory connections have been observed in the cortex of the
biological brain and have been extensively studied in terms of their role in
cognitive functions. However, in the vanilla version of backpropagation in deep
learning, all gradients (which can be understood to comprise both signal and
noise gradients) flow through the network during weight updates. This may lead
to overfitting. In this work, inspired by biological lateral inhibition, we
propose Gradient Mask, which effectively filters out noise gradients in the
process of backpropagation. This allows the learned feature information to be
more intensively stored in the network while filtering out noisy or unimportant
features. Furthermore, we demonstrate analytically how lateral inhibition in
artificial neural networks improves the quality of propagated gradients. A new
criterion for gradient quality is proposed which can be used as a measure
during training of various convolutional neural networks (CNNs). Finally, we
conduct several different experiments to study how Gradient Mask improves the
performance of the network both quantitatively and qualitatively.
Quantitatively, accuracy improves in the original CNN architecture, after
pruning, and under adversarial attack. Qualitatively, a CNN trained with
Gradient Mask develops saliency maps that focus primarily on the object of
interest, which is useful for data augmentation and network interpretability.
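The abstract does not spell out the masking rule, but the core idea of suppressing weak gradients during backpropagation, analogous to lateral inhibition, can be sketched with a PyTorch backward hook; the per-sample keep ratio and the placement on a ReLU feature map below are illustrative assumptions, not the authors' exact criterion.

```python
import torch
import torch.nn as nn

def gradient_mask_hook(keep_ratio: float = 0.25):
    """Backward hook that keeps only the largest-magnitude gradients flowing
    through an activation and zeroes the rest (a rough analogue of lateral
    inhibition). The keep_ratio and per-sample thresholding are assumptions."""
    def hook(grad: torch.Tensor) -> torch.Tensor:
        flat = grad.abs().flatten(1)                     # per-sample magnitudes
        k = max(1, int(keep_ratio * flat.shape[1]))
        thresh = flat.topk(k, dim=1).values[:, -1]       # k-th largest per sample
        shape = (-1,) + (1,) * (grad.dim() - 1)
        mask = (grad.abs() >= thresh.view(shape)).to(grad.dtype)
        return grad * mask
    return hook

# Usage: attach the hook to an intermediate feature map in the forward pass.
conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)
x = torch.randn(8, 3, 32, 32)
feat = torch.relu(conv(x))
feat.register_hook(gradient_mask_hook(keep_ratio=0.25))  # masks d(loss)/d(feat)
loss = feat.mean()
loss.backward()  # conv's weight gradients are now computed from the masked signal
```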
Related papers
- Fractional-order spike-timing-dependent gradient descent for multi-layer spiking neural networks [18.142378139047977]
This paper proposes a fractional-order spike-timing-dependent gradient descent (FOSTDGD) learning model.
It is tested on the MNIST and DVS128 Gesture datasets, and its accuracy under different network structures and fractional orders is analyzed.
arXiv Detail & Related papers (2024-10-20T05:31:34Z)
- Addressing caveats of neural persistence with deep graph persistence [54.424983583720675]
We find that the variance of network weights and spatial concentration of large weights are the main factors that impact neural persistence.
We propose an extension of the filtration underlying neural persistence to the whole neural network instead of single layers.
This yields our deep graph persistence measure, which implicitly incorporates persistent paths through the network and alleviates variance-related issues.
arXiv Detail & Related papers (2023-07-20T13:34:11Z)
- Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z)
- Impact of spiking neurons leakages and network recurrences on event-based spatio-temporal pattern recognition [0.0]
Spiking neural networks coupled with neuromorphic hardware and event-based sensors are attracting increasing interest for low-latency, low-power inference at the edge.
We explore the impact of synaptic and membrane leakages in spiking neurons.
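The summary refers to synaptic and membrane leakages; a generic discrete-time leaky integrate-and-fire update (not the paper's specific neuron model) shows where those two leak factors enter:

```python
import torch

def lif_step(v, i_syn, spikes_in, w, beta=0.9, alpha=0.8, v_th=1.0):
    """One discrete-time step for a layer of leaky integrate-and-fire neurons.
    beta is the membrane leak factor and alpha the synaptic leak factor
    (both in (0, 1]); setting either to 1.0 removes that leak."""
    i_syn = alpha * i_syn + spikes_in @ w   # leaky synaptic current
    v = beta * v + i_syn                    # leaky membrane integration
    spikes_out = (v >= v_th).float()        # spike where threshold is crossed
    v = v * (1.0 - spikes_out)              # reset membrane after a spike
    return v, i_syn, spikes_out
```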
arXiv Detail & Related papers (2022-11-14T21:34:02Z)
- Improving the Trainability of Deep Neural Networks through Layerwise Batch-Entropy Regularization [1.3999481573773072]
We introduce and evaluate the batch-entropy which quantifies the flow of information through each layer of a neural network.
We show that we can train a "vanilla" fully connected network and convolutional neural network with 500 layers by simply adding the batch-entropy regularization term to the loss function.
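A rough sketch of how such a layerwise quantity could be computed, assuming a Gaussian differential-entropy estimate per unit; the paper's exact definition and the way the term enters the loss may differ.

```python
import math
import torch

def batch_entropy(a: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Rough batch-entropy estimate for one layer: differential entropy of each
    unit's activation across the batch under a Gaussian assumption, averaged
    over units (a simplification, not necessarily the paper's formula)."""
    a = a.flatten(1)                                  # [batch, units]
    var = a.var(dim=0, unbiased=False) + eps          # per-unit variance
    return (0.5 * torch.log(2 * math.pi * math.e * var)).mean()

# Illustrative use (not the paper's exact loss): penalize layers whose
# batch-entropy collapses, so information keeps flowing through the network:
#   loss = task_loss + weight * sum((h_target - batch_entropy(a)).clamp(min=0)
#                                   for a in intermediate_activations)
```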
arXiv Detail & Related papers (2022-08-01T20:31:58Z)
- Benign Overfitting in Two-layer Convolutional Neural Networks [90.75603889605043]
We study the benign overfitting phenomenon in training a two-layer convolutional neural network (CNN).
We show that when the signal-to-noise ratio satisfies a certain condition, a two-layer CNN trained by gradient descent can achieve arbitrarily small training and test loss.
On the other hand, when this condition does not hold, overfitting becomes harmful and the obtained CNN can only achieve constant level test loss.
arXiv Detail & Related papers (2022-02-14T07:45:51Z)
- Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z)
- Improving Deep Learning Interpretability by Saliency Guided Training [36.782919916001624]
Saliency methods have been widely used to highlight important input features in model predictions.
Most existing methods use backpropagation on a modified gradient function to generate saliency maps.
We introduce a saliency guided training procedure for neural networks to reduce noisy gradients used in predictions.
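One plausible reading of such a procedure, with the masking rule, mask fill value, and loss weighting below taken as assumptions rather than the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def saliency_guided_loss(model, x, y, mask_frac=0.5, kl_weight=1.0):
    """Sketch of one saliency-guided training step: the input features with the
    lowest gradient saliency are masked out, and the model is penalized when its
    prediction on the masked input diverges from the original prediction."""
    x = x.clone().requires_grad_(True)
    logits = model(x)
    task_loss = F.cross_entropy(logits, y)

    # Input-gradient saliency for the current batch.
    saliency = torch.autograd.grad(task_loss, x, retain_graph=True)[0].abs()

    # Mask the least-salient fraction of input features (zero fill is an assumption).
    flat = saliency.flatten(1)
    k = max(1, int(mask_frac * flat.shape[1]))
    low_idx = flat.topk(k, dim=1, largest=False).indices
    x_masked = x.detach().clone().flatten(1)
    x_masked.scatter_(1, low_idx, 0.0)
    x_masked = x_masked.view_as(x)

    # Keep predictions on the masked input close to the original predictions.
    kl = F.kl_div(F.log_softmax(model(x_masked), dim=1),
                  F.softmax(logits, dim=1).detach(), reduction="batchmean")
    return task_loss + kl_weight * kl
```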
arXiv Detail & Related papers (2021-11-29T06:05:23Z)
- The FaceChannel: A Fast & Furious Deep Neural Network for Facial Expression Recognition [71.24825724518847]
Current state-of-the-art models for automatic Facial Expression Recognition (FER) are based on very deep neural networks that are effective but rather expensive to train.
We formalize the FaceChannel, a light-weight neural network that has far fewer parameters than common deep neural networks.
We demonstrate how our model achieves a comparable, if not better, performance to the current state-of-the-art in FER.
arXiv Detail & Related papers (2020-09-15T09:25:37Z)
- How Do Neural Networks Estimate Optical Flow? A Neuropsychology-Inspired Study [0.0]
In this article, we investigate how deep neural networks estimate optical flow.
For our investigation, we focus on FlowNetS, as it is the prototype of an encoder-decoder neural network for optical flow estimation.
We use a filter identification method that has played a major role in uncovering the motion filters present in animal brains in neuropsychological research.
arXiv Detail & Related papers (2020-04-20T14:08:28Z)
- Curriculum By Smoothing [52.08553521577014]
Convolutional Neural Networks (CNNs) have shown impressive performance in computer vision tasks such as image classification, detection, and segmentation.
We propose an elegant curriculum-based scheme that smooths the feature embedding of a CNN using anti-aliasing or low-pass filters.
As the amount of information in the feature maps increases during training, the network is able to progressively learn better representations of the data.
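A minimal sketch of the idea, assuming a Gaussian low-pass filter applied to feature maps with a strength annealed toward zero during training; the kernel size, schedule, and placement are illustrative, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torchvision.transforms.functional as TF

class SmoothedConvBlock(nn.Module):
    """Conv block whose output feature maps are low-pass filtered with a
    Gaussian blur that is gradually weakened as training progresses."""
    def __init__(self, in_ch: int, out_ch: int, init_sigma: float = 1.0):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.sigma = init_sigma

    def anneal(self, decay: float = 0.9) -> None:
        self.sigma *= decay                        # call e.g. once per epoch

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = torch.relu(self.conv(x))
        if self.training and self.sigma > 1e-3:    # stop blurring once sigma is tiny
            out = TF.gaussian_blur(out, kernel_size=5, sigma=self.sigma)
        return out
```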
arXiv Detail & Related papers (2020-03-03T07:27:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.