Backward Gradient Normalization in Deep Neural Networks
- URL: http://arxiv.org/abs/2106.09475v1
- Date: Thu, 17 Jun 2021 13:24:43 GMT
- Title: Backward Gradient Normalization in Deep Neural Networks
- Authors: Alejandro Cabana and Luis F. Lago-Fernández
- Abstract summary: We introduce a new technique for gradient normalization during neural network training.
The gradients are rescaled during the backward pass using normalization layers introduced at certain points within the network architecture.
Results on tests with very deep neural networks show that the new technique can effectively control the gradient norm.
- Score: 68.8204255655161
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a new technique for gradient normalization during neural network
training. The gradients are rescaled during the backward pass using
normalization layers introduced at certain points within the network
architecture. These normalization nodes do not affect forward activity
propagation, but modify the backpropagation equations to permit a well-scaled
gradient flow that reaches the deepest network layers without experiencing
vanishing or explosion. Results on tests with very deep neural networks show
that the new technique can effectively control the gradient norm, allowing
the update of weights in the deepest layers and improving network accuracy
under several experimental conditions.
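The abstract describes normalization nodes that act as the identity in the forward pass but rescale the gradient on the backward pass. The PyTorch sketch below illustrates that idea with a custom autograd function; the unit-L2-norm rescaling and the GradNormLayer wrapper are illustrative assumptions, not necessarily the paper's exact normalization rule.

```python
import torch


class BackwardGradNorm(torch.autograd.Function):
    """Identity in the forward pass; rescales the gradient in the backward pass.

    Sketch of the idea in the abstract: forward activity propagation is left
    untouched, while the backpropagated gradient is renormalized so that it
    reaches deeper layers without vanishing or exploding. The unit-L2-norm
    rescaling below is an assumption, not necessarily the paper's exact rule.
    """

    @staticmethod
    def forward(ctx, x):
        # Forward activations are unchanged (view_as keeps autograd bookkeeping clean).
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Rescale the incoming gradient to unit L2 norm before passing it on.
        return grad_output / (grad_output.norm() + 1e-12)


class GradNormLayer(torch.nn.Module):
    """Module wrapper so the node can be inserted at chosen points in a network."""

    def forward(self, x):
        return BackwardGradNorm.apply(x)


# Example: insert a normalization node after every block of a very deep MLP.
blocks = []
for _ in range(50):
    blocks += [torch.nn.Linear(128, 128), torch.nn.Tanh(), GradNormLayer()]
model = torch.nn.Sequential(*blocks, torch.nn.Linear(128, 10))
```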
Related papers
- Regularized Gradient Clipping Provably Trains Wide and Deep Neural Networks [0.2302001830524133]
We instantiate a regularized form of the gradient clipping algorithm and prove that it can converge to the global minima of deep neural network loss functions.
We present empirical evidence that our theoretically founded regularized gradient clipping algorithm is also competitive with state-of-the-art deep learning heuristics (see the sketch after this list).
arXiv Detail & Related papers (2024-04-12T17:37:42Z)
- Sensitivity-Based Layer Insertion for Residual and Feedforward Neural Networks [0.3831327965422187]
Training of neural networks requires tedious and often manual tuning of the network architecture.
We propose a systematic method to insert new layers during the training process, which eliminates the need to choose a fixed network size before training.
arXiv Detail & Related papers (2023-11-27T16:44:13Z)
- Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z)
- Implicit Bias in Leaky ReLU Networks Trained on High-Dimensional Data [63.34506218832164]
In this work, we investigate the implicit bias of gradient flow and gradient descent in two-layer fully-connected neural networks with leaky ReLU activations.
For gradient flow, we leverage recent work on the implicit bias of homogeneous neural networks to show that, asymptotically, gradient flow produces a neural network with rank at most two.
For gradient descent, provided the variance of the random initialization is small enough, we show that a single step of gradient descent suffices to drastically reduce the rank of the network, and that the rank remains small throughout training.
arXiv Detail & Related papers (2022-10-13T15:09:54Z)
- Gradient-trained Weights in Wide Neural Networks Align Layerwise to Error-scaled Input Correlations [11.176824373696324]
We derive the layerwise weight dynamics of infinite-width neural networks with nonlinear activations trained by gradient descent.
We formulate backpropagation-free learning rules, named Align-zero and Align-ada, that theoretically achieve the same alignment as backpropagation.
arXiv Detail & Related papers (2021-06-15T21:56:38Z)
- Rethinking Skip Connection with Layer Normalization in Transformers and ResNets [49.87919454950763]
Skip connection is a widely-used technique to improve the performance of deep neural networks.
In this work, we investigate how the scale factors affect the effectiveness of the skip connection.
arXiv Detail & Related papers (2021-05-15T11:44:49Z)
- Optimization Theory for ReLU Neural Networks Trained with Normalization Layers [82.61117235807606]
The success of deep neural networks is in part due to the use of normalization layers.
Our analysis shows how the introduction of normalization layers changes the optimization landscape and can enable faster convergence.
arXiv Detail & Related papers (2020-06-11T23:55:54Z)
- Beyond Dropout: Feature Map Distortion to Regularize Deep Neural Networks [107.77595511218429]
In this paper, we investigate the empirical Rademacher complexity related to intermediate layers of deep neural networks.
We propose a feature distortion method (Disout) for addressing the aforementioned problem.
The superiority of the proposed feature map distortion in producing deep neural networks with higher testing performance is analyzed and demonstrated.
arXiv Detail & Related papers (2020-02-23T13:59:13Z)
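For the regularized gradient clipping entry above, the following minimal sketch shows one plausible form of the idea: a standard clipped-gradient SGD step whose scaling factor is additionally bounded below by a constant delta, so the effective step size never collapses to zero. The specific rule and the values of gamma, delta, and lr are illustrative assumptions, not the cited paper's exact algorithm.

```python
import torch


def regularized_clip_step(params, lr=0.1, gamma=1.0, delta=0.01):
    """One SGD step with a clipped gradient whose scale is bounded below.

    Standard clipping uses scale = min(1, gamma / ||g||); the 'regularized'
    variant sketched here additionally lower-bounds the scale by delta.
    This is an illustrative assumption, not the cited paper's exact rule.
    """
    params = list(params)
    grads = [p.grad for p in params if p.grad is not None]
    # Global L2 norm of all gradients.
    total_norm = torch.sqrt(sum((g ** 2).sum() for g in grads))
    # Clipping factor in [delta, 1].
    scale = float(torch.clamp(gamma / (total_norm + 1e-12), min=delta, max=1.0))
    with torch.no_grad():
        for p in params:
            if p.grad is not None:
                p.add_(p.grad, alpha=-lr * scale)
    return total_norm
```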
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.