Gradient Amplification: An efficient way to train deep neural networks
- URL: http://arxiv.org/abs/2006.10560v1
- Date: Tue, 16 Jun 2020 20:30:55 GMT
- Title: Gradient Amplification: An efficient way to train deep neural networks
- Authors: Sunitha Basodi, Chunyan Ji, Haiping Zhang, and Yi Pan
- Abstract summary: We propose a gradient amplification approach for training deep learning models to prevent vanishing gradients.
We also develop a training strategy to enable or disable the gradient amplification method across several epochs with different learning rates.
- Score: 1.6542034477245091
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Improving the performance of deep learning models and reducing their training times are ongoing challenges in deep neural networks. Several approaches have been proposed to address these challenges, one of which is to increase the depth of the neural networks. Such deeper networks not only increase training times but also suffer from the vanishing gradient problem during training. In this work, we propose a gradient amplification approach for training deep learning models to prevent vanishing gradients, and we also develop a training strategy to enable or disable gradient amplification across several epochs with different learning rates. We perform experiments on VGG-19 and ResNet (ResNet-18 and ResNet-34) models, and study the impact of amplification parameters on these models in detail. Our proposed approach improves the performance of these deep learning models even at higher learning rates, thereby allowing them to achieve higher performance with reduced training time.
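As a concrete illustration of the idea, the sketch below scales gradients flowing backward through selected layers of a PyTorch model using backward hooks and toggles amplification across epochs. The amplification factor, the hooked layers, and the enable/disable schedule are illustrative assumptions, not the exact configuration evaluated in the paper.
```python
# Minimal sketch of gradient amplification via PyTorch backward hooks; the
# factor, the hooked layers, and the on/off schedule are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradientAmplifier:
    """Scales gradients propagated backward through the hooked layers."""

    def __init__(self, layers, factor=2.0):
        self.factor = factor
        self.enabled = False
        self.handles = [l.register_full_backward_hook(self._hook) for l in layers]

    def _hook(self, module, grad_input, grad_output):
        if not self.enabled:
            return None  # leave gradients untouched
        # Replace the gradient w.r.t. the module's input with an amplified copy,
        # so every layer upstream of this one receives a larger signal.
        return tuple(g * self.factor if g is not None else None for g in grad_input)

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
# Hooking the final layer amplifies the gradient reaching the earlier layers.
amplifier = GradientAmplifier([model[2]], factor=2.0)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # hypothetical learning rate

for epoch in range(20):
    amplifier.enabled = epoch < 10  # hypothetical schedule: amplify early epochs only
    x, y = torch.randn(16, 32), torch.randint(0, 10, (16,))
    optimizer.zero_grad()
    F.cross_entropy(model(x), y).backward()
    optimizer.step()
```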
Related papers
- Take A Shortcut Back: Mitigating the Gradient Vanishing for Training Spiking Neural Networks [15.691263438655842]
The Spiking Neural Network (SNN) is a biologically inspired neural network infrastructure that has recently garnered significant attention.
Training an SNN directly poses a challenge due to the undefined gradient of the firing spike process.
We propose a shortcut back-propagation method in our paper, which advocates for transmitting the gradient directly from the loss to the shallow layers.
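One plausible way to give shallow layers such a direct gradient route, sketched here for a plain feed-forward network rather than an actual SNN, is an auxiliary head attached to a shallow layer so the loss reaches it without traversing the deeper layers; the head, layer sizes, and loss weight are hypothetical and not the paper's exact construction.
```python
# Sketch of a gradient shortcut from the loss to a shallow layer (an assumption,
# not necessarily the paper's method): an auxiliary head gives the shallow layer
# a direct backward path that bypasses the deeper layers.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ShortcutNet(nn.Module):
    """A shallow layer gets its own head, so the loss reaches it directly."""
    def __init__(self):
        super().__init__()
        self.shallow = nn.Linear(32, 64)
        self.deep = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 10))
        self.aux_head = nn.Linear(64, 10)   # hypothetical shortcut head

    def forward(self, x):
        h = torch.relu(self.shallow(x))
        return self.deep(h), self.aux_head(h)

net = ShortcutNet()
x, y = torch.randn(16, 32), torch.randint(0, 10, (16,))
main_out, aux_out = net(x)
# The auxiliary term backpropagates to `shallow` without traversing `deep`.
loss = F.cross_entropy(main_out, y) + 0.3 * F.cross_entropy(aux_out, y)
loss.backward()
```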
arXiv Detail & Related papers (2024-01-09T10:54:41Z)
- BLoad: Enhancing Neural Network Training with Efficient Sequential Data Handling [8.859850475075238]
We propose a novel training scheme that enables efficient distributed data-parallel training on sequences of different sizes with minimal overhead.
Using this scheme, we were able to reduce the amount of padding by more than 100x without deleting a single frame, resulting in an overall improvement in both training time and Recall.
arXiv Detail & Related papers (2023-10-16T23:14:56Z)
- A Novel Method for improving accuracy in neural network by reinstating traditional back propagation technique [0.0]
We propose a novel instant parameter update methodology that eliminates the need for computing gradients at each layer.
Our approach accelerates learning, avoids the vanishing gradient problem, and outperforms state-of-the-art methods on benchmark data sets.
arXiv Detail & Related papers (2023-08-09T16:41:00Z)
- Intelligent gradient amplification for deep neural networks [2.610003394404622]
In particular, deep learning models require longer training times as the depth of the model increases.
Several solutions address these problems independently, but there have been minimal efforts to identify an integrated solution.
In this work, we intelligently determine which layers of a deep learning model to apply gradient amplification to, using a formulated approach.
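One possible reading of "determine which layers to apply gradient amplification to" is to flag layers whose gradients have become very small; the criterion and threshold below are purely illustrative assumptions and not the formulated approach of the paper.
```python
# Hypothetical layer-selection rule for gradient amplification: flag layers whose
# average gradient magnitude falls below a threshold. Criterion and threshold are
# assumptions for illustration only.
import torch
import torch.nn as nn

def select_layers_to_amplify(model, threshold=1e-4):
    """Flag modules whose parameters currently carry unusually small gradients."""
    selected = []
    for module in model.children():
        grads = [p.grad.abs().mean() for p in module.parameters() if p.grad is not None]
        if grads and torch.stack(grads).mean() < threshold:
            selected.append(module)
    return selected

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
model(torch.randn(8, 32)).sum().backward()
print(select_layers_to_amplify(model))  # candidate layers for amplification
```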
arXiv Detail & Related papers (2023-05-29T03:38:09Z)
- Dynamics-aware Adversarial Attack of Adaptive Neural Networks [75.50214601278455]
We investigate the dynamics-aware adversarial attack problem of adaptive neural networks.
We propose a Leaded Gradient Method (LGM) and show the significant effects of the lagged gradient.
Our LGM achieves impressive adversarial attack performance compared with the dynamic-unaware attack methods.
arXiv Detail & Related papers (2022-10-15T01:32:08Z)
- Dimensionality Reduced Training by Pruning and Freezing Parts of a Deep Neural Network, a Survey [69.3939291118954]
State-of-the-art deep learning models have a parameter count that reaches into the billions. Training, storing and transferring such models is energy and time consuming, thus costly.
Model compression lowers storage and transfer costs, and can further make training more efficient by decreasing the number of computations in the forward and/or backward pass.
This work is a survey on methods which reduce the number of trained weights in deep learning models throughout the training.
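As a generic example of one family of such methods, freezing part of a network so its weights are no longer trained takes only a few lines in PyTorch; which layer to freeze here is an arbitrary choice, not a recommendation from the survey.
```python
# Generic sketch of freezing part of a network so fewer weights are trained.
# Freezing the first layer is an arbitrary illustrative choice.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
for p in model[0].parameters():
    p.requires_grad = False  # excluded from gradient computation and updates

# Only the still-trainable parameters are handed to the optimizer.
optimizer = torch.optim.SGD((p for p in model.parameters() if p.requires_grad), lr=0.1)
```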
arXiv Detail & Related papers (2022-05-17T05:37:08Z)
- Online Convolutional Re-parameterization [51.97831675242173]
We present online convolutional re-parameterization (OREPA), a two-stage pipeline, aiming to reduce the huge training overhead by squeezing the complex training-time block into a single convolution.
Compared with the state-of-the-art re-param models, OREPA is able to save the training-time memory cost by about 70% and accelerate the training speed by around 2x.
We also conduct experiments on object detection and semantic segmentation and show consistent improvements on the downstream tasks.
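Structural re-parameterization in general rests on folding a training-time block into a single convolution. The standard convolution–batch-norm fusion below illustrates that algebra; it is not OREPA's specific block or its online two-stage pipeline.
```python
# Classic conv + batch-norm fusion into a single convolution, shown as a generic
# illustration of re-parameterization (not OREPA's specific training-time block).
import torch
import torch.nn as nn

def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                      conv.stride, conv.padding, bias=True)
    scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)  # per-channel factor
    fused.weight.data = conv.weight.data * scale.reshape(-1, 1, 1, 1)
    conv_bias = conv.bias.data if conv.bias is not None else torch.zeros(conv.out_channels)
    fused.bias.data = (conv_bias - bn.running_mean) * scale + bn.bias.data
    return fused

conv, bn = nn.Conv2d(3, 8, 3, padding=1), nn.BatchNorm2d(8)
bn.eval()  # fusion uses the running statistics
x = torch.randn(1, 3, 16, 16)
fused = fuse_conv_bn(conv, bn)
print(torch.allclose(bn(conv(x)), fused(x), atol=1e-5))  # True
```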
arXiv Detail & Related papers (2022-04-02T09:50:19Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
- Subset Sampling For Progressive Neural Network Learning [106.12874293597754]
Progressive Neural Network Learning is a class of algorithms that incrementally construct the network's topology and optimize its parameters based on the training data.
We propose to speed up this process by exploiting subsets of training data at each incremental training step.
Experimental results in object, scene and face recognition problems demonstrate that the proposed approach speeds up the optimization procedure considerably.
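A minimal way to exploit a different subset of the training data at each incremental step is sketched below, assuming PyTorch's Subset and DataLoader utilities; the subset fraction and uniform random selection are illustrative assumptions rather than the paper's sampling strategy.
```python
# Sketch of training on a freshly sampled subset of the data at each incremental
# step. The 20% fraction and uniform sampling are assumptions for illustration.
import torch
from torch.utils.data import DataLoader, Subset, TensorDataset

full_data = TensorDataset(torch.randn(1000, 32), torch.randint(0, 10, (1000,)))

def subset_loader(dataset, fraction=0.2, batch_size=64):
    """Build a loader over a freshly sampled subset of the dataset."""
    n = int(len(dataset) * fraction)
    indices = torch.randperm(len(dataset))[:n].tolist()
    return DataLoader(Subset(dataset, indices), batch_size=batch_size, shuffle=True)

for step in range(5):                      # incremental training steps
    for x, y in subset_loader(full_data):  # new subset at every step
        pass                               # grow / optimize the progressive network here
```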
arXiv Detail & Related papers (2020-02-17T18:57:33Z)
- Large Batch Training Does Not Need Warmup [111.07680619360528]
Training deep neural networks using a large batch size has shown promising results and benefits many real-world applications.
In this paper, we propose a novel Complete Layer-wise Adaptive Rate Scaling (CLARS) algorithm for large-batch training.
Based on our analysis, we bridge the gap and illustrate the theoretical insights for three popular large-batch training techniques.
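Layer-wise adaptive rate scaling methods generally rescale each layer's update by the ratio of its weight norm to its gradient norm. The LARS-style step below is a generic stand-in to show that mechanism; it is not the CLARS algorithm itself.
```python
# LARS-style layer-wise learning-rate scaling, shown only as a generic stand-in
# for layer-wise adaptive schemes; this is not the CLARS algorithm from the paper.
import torch
import torch.nn as nn

def layerwise_sgd_step(model, base_lr=0.1, trust_coef=0.001):
    """SGD update where each parameter tensor gets its own scaled learning rate."""
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                continue
            w_norm, g_norm = p.norm(), p.grad.norm()
            # Trust ratio keeps each layer's update small relative to its weights.
            trust = trust_coef * w_norm / (g_norm + 1e-9) if w_norm > 0 and g_norm > 0 else 1.0
            p.add_(p.grad, alpha=-base_lr * float(trust))

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
loss = model(torch.randn(512, 32)).pow(2).mean()  # stand-in for a large-batch loss
loss.backward()
layerwise_sgd_step(model)
```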
arXiv Detail & Related papers (2020-02-04T23:03:12Z)
- Frosting Weights for Better Continual Training [22.554993259239307]
Training a neural network model can be a lifelong learning process and is a computationally intensive one.
Deep neural network models can suffer from catastrophic forgetting during retraining on new data.
We propose two generic ensemble approaches, gradient boosting and meta-learning, to solve the problem.
arXiv Detail & Related papers (2020-01-07T00:53:46Z)