Intelligent gradient amplification for deep neural networks
- URL: http://arxiv.org/abs/2305.18445v1
- Date: Mon, 29 May 2023 03:38:09 GMT
- Title: Intelligent gradient amplification for deep neural networks
- Authors: Sunitha Basodi, Krishna Pusuluri, Xueli Xiao, Yi Pan
- Abstract summary: Deep learning models require longer training times as their depth increases, and suffer from vanishing gradients.
Several solutions address these problems independently, but there have been minimal efforts to identify an integrated solution.
In this work, we intelligently determine which layers of a deep learning model to apply gradient amplification to, using a formulated approach that analyzes gradient fluctuations of layers during training.
- Score: 2.610003394404622
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Deep learning models offer superior performance compared to other machine
learning techniques for a variety of tasks and domains, but pose their own
challenges. In particular, deep learning models require longer training times
as the depth of a model increases, and suffer from vanishing gradients. Several
solutions address these problems independently, but there have been minimal
efforts to identify an integrated solution that improves the performance of a
model by addressing vanishing gradients while also accelerating the training
process to achieve higher performance at larger learning rates. In this work,
we intelligently determine which layers of a deep learning model to apply
gradient amplification to, using a formulated approach that analyzes gradient
fluctuations of layers during training. Detailed experiments are performed for
simpler and deeper neural networks using two different intelligent measures and
two different thresholds that determine the amplification layers, and a
training strategy where gradients are amplified only during certain epochs.
Results show that our amplification offers better performance compared to the
original models, and achieves accuracy improvements of around 2.5% on CIFAR-10
and around 4.5% on CIFAR-100, even when the models are trained with
higher learning rates.
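As a rough illustration of the approach, the following PyTorch sketch amplifies gradients only for layers whose recent gradient norms fluctuate strongly, and only during a chosen window of epochs. The coefficient-of-variation measure, amplification factor, threshold, epoch window, and history length are illustrative assumptions, not the paper's settings.

```python
# Sketch of layer-selective gradient amplification (not the authors' implementation).
# The fluctuation measure here is the coefficient of variation of each layer's
# recent gradient norms; all constants below are assumed placeholders.
from collections import defaultdict, deque
import torch

AMP_FACTOR = 2.0              # how much to scale selected gradients (assumed)
FLUCTUATION_THRESHOLD = 0.5   # threshold on the fluctuation measure (assumed)
AMP_EPOCHS = range(10, 40)    # epochs with amplification enabled (assumed)
HISTORY = 20                  # number of recent steps used for the measure

grad_norm_history = defaultdict(lambda: deque(maxlen=HISTORY))

def select_amplification_layers(model):
    """Pick layers whose recent gradient norms fluctuate strongly."""
    selected = set()
    for name, module in model.named_modules():
        grads = [p.grad for p in module.parameters(recurse=False) if p.grad is not None]
        if not grads:
            continue
        norm = torch.sqrt(sum(g.pow(2).sum() for g in grads)).item()
        hist = grad_norm_history[name]
        hist.append(norm)
        if len(hist) == HISTORY:
            mean = sum(hist) / HISTORY
            std = (sum((x - mean) ** 2 for x in hist) / HISTORY) ** 0.5
            if mean > 0 and std / mean > FLUCTUATION_THRESHOLD:
                selected.add(name)
    return selected

def amplify_gradients(model, epoch):
    """Scale selected layers' gradients in place; call after loss.backward()."""
    selected = select_amplification_layers(model)
    if epoch not in AMP_EPOCHS:
        return
    for name, module in model.named_modules():
        if name in selected:
            for p in module.parameters(recurse=False):
                if p.grad is not None:
                    p.grad.mul_(AMP_FACTOR)

# usage in a standard training loop, once per step:
#   loss.backward()
#   amplify_gradients(model, epoch)
#   optimizer.step()
```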
Related papers
- Classifier-guided Gradient Modulation for Enhanced Multimodal Learning [50.7008456698935]
Classifier-Guided Gradient Modulation (CGGM) is a novel method to balance multimodal learning with gradients.
We conduct extensive experiments on four multimodal datasets: UPMC-Food 101, CMU-MOSI, IEMOCAP and BraTS.
CGGM outperforms all the baselines and other state-of-the-art methods consistently.
arXiv Detail & Related papers (2024-11-03T02:38:43Z)
- Accelerating Deep Learning with Fixed Time Budget [2.190627491782159]
This paper proposes an effective technique for training arbitrary deep learning models within fixed time constraints.
The proposed method is extensively evaluated in both classification and regression tasks in computer vision.
arXiv Detail & Related papers (2024-10-03T21:18:04Z)
- ZNorm: Z-Score Gradient Normalization Accelerating Skip-Connected Network Training without Architectural Modification [0.0]
Z-Score Normalization for Gradient Descent (ZNorm) is an innovative technique that adjusts only the gradients without modifying the network architecture to accelerate training and improve model performance.
ZNorm normalizes the overall gradients, providing consistent gradient scaling across layers, effectively reducing the risks of vanishing and exploding gradients and achieving superior performance.
In medical imaging applications, ZNorm significantly enhances tumor prediction and segmentation accuracy, underscoring its practical utility.
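A minimal sketch of the idea in PyTorch, assuming ZNorm standardizes each parameter tensor's gradient to zero mean and unit variance before the optimizer step; this illustrates the z-score normalization concept rather than the authors' reference implementation.

```python
# Sketch of z-score gradient normalization: each parameter's gradient is
# rescaled to zero mean and unit variance before the optimizer step.
# Per-tensor standardization and the eps value are assumptions.
import torch

def zscore_normalize_gradients(model, eps=1e-8):
    for p in model.parameters():
        if p.grad is None or p.grad.numel() <= 1:
            continue
        g = p.grad
        g.sub_(g.mean()).div_(g.std() + eps)

# usage, once per step:
#   loss.backward()
#   zscore_normalize_gradients(model)
#   optimizer.step()
```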
arXiv Detail & Related papers (2024-08-02T12:04:19Z)
- Diffusion-Based Neural Network Weights Generation [80.89706112736353]
D2NWG is a diffusion-based neural network weights generation technique that efficiently produces high-performing weights for transfer learning.
Our method extends generative hyper-representation learning to recast the latent diffusion paradigm for neural network weights generation.
Our approach is scalable to large architectures such as large language models (LLMs), overcoming the limitations of current parameter generation techniques.
arXiv Detail & Related papers (2024-02-28T08:34:23Z)
- MetaDiff: Meta-Learning with Conditional Diffusion for Few-Shot Learning [19.57633448737394]
Gradient-based meta-learning approaches effectively address the few-shot learning challenge by learning how to learn novel tasks.
We present a novel task-conditional diffusion-based meta-learning approach, called MetaDiff, that effectively models the optimization process of model weights.
Experiment results show that our MetaDiff outperforms the state-of-the-art gradient-based meta-learning family in few-shot learning tasks.
arXiv Detail & Related papers (2023-07-31T06:19:48Z)
- Robust Learning with Progressive Data Expansion Against Spurious Correlation [65.83104529677234]
We study the learning process of a two-layer nonlinear convolutional neural network in the presence of spurious features.
Our analysis suggests that imbalanced data groups and easily learnable spurious features can lead to the dominance of spurious features during the learning process.
We propose a new training algorithm called PDE that efficiently enhances the model's robustness for a better worst-group performance.
arXiv Detail & Related papers (2023-06-08T05:44:06Z)
- Scaling Forward Gradient With Local Losses [117.22685584919756]
Forward learning is a biologically plausible alternative to backprop for learning deep neural networks.
We show that it is possible to substantially reduce the variance of the forward gradient by applying perturbations to activations rather than weights.
Our approach matches backprop on MNIST and CIFAR-10 and significantly outperforms previously proposed backprop-free algorithms on ImageNet.
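The variance argument can be illustrated with a toy NumPy experiment (a sketch under simplifying assumptions, not the paper's local-loss training recipe): for a single linear layer, an activity-perturbed forward-gradient estimate is compared against a weight-perturbed one. The layer, squared-error loss, and dimensions are assumptions.

```python
# Toy comparison of weight-perturbed vs. activity-perturbed forward gradients
# for one linear layer y = W x with squared-error loss; the true gradient
# dL/dW = (y - t) x^T is available for reference.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out = 256, 10
W = rng.normal(size=(d_out, d_in)) / np.sqrt(d_in)
x = rng.normal(size=d_in)
t = rng.normal(size=d_out)

y = W @ x
true_grad = np.outer(y - t, x)                 # exact dL/dW

def weight_perturbed():
    V = rng.normal(size=W.shape)               # perturb all d_out*d_in weights
    directional = np.dot(y - t, V @ x)         # <dL/dW, V>, what forward-mode AD returns
    return directional * V                     # unbiased but high-variance estimate

def activity_perturbed():
    v = rng.normal(size=d_out)                 # perturb only the d_out activations
    directional = np.dot(y - t, v)             # <dL/dy, v> via forward-mode AD
    g_y = directional * v                      # estimate of dL/dy
    return np.outer(g_y, x)                    # exact local chain rule to the weights

def mean_sq_error(estimator, trials=500):
    return np.mean([np.mean((estimator() - true_grad) ** 2) for _ in range(trials)])

print("weight-perturbed   MSE:", mean_sq_error(weight_perturbed))
print("activity-perturbed MSE:", mean_sq_error(activity_perturbed))
```

Because the perturbation lives in the d_out-dimensional activation space rather than the d_out*d_in-dimensional weight space, the activity-perturbed estimate has markedly lower error in this toy setting.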
arXiv Detail & Related papers (2022-10-07T03:52:27Z)
- Gradient Amplification: An efficient way to train deep neural networks [1.6542034477245091]
We propose a gradient amplification approach for training deep learning models to prevent vanishing gradients.
We also develop a training strategy to enable or disable the gradient amplification method across several epochs with different learning rates.
arXiv Detail & Related papers (2020-06-16T20:30:55Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
- Regularizing Meta-Learning via Gradient Dropout [102.29924160341572]
Meta-learning models are prone to overfitting when there are not enough training tasks for the meta-learners to generalize.
We introduce a simple yet effective method to alleviate the risk of overfitting for gradient-based meta-learning.
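A minimal sketch of the gradient-dropout idea in PyTorch, assuming a Bernoulli mask applied to the inner-loop gradients of a MAML-style update; the drop rate and plain-SGD inner step are placeholders, and the paper also considers other noise forms.

```python
# Sketch of gradient dropout in a MAML-style inner update: each element of the
# inner-loop gradient is zeroed with probability drop_rate (inverted-dropout
# scaling keeps the expected update unchanged). drop_rate and inner_lr are
# assumed placeholders.
import torch

def inner_update_with_grad_dropout(params, loss, inner_lr=0.01, drop_rate=0.1):
    grads = torch.autograd.grad(loss, params, create_graph=True)
    updated = []
    for p, g in zip(params, grads):
        mask = (torch.rand_like(g) > drop_rate).float() / (1.0 - drop_rate)
        updated.append(p - inner_lr * g * mask)
    return updated
```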
arXiv Detail & Related papers (2020-04-13T10:47:02Z)
- Gradients as Features for Deep Representation Learning [26.996104074384263]
We address the problem of deep representation learning: the efficient adaptation of a pre-trained deep network to different tasks.
Our key innovation is the design of a linear model that incorporates both gradient and activation of the pre-trained network.
We present an efficient algorithm for the training and inference of our model without computing the actual gradient.
arXiv Detail & Related papers (2020-04-12T02:57:28Z)