Intelligent gradient amplification for deep neural networks
- URL: http://arxiv.org/abs/2305.18445v1
- Date: Mon, 29 May 2023 03:38:09 GMT
- Title: Intelligent gradient amplification for deep neural networks
- Authors: Sunitha Basodi, Krishna Pusuluri, Xueli Xiao, Yi Pan
- Abstract summary: Deep learning models require longer training times as their depth increases, and suffer from vanishing gradients.
Several solutions address these problems independently, but there have been minimal efforts to identify an integrated solution.
In this work, we intelligently determine which layers of a deep learning model to apply gradient amplification to, using a formulated approach that analyzes gradient fluctuations of layers during training.
- Score: 2.610003394404622
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Deep learning models offer superior performance compared to other machine
learning techniques for a variety of tasks and domains, but pose their own
challenges. In particular, deep learning models require longer training times
as the depth of a model increases, and suffer from vanishing gradients. Several
solutions address these problems independently, but there have been minimal
efforts to identify an integrated solution that improves the performance of a
model by addressing vanishing gradients while also accelerating the training
process to achieve higher performance at larger learning rates. In this work,
we intelligently determine which layers of a deep learning model to apply
gradient amplification to, using a formulated approach that analyzes gradient
fluctuations of layers during training. Detailed experiments are performed for
simpler and deeper neural networks using two different intelligent measures and
two different thresholds that determine the amplification layers, and a
training strategy where gradients are amplified only during certain epochs.
Results show that our amplification offers better performance compared to the
original models, and achieves accuracy improvements of around 2.5% on CIFAR-10
and around 4.5% on CIFAR-100, even when the models are trained with
higher learning rates.
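As a rough illustration of the approach, the following PyTorch sketch amplifies gradients only for layers whose recent gradient norms fluctuate strongly, and only during a chosen window of epochs. The coefficient-of-variation measure, amplification factor, threshold, epoch window, and history length are illustrative assumptions, not the paper's settings.

```python
# Sketch of layer-selective gradient amplification (not the authors' implementation).
# The fluctuation measure here is the coefficient of variation of each layer's
# recent gradient norms; all constants below are assumed placeholders.
from collections import defaultdict, deque
import torch

AMP_FACTOR = 2.0              # how much to scale selected gradients (assumed)
FLUCTUATION_THRESHOLD = 0.5   # threshold on the fluctuation measure (assumed)
AMP_EPOCHS = range(10, 40)    # epochs with amplification enabled (assumed)
HISTORY = 20                  # number of recent steps used for the measure

grad_norm_history = defaultdict(lambda: deque(maxlen=HISTORY))

def select_amplification_layers(model):
    """Pick layers whose recent gradient norms fluctuate strongly."""
    selected = set()
    for name, module in model.named_modules():
        grads = [p.grad for p in module.parameters(recurse=False) if p.grad is not None]
        if not grads:
            continue
        norm = torch.sqrt(sum(g.pow(2).sum() for g in grads)).item()
        hist = grad_norm_history[name]
        hist.append(norm)
        if len(hist) == HISTORY:
            mean = sum(hist) / HISTORY
            std = (sum((x - mean) ** 2 for x in hist) / HISTORY) ** 0.5
            if mean > 0 and std / mean > FLUCTUATION_THRESHOLD:
                selected.add(name)
    return selected

def amplify_gradients(model, epoch):
    """Scale selected layers' gradients in place; call after loss.backward()."""
    selected = select_amplification_layers(model)
    if epoch not in AMP_EPOCHS:
        return
    for name, module in model.named_modules():
        if name in selected:
            for p in module.parameters(recurse=False):
                if p.grad is not None:
                    p.grad.mul_(AMP_FACTOR)

# usage in a standard training loop, once per step:
#   loss.backward()
#   amplify_gradients(model, epoch)
#   optimizer.step()
```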
Related papers
- Classifier-guided Gradient Modulation for Enhanced Multimodal Learning [50.7008456698935]
Classifier-Guided Gradient Modulation (CGGM) is a novel method to balance multimodal learning with gradients.
We conduct extensive experiments on four multimodal datasets: UPMC-Food 101, CMU-MOSI, IEMOCAP and BraTS.
CGGM outperforms all the baselines and other state-of-the-art methods consistently.
arXiv Detail & Related papers (2024-11-03T02:38:43Z)
- Accelerating Deep Learning with Fixed Time Budget [2.190627491782159]
This paper proposes an effective technique for training arbitrary deep learning models within fixed time constraints.
The proposed method is extensively evaluated in both classification and regression tasks in computer vision.
arXiv Detail & Related papers (2024-10-03T21:18:04Z)
- ZNorm: Z-Score Gradient Normalization Accelerating Skip-Connected Network Training without Architectural Modification [0.0]
Z-Score Normalization for Gradient Descent (ZNorm) is an innovative technique that adjusts only the gradients without modifying the network architecture to accelerate training and improve model performance.
ZNorm normalizes the overall gradients, providing consistent gradient scaling across layers, effectively reducing the risks of vanishing and exploding gradients and achieving superior performance.
In medical imaging applications, ZNorm significantly enhances tumor prediction and segmentation accuracy, underscoring its practical utility.
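A minimal sketch of the idea in PyTorch, assuming ZNorm standardizes each parameter tensor's gradient to zero mean and unit variance before the optimizer step; this illustrates the z-score normalization concept rather than the authors' reference implementation.

```python
# Sketch of z-score gradient normalization: each parameter's gradient is
# rescaled to zero mean and unit variance before the optimizer step.
# Per-tensor standardization and the eps value are assumptions.
import torch

def zscore_normalize_gradients(model, eps=1e-8):
    for p in model.parameters():
        if p.grad is None or p.grad.numel() <= 1:
            continue
        g = p.grad
        g.sub_(g.mean()).div_(g.std() + eps)

# usage, once per step:
#   loss.backward()
#   zscore_normalize_gradients(model)
#   optimizer.step()
```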
arXiv Detail & Related papers (2024-08-02T12:04:19Z)
- Diffusion-Based Neural Network Weights Generation [80.89706112736353]
D2NWG is a diffusion-based neural network weights generation technique that efficiently produces high-performing weights for transfer learning.
Our method extends generative hyper-representation learning to recast the latent diffusion paradigm for neural network weights generation.
Our approach is scalable to large architectures such as large language models (LLMs), overcoming the limitations of current parameter generation techniques.
arXiv Detail & Related papers (2024-02-28T08:34:23Z)
- MetaDiff: Meta-Learning with Conditional Diffusion for Few-Shot Learning [19.57633448737394]
Gradient-based meta-learning approaches effectively address the few-shot learning challenge by learning how to learn novel tasks.
We present a novel task-conditional diffusion-based meta-learning approach, called MetaDiff, that effectively models the optimization process of model weights.
Experiment results show that our MetaDiff outperforms the state-of-the-art gradient-based meta-learning family in few-shot learning tasks.
arXiv Detail & Related papers (2023-07-31T06:19:48Z)
- Robust Learning with Progressive Data Expansion Against Spurious Correlation [65.83104529677234]
We study the learning process of a two-layer nonlinear convolutional neural network in the presence of spurious features.
Our analysis suggests that imbalanced data groups and easily learnable spurious features can lead to the dominance of spurious features during the learning process.
We propose a new training algorithm called PDE that efficiently enhances the model's robustness for a better worst-group performance.
arXiv Detail & Related papers (2023-06-08T05:44:06Z)
- Scaling Forward Gradient With Local Losses [117.22685584919756]
Forward learning is a biologically plausible alternative to backprop for learning deep neural networks.
We show that it is possible to substantially reduce the variance of the forward gradient by applying perturbations to activations rather than weights.
Our approach matches backprop on MNIST and CIFAR-10 and significantly outperforms previously proposed backprop-free algorithms on ImageNet.
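The variance argument can be illustrated with a toy NumPy experiment (a sketch under simplifying assumptions, not the paper's local-loss training recipe): for a single linear layer, an activity-perturbed forward-gradient estimate is compared against a weight-perturbed one. The layer, squared-error loss, and dimensions are assumptions.

```python
# Toy comparison of weight-perturbed vs. activity-perturbed forward gradients
# for one linear layer y = W x with squared-error loss; the true gradient
# dL/dW = (y - t) x^T is available for reference.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out = 256, 10
W = rng.normal(size=(d_out, d_in)) / np.sqrt(d_in)
x = rng.normal(size=d_in)
t = rng.normal(size=d_out)

y = W @ x
true_grad = np.outer(y - t, x)                 # exact dL/dW

def weight_perturbed():
    V = rng.normal(size=W.shape)               # perturb all d_out*d_in weights
    directional = np.dot(y - t, V @ x)         # <dL/dW, V>, what forward-mode AD returns
    return directional * V                     # unbiased but high-variance estimate

def activity_perturbed():
    v = rng.normal(size=d_out)                 # perturb only the d_out activations
    directional = np.dot(y - t, v)             # <dL/dy, v> via forward-mode AD
    g_y = directional * v                      # estimate of dL/dy
    return np.outer(g_y, x)                    # exact local chain rule to the weights

def mean_sq_error(estimator, trials=500):
    return np.mean([np.mean((estimator() - true_grad) ** 2) for _ in range(trials)])

print("weight-perturbed   MSE:", mean_sq_error(weight_perturbed))
print("activity-perturbed MSE:", mean_sq_error(activity_perturbed))
```

Because the perturbation lives in the d_out-dimensional activation space rather than the d_out*d_in-dimensional weight space, the activity-perturbed estimate has markedly lower error in this toy setting.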
arXiv Detail & Related papers (2022-10-07T03:52:27Z)
- Gradient Amplification: An efficient way to train deep neural networks [1.6542034477245091]
We propose a gradient amplification approach for training deep learning models to prevent vanishing gradients.
We also develop a training strategy to enable or disable the gradient amplification method across several epochs with different learning rates.
arXiv Detail & Related papers (2020-06-16T20:30:55Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
- Regularizing Meta-Learning via Gradient Dropout [102.29924160341572]
Meta-learning models are prone to overfitting when there are not enough training tasks for the meta-learners to generalize.
We introduce a simple yet effective method to alleviate the risk of overfitting for gradient-based meta-learning.
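A minimal sketch of the gradient-dropout idea in PyTorch, assuming a Bernoulli mask applied to the inner-loop gradients of a MAML-style update; the drop rate and plain-SGD inner step are placeholders, and the paper also considers other noise forms.

```python
# Sketch of gradient dropout in a MAML-style inner update: each element of the
# inner-loop gradient is zeroed with probability drop_rate (inverted-dropout
# scaling keeps the expected update unchanged). drop_rate and inner_lr are
# assumed placeholders.
import torch

def inner_update_with_grad_dropout(params, loss, inner_lr=0.01, drop_rate=0.1):
    grads = torch.autograd.grad(loss, params, create_graph=True)
    updated = []
    for p, g in zip(params, grads):
        mask = (torch.rand_like(g) > drop_rate).float() / (1.0 - drop_rate)
        updated.append(p - inner_lr * g * mask)
    return updated
```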
arXiv Detail & Related papers (2020-04-13T10:47:02Z)
- Gradients as Features for Deep Representation Learning [26.996104074384263]
We address the problem of deep representation learning: the efficient adaptation of a pre-trained deep network to different tasks.
Our key innovation is the design of a linear model that incorporates both gradient and activation of the pre-trained network.
We present an efficient algorithm for the training and inference of our model without computing the actual gradient.
arXiv Detail & Related papers (2020-04-12T02:57:28Z)