Exploiting Adam-like Optimization Algorithms to Improve the Performance
of Convolutional Neural Networks
- URL: http://arxiv.org/abs/2103.14689v1
- Date: Fri, 26 Mar 2021 18:55:08 GMT
- Title: Exploiting Adam-like Optimization Algorithms to Improve the Performance
of Convolutional Neural Networks
- Authors: Loris Nanni, Gianluca Maguolo, Alessandra Lumini
- Abstract summary: Stochastic gradient descent (SGD) is the main approach for training deep networks.
In this work, we compare Adam-based variants that adjust the step size for each parameter based on the difference between the present and the past gradients.
We also test ensembles of networks and their fusion with a ResNet50 trained with stochastic gradient descent.
- Score: 82.61182037130405
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Stochastic gradient descent (SGD) is the main approach for training deep
networks: it moves towards the optimum of the cost function by iteratively
updating the parameters of a model in the direction of the negative gradient of
the loss evaluated on a minibatch. Several variants of SGD have been proposed to
adapt the step size for each parameter (adaptive gradient) and to take the
previous updates into account (momentum). Among the many alternatives to SGD,
the most popular are AdaGrad, AdaDelta, RMSProp and Adam, which scale the
coordinates of the gradient by square roots of some form of averaging of the
squared coordinates of the past gradients and automatically adjust the learning
rate on a per-parameter basis. In this work, we compare Adam-based variants in
which the step size of each parameter is additionally adjusted based on the
difference between the present and the past gradients. We benchmark the
proposed methods on medical image data, using the ResNet50 architecture in all
experiments. Moreover, we test ensembles of networks and their fusion with a
ResNet50 trained with stochastic gradient descent. The set of ResNet50 networks
is combined with the simple sum rule. The proposed ensemble obtains very high
performance, with accuracy comparable to or better than the current state of
the art. To improve reproducibility and research efficiency, the MATLAB source
code used for this research is available on GitHub:
https://github.com/LorisNanni.
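
To make the comparison concrete, the following is a minimal MATLAB sketch (not the authors' released implementation, which is in the GitHub repository above) of one step of an Adam-like update in which the per-parameter step size is additionally modulated by the difference between the present and the past gradient, a diffGrad-style friction term. The function name adamlike_step, the variable names and the default hyperparameters are illustrative assumptions.

    % Sketch of an Adam-like step: the per-parameter step size is scaled by a
    % sigmoid of the absolute difference between the past and present gradient.
    % Hyperparameter defaults and names are assumptions, not the paper's code.
    function [theta, m, v, gPrev] = adamlike_step(theta, g, m, v, gPrev, t, lr)
        beta1 = 0.9; beta2 = 0.999; epsilon = 1e-8;
        m = beta1 * m + (1 - beta1) * g;        % first moment (momentum)
        v = beta2 * v + (1 - beta2) * g.^2;     % second moment (adaptive step)
        mHat = m ./ (1 - beta1^t);              % bias-corrected estimates
        vHat = v ./ (1 - beta2^t);
        xi = 1 ./ (1 + exp(-abs(gPrev - g)));   % friction from gradient difference
        theta = theta - lr * xi .* mHat ./ (sqrt(vHat) + epsilon);
        gPrev = g;                              % store as past gradient for next step
    end

The sum-rule fusion used to combine the ResNet50 networks amounts to adding the class scores produced by each network and classifying by the maximum summed score. A sketch, assuming networks trained with MATLAB's Deep Learning Toolbox and its predict interface:

    % Sum-rule fusion (sketch): add per-class scores of the trained networks
    % and take the class with the largest summed score for each sample.
    scores = 0;
    for k = 1:numel(nets)
        scores = scores + predict(nets{k}, testImages);  % [samples x classes]
    end
    [~, ensemblePrediction] = max(scores, [], 2);        % predicted class index
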
Related papers
- Variational Stochastic Gradient Descent for Deep Neural Networks [16.96187187108041]
Current state-of-the-art optimizers are adaptive gradient-based methods such as Adam.
Here, we propose to combine both approaches, resulting in the Variational Stochastic Gradient Descent (VSGD) optimizer.
We show how our VSGD method relates to other adaptive gradient-based optimizers like Adam.
arXiv Detail & Related papers (2024-04-09T18:02:01Z) - Neural Gradient Learning and Optimization for Oriented Point Normal
Estimation [53.611206368815125]
We propose a deep learning approach to learn gradient vectors with consistent orientation from 3D point clouds for normal estimation.
We learn an angular distance field based on local plane geometry to refine the coarse gradient vectors.
Our method efficiently conducts global gradient approximation while achieving better accuracy and generalization ability for local feature description.
arXiv Detail & Related papers (2023-09-17T08:35:11Z) - Scaling Forward Gradient With Local Losses [117.22685584919756]
Forward learning is a biologically plausible alternative to backprop for learning deep neural networks.
We show that it is possible to substantially reduce the variance of the forward gradient by applying perturbations to activations rather than weights.
Our approach matches backprop on MNIST and CIFAR-10 and significantly outperforms previously proposed backprop-free algorithms on ImageNet.
arXiv Detail & Related papers (2022-10-07T03:52:27Z) - A Control Theoretic Framework for Adaptive Gradient Optimizers in
Machine Learning [0.6526824510982802]
Adaptive gradient methods have become popular in optimizing deep neural networks.
Recent examples include AdaGrad and Adam.
We develop a generic framework for adaptive gradient methods.
arXiv Detail & Related papers (2022-06-04T17:55:33Z) - Softmax Gradient Tampering: Decoupling the Backward Pass for Improved
Fitting [8.072117741487046]
We introduce Softmax Gradient Tampering, a technique for modifying the gradients in the backward pass of neural networks.
We demonstrate that modifying the softmax gradients in ConvNets may result in increased training accuracy.
arXiv Detail & Related papers (2021-11-24T13:47:36Z) - Tom: Leveraging trend of the observed gradients for faster convergence [0.0]
Tom is a novel variant of Adam that takes into account the trend observed for the gradients in the loss landscape traversed by the neural network.
Tom outperforms Adagrad, Adadelta, RMSProp and Adam in terms of accuracy and also converges faster.
arXiv Detail & Related papers (2021-09-07T20:19:40Z) - Why Approximate Matrix Square Root Outperforms Accurate SVD in Global
Covariance Pooling? [59.820507600960745]
We propose a new GCP meta-layer that uses SVD in the forward pass, and Padé approximants in the backward propagation to compute the gradients.
The proposed meta-layer has been integrated into different CNN models and achieves state-of-the-art performances on both large-scale and fine-grained datasets.
arXiv Detail & Related papers (2021-05-06T08:03:45Z) - GradInit: Learning to Initialize Neural Networks for Stable and
Efficient Training [59.160154997555956]
We present GradInit, an automated and architecture-agnostic method for initializing neural networks.
It is based on a simple heuristic: the variance of each network layer is adjusted so that a single step of SGD or Adam results in the smallest possible loss value.
It also enables training the original Post-LN Transformer for machine translation without learning rate warmup.
arXiv Detail & Related papers (2021-02-16T11:45:35Z) - Human Body Model Fitting by Learned Gradient Descent [48.79414884222403]
We propose a novel algorithm for the fitting of 3D human shape to images.
We show that this algorithm is fast (avg. 120ms convergence), robust to dataset bias, and achieves state-of-the-art results on public evaluation datasets.
arXiv Detail & Related papers (2020-08-19T14:26:47Z)