Moment Centralization based Gradient Descent Optimizers for
Convolutional Neural Networks
- URL: http://arxiv.org/abs/2207.09066v1
- Date: Tue, 19 Jul 2022 04:38:01 GMT
- Title: Moment Centralization based Gradient Descent Optimizers for
Convolutional Neural Networks
- Authors: Sumanth Sadu, Shiv Ram Dubey, SR Sreeja
- Abstract summary: Convolutional neural networks (CNNs) have shown very appealing performance for many computer vision applications.
In this paper, we propose a moment centralization-based SGD optimizer for CNNs.
The proposed moment centralization is generic in nature and can be integrated with any of the existing adaptive momentum-based optimizers.
- Score: 12.90962626557934
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Convolutional neural networks (CNNs) have shown very appealing performance
for many computer vision applications. The training of CNNs is generally
performed using stochastic gradient descent (SGD) based optimization
techniques. Adaptive momentum-based SGD optimizers are the recent trend.
However, the existing optimizers are not able to maintain a zero mean in the
first-order moment and struggle with optimization. In this paper, we propose a
moment centralization-based SGD optimizer for CNNs. Specifically, we explicitly
impose a zero-mean constraint on the first-order moment. The proposed moment
centralization is generic in nature and can be integrated with any of the
existing adaptive momentum-based optimizers. The proposed idea is tested with
three state-of-the-art optimization techniques, including Adam, Radam, and
Adabelief on benchmark CIFAR10, CIFAR100, and TinyImageNet datasets for image
classification. The performance of the existing optimizers is generally
improved when integrated with the proposed moment centralization. Further, the
results of the proposed moment centralization are also better than the existing
gradient centralization. An analysis using a toy example shows
that the proposed method leads to a shorter and smoother optimization
trajectory. The source code is made publicly available at
https://github.com/sumanthsadhu/MC-optimizer.
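The abstract describes the core operation only at a high level: an explicit zero-mean constraint on the first-order moment of an adaptive optimizer. The snippet below is a minimal sketch of how such a constraint could be wired into an Adam-style update; the function name, the placement of the centering step, and the toy objective are illustrative assumptions, not the authors' implementation (see the linked repository for that).

```python
import numpy as np

def adam_mc_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam-style step with an illustrative moment-centralization twist:
    the first-order moment is re-centred to zero mean before the update.
    Names and details are assumptions, not the authors' implementation."""
    m = beta1 * m + (1 - beta1) * grad          # first-order moment (EMA of gradients)
    v = beta2 * v + (1 - beta2) * grad ** 2     # second-order moment (EMA of squared gradients)
    m = m - m.mean()                            # explicit zero-mean constraint on the first moment (assumed placement)
    m_hat = m / (1 - beta1 ** t)                # bias correction, as in standard Adam
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

# Toy usage: a few steps on f(w) = ||w||^2 for a single weight tensor.
w = np.random.randn(4, 4)
m, v = np.zeros_like(w), np.zeros_like(w)
for t in range(1, 101):
    g = 2 * w                                   # gradient of ||w||^2
    w, m, v = adam_mc_step(w, g, m, v, t)
```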
Related papers
- Understanding Optimization in Deep Learning with Central Flows [53.66160508990508]
We show that an optimizer's implicit behavior can be explicitly captured by a "central flow": a differential equation.
We show that these flows can empirically predict long-term optimization trajectories of generic neural networks.
arXiv Detail & Related papers (2024-10-31T17:58:13Z) - AdaFisher: Adaptive Second Order Optimization via Fisher Information [22.851200800265914]
We present AdaFisher, an adaptive second-order optimizer that leverages a block-diagonal approximation of the Fisher information matrix for adaptive gradient preconditioning.
We demonstrate that AdaFisher outperforms state-of-the-art optimizers in terms of both accuracy and convergence speed.
arXiv Detail & Related papers (2024-05-26T01:25:02Z) - Bidirectional Looking with A Novel Double Exponential Moving Average to
Adaptive and Non-adaptive Momentum Optimizers [109.52244418498974]
We propose a novel Admeta (A Double exponential Moving averagE Adaptive and non-adaptive momentum) framework.
We provide two implementations, AdmetaR and AdmetaS, the former based on RAdam and the latter based on SGDM.
arXiv Detail & Related papers (2023-07-02T18:16:06Z) - Hybrid Decentralized Optimization: Leveraging Both First- and Zeroth-Order Optimizers for Faster Convergence [31.59453616577858]
We show that a distributed system can not only withstand noisier zeroth-order agents but can even benefit from integrating such agents into the optimization process.
Our results hold for both convex and non-convex optimization objectives and show that zeroth-order agents can still contribute to joint optimization tasks.
arXiv Detail & Related papers (2022-10-14T10:54:11Z) - AdaNorm: Adaptive Gradient Norm Correction based Optimizer for CNNs [23.523389372182613]
Stochastic gradient descent (SGD) optimizers are generally used to train convolutional neural networks (CNNs).
Existing SGD optimizers do not exploit the gradient norms of past iterations, which leads to poor convergence and performance.
We propose novel AdaNorm-based SGD optimizers that correct the gradient norm in each iteration based on an adaptive training history of gradient norms.
arXiv Detail & Related papers (2022-10-12T16:17:25Z) - Towards Theoretically Inspired Neural Initialization Optimization [66.04735385415427]
We propose a differentiable quantity, named GradCosine, with theoretical insights to evaluate the initial state of a neural network.
We show that both the training and test performance of a network can be improved by maximizing GradCosine under norm constraint.
Generalized from the sample-wise analysis to the real batch setting, the resulting Neural Initialization Optimization (NIO) algorithm is able to automatically find a better initialization with negligible cost.
arXiv Detail & Related papers (2022-10-12T06:49:16Z) - Accelerated Federated Learning with Decoupled Adaptive Optimization [53.230515878096426]
The federated learning (FL) framework enables clients to collaboratively learn a shared model while keeping the training data private on clients.
Recently, many efforts have been made to generalize centralized adaptive optimization methods, such as SGDM, Adam, and AdaGrad, to federated settings.
This work aims to develop novel adaptive optimization methods for FL from the perspective of dynamics of ordinary differential equations (ODEs).
arXiv Detail & Related papers (2022-07-14T22:46:43Z) - Joint inference and input optimization in equilibrium networks [68.63726855991052]
The deep equilibrium model is a class of models that forgoes traditional network depth and instead computes the output of a network by finding the fixed point of a single nonlinear layer.
We show that there is a natural synergy between these two settings.
We demonstrate this strategy on various tasks such as training generative models while optimizing over latent codes, training models for inverse problems like denoising and inpainting, adversarial training and gradient based meta-learning.
arXiv Detail & Related papers (2021-11-25T19:59:33Z) - Analytical Characterization and Design Space Exploration for
Optimization of CNNs [10.15406080228806]
Loop-level optimization, including loop tiling and loop permutation, are fundamental transformations to reduce data movement.
This paper develops an analytical modeling approach for finding the best loop-level optimization configuration for CNNs on multi-core CPUs.
arXiv Detail & Related papers (2021-01-24T21:36:52Z) - Gradient Centralization: A New Optimization Technique for Deep Neural
Networks [74.935141515523]
Gradient centralization (GC) operates directly on gradients by centralizing the gradient vectors to have zero mean.
GC can be viewed as a projected gradient descent method with a constrained loss function.
GC is very simple to implement and can be easily embedded into existing gradient based DNNs with only one line of code.
arXiv Detail & Related papers (2020-04-03T10:25:00Z)
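The gradient centralization entry above describes GC as removing the mean from each gradient vector so it has zero mean, which can be embedded with roughly one line of code. Below is a minimal, hedged sketch of that operation for a convolutional weight gradient; the choice to centralize over all axes except the output-channel axis is an assumption made for illustration and may differ from the reference implementation.

```python
import numpy as np

def gradient_centralization(grad):
    """Sketch of gradient centralization (GC): subtract the per-filter mean so
    each gradient vector has zero mean. The per-output-channel axis choice is
    an illustrative assumption."""
    if grad.ndim > 1:
        axes = tuple(range(1, grad.ndim))       # all axes except the output-channel axis
        grad = grad - grad.mean(axis=axes, keepdims=True)
    return grad

# Usage on a conv-layer weight gradient of shape (out_channels, in_channels, kH, kW).
g = np.random.randn(8, 3, 3, 3)
g_centered = gradient_centralization(g)
assert np.allclose(g_centered.reshape(8, -1).mean(axis=1), 0.0)
```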
This list is automatically generated from the titles and abstracts of the papers on this site.