Moment Centralization based Gradient Descent Optimizers for
Convolutional Neural Networks
- URL: http://arxiv.org/abs/2207.09066v1
- Date: Tue, 19 Jul 2022 04:38:01 GMT
- Title: Moment Centralization based Gradient Descent Optimizers for
Convolutional Neural Networks
- Authors: Sumanth Sadu, Shiv Ram Dubey, SR Sreeja
- Abstract summary: Convolutional neural networks (CNNs) have shown very appealing performance for many computer vision applications.
In this paper, we propose a moment centralization-based SGD optimizer for CNNs.
The proposed moment centralization is generic in nature and can be integrated with any of the existing adaptive momentum-based optimizers.
- Score: 12.90962626557934
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Convolutional neural networks (CNNs) have shown very appealing performance
for many computer vision applications. The training of CNNs is generally
performed using stochastic gradient descent (SGD) based optimization
techniques. Adaptive momentum-based SGD optimizers are the recent trend.
However, the existing optimizers are not able to maintain a zero mean in the
first-order moment and struggle with optimization. In this paper, we propose a
moment centralization-based SGD optimizer for CNNs. Specifically, we explicitly
impose a zero-mean constraint on the first-order moment. The proposed moment
centralization is generic in nature and can be integrated with any of the
existing adaptive momentum-based optimizers. The proposed idea is tested with
three state-of-the-art optimization techniques, including Adam, Radam, and
Adabelief on benchmark CIFAR10, CIFAR100, and TinyImageNet datasets for image
classification. The performance of the existing optimizers is generally
improved when integrated with the proposed moment centralization. Further, the
results of the proposed moment centralization are also better than the existing
gradient centralization. An analysis using a toy example shows
that the proposed method leads to a shorter and smoother optimization
trajectory. The source code is made publicly available at
https://github.com/sumanthsadhu/MC-optimizer.
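The abstract describes the core operation only at a high level: an explicit zero-mean constraint on the first-order moment of an adaptive optimizer. The snippet below is a minimal sketch of how such a constraint could be wired into an Adam-style update; the function name, the placement of the centering step, and the toy objective are illustrative assumptions, not the authors' implementation (see the linked repository for that).

```python
import numpy as np

def adam_mc_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam-style step with an illustrative moment-centralization twist:
    the first-order moment is re-centred to zero mean before the update.
    Names and details are assumptions, not the authors' implementation."""
    m = beta1 * m + (1 - beta1) * grad          # first-order moment (EMA of gradients)
    v = beta2 * v + (1 - beta2) * grad ** 2     # second-order moment (EMA of squared gradients)
    m = m - m.mean()                            # explicit zero-mean constraint on the first moment (assumed placement)
    m_hat = m / (1 - beta1 ** t)                # bias correction, as in standard Adam
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

# Toy usage: a few steps on f(w) = ||w||^2 for a single weight tensor.
w = np.random.randn(4, 4)
m, v = np.zeros_like(w), np.zeros_like(w)
for t in range(1, 101):
    g = 2 * w                                   # gradient of ||w||^2
    w, m, v = adam_mc_step(w, g, m, v, t)
```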
Related papers
- Understanding Optimization in Deep Learning with Central Flows [53.66160508990508]
We show that an optimizer's implicit behavior can be explicitly captured by a "central flow": a differential equation.
We show that these flows can empirically predict long-term optimization trajectories of generic neural networks.
arXiv Detail & Related papers (2024-10-31T17:58:13Z) - AdaFisher: Adaptive Second Order Optimization via Fisher Information [22.851200800265914]
We present AdaFisher, an adaptive second-order optimizer that leverages a block-diagonal approximation of the Fisher information matrix for adaptive gradient preconditioning.
We demonstrate that AdaFisher outperforms state-of-the-art optimizers in terms of both accuracy and convergence speed.
arXiv Detail & Related papers (2024-05-26T01:25:02Z) - Bidirectional Looking with A Novel Double Exponential Moving Average to
Adaptive and Non-adaptive Momentum Optimizers [109.52244418498974]
We propose a novel Admeta (A Double exponential Moving averagE Adaptive and non-adaptive momentum) framework.
We provide two implementations, AdmetaR and AdmetaS, the former based on RAdam and the latter based on SGDM.
arXiv Detail & Related papers (2023-07-02T18:16:06Z) - Hybrid Decentralized Optimization: Leveraging Both First- and Zeroth-Order Optimizers for Faster Convergence [31.59453616577858]
We show that a distributed system can not only withstand noisier zeroth-order agents but can even benefit from integrating such agents into the optimization process.
Our results hold for both convex and non-convex optimization objectives and show that zeroth-order agents can still contribute to joint optimization tasks.
arXiv Detail & Related papers (2022-10-14T10:54:11Z) - AdaNorm: Adaptive Gradient Norm Correction based Optimizer for CNNs [23.523389372182613]
Stochastic gradient descent (SGD) optimizers are generally used to train convolutional neural networks (CNNs).
Existing SGD optimizers do not exploit the gradient norms of past iterations, which leads to poor convergence and performance.
We propose novel AdaNorm-based SGD optimizers that correct the gradient norm in each iteration based on an adaptive training history of gradient norms.
arXiv Detail & Related papers (2022-10-12T16:17:25Z) - Towards Theoretically Inspired Neural Initialization Optimization [66.04735385415427]
We propose a differentiable quantity, named GradCosine, with theoretical insights to evaluate the initial state of a neural network.
We show that both the training and test performance of a network can be improved by maximizing GradCosine under norm constraint.
Generalized from the sample-wise analysis to the real batch setting, the resulting Neural Initialization Optimization (NIO) algorithm is able to automatically find a better initialization with negligible cost.
arXiv Detail & Related papers (2022-10-12T06:49:16Z) - Accelerated Federated Learning with Decoupled Adaptive Optimization [53.230515878096426]
The federated learning (FL) framework enables clients to collaboratively learn a shared model while keeping the training data private on clients.
Recently, many efforts have been made to generalize centralized adaptive optimization methods, such as SGDM, Adam, and AdaGrad, to federated settings.
This work aims to develop novel adaptive optimization methods for FL from the perspective of dynamics of ordinary differential equations (ODEs).
arXiv Detail & Related papers (2022-07-14T22:46:43Z) - Joint inference and input optimization in equilibrium networks [68.63726855991052]
The deep equilibrium model is a class of models that forgoes traditional network depth and instead computes the output of a network by finding the fixed point of a single nonlinear layer.
We show that there is a natural synergy between these two settings.
We demonstrate this strategy on various tasks such as training generative models while optimizing over latent codes, training models for inverse problems like denoising and inpainting, adversarial training and gradient based meta-learning.
arXiv Detail & Related papers (2021-11-25T19:59:33Z) - Analytical Characterization and Design Space Exploration for
Optimization of CNNs [10.15406080228806]
Loop-level optimization, including loop tiling and loop permutation, are fundamental transformations to reduce data movement.
This paper develops an analytical modeling approach for finding the best loop-level optimization configuration for CNNs on multi-core CPUs.
arXiv Detail & Related papers (2021-01-24T21:36:52Z) - Gradient Centralization: A New Optimization Technique for Deep Neural
Networks [74.935141515523]
Gradient centralization (GC) operates directly on gradients by centralizing the gradient vectors to have zero mean.
GC can be viewed as a projected gradient descent method with a constrained loss function.
GC is very simple to implement and can be easily embedded into existing gradient based DNNs with only one line of code.
arXiv Detail & Related papers (2020-04-03T10:25:00Z)
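The gradient centralization entry above describes GC as removing the mean from each gradient vector so it has zero mean, which can be embedded with roughly one line of code. Below is a minimal, hedged sketch of that operation for a convolutional weight gradient; the choice to centralize over all axes except the output-channel axis is an assumption made for illustration and may differ from the reference implementation.

```python
import numpy as np

def gradient_centralization(grad):
    """Sketch of gradient centralization (GC): subtract the per-filter mean so
    each gradient vector has zero mean. The per-output-channel axis choice is
    an illustrative assumption."""
    if grad.ndim > 1:
        axes = tuple(range(1, grad.ndim))       # all axes except the output-channel axis
        grad = grad - grad.mean(axis=axes, keepdims=True)
    return grad

# Usage on a conv-layer weight gradient of shape (out_channels, in_channels, kH, kW).
g = np.random.randn(8, 3, 3, 3)
g_centered = gradient_centralization(g)
assert np.allclose(g_centered.reshape(8, -1).mean(axis=1), 0.0)
```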
This list is automatically generated from the titles and abstracts of the papers on this site.