Curvature Injected Adaptive Momentum Optimizer for Convolutional Neural
Networks
- URL: http://arxiv.org/abs/2109.12504v1
- Date: Sun, 26 Sep 2021 06:24:14 GMT
- Title: Curvature Injected Adaptive Momentum Optimizer for Convolutional Neural
Networks
- Authors: Shiv Ram Dubey, S.H. Shabbeer Basha, Satish Kumar Singh, Bidyut Baran
Chaudhuri
- Abstract summary: We propose a new approach, hereafter referred to as AdaInject, for gradient descent optimizers.
The curvature information is used as a weight to inject the second order moment in the update rule.
The AdaInject approach boosts the parameter update by exploiting the curvature information.
- Score: 21.205976369691765
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose a new approach, hereafter referred to as AdaInject,
for gradient descent optimizers by injecting the curvature information with
adaptive momentum. Specifically, the curvature information is used as a weight
to inject the second order moment in the update rule. The curvature information
is captured through the short-term parameter history. The AdaInject approach
boosts the parameter update by exploiting the curvature information. The
proposed approach is generic in nature and can be integrated with any existing
adaptive momentum stochastic gradient descent optimizer. The effectiveness of
the AdaInject optimizer is tested using a theoretical analysis as well as
through toy examples. We also show the convergence property of the proposed
injection based optimizer. Further, we demonstrate the efficacy of the AdaInject
approach through extensive experiments in which it is combined with state-of-the-art
optimizers, yielding AdamInject, diffGradInject, RadamInject, and AdaBeliefInject,
on four benchmark datasets with different CNN models. The highest improvement in
the top-1 classification error rate, $16.54\%$, is observed with the diffGradInject
optimizer and the ResNeXt29 model on the CIFAR10 dataset. Overall, we observe very
promising performance improvements of existing optimizers with the proposed
AdaInject approach.
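
The abstract describes the injection only at a high level, so the following is a minimal sketch of how a curvature-weighted second-order moment could be folded into an Adam-style update. The parameter-difference curvature proxy `delta`, the injection constant `K`, and the exact placement of the injected term are illustrative assumptions, not the paper's verbatim update rule.

```python
import numpy as np

def adam_inject_step(theta, theta_prev, g, m, v, t,
                     lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8, K=2.0):
    """One Adam-style step with a curvature-weighted second-order moment
    injected into the gradient (illustrative sketch, not the paper's exact rule)."""
    delta = theta_prev - theta              # short-term parameter history as curvature proxy (assumption)
    s = g + delta * (g ** 2) / K            # inject the curvature-weighted second-order moment
    m = beta1 * m + (1.0 - beta1) * s       # first moment built on the injected gradient
    v = beta2 * v + (1.0 - beta2) * g ** 2  # second moment as in plain Adam
    m_hat = m / (1.0 - beta1 ** t)          # bias correction
    v_hat = v / (1.0 - beta2 ** t)
    theta_new = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta_new, theta, m, v           # new params, updated history, moments

# Toy usage: minimize f(x) = x^2 starting from x = 3.
theta, theta_prev = np.array([3.0]), np.array([3.0])
m, v = np.zeros(1), np.zeros(1)
for t in range(1, 201):
    g = 2.0 * theta                         # gradient of x^2
    theta, theta_prev, m, v = adam_inject_step(theta, theta_prev, g, m, v, t, lr=0.05)
```

Because the injection only modifies the effective gradient, the same wrapper pattern can in principle be applied to other adaptive momentum optimizers, which is how the abstract describes the AdamInject, diffGradInject, RadamInject, and AdaBeliefInject variants.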
Related papers
- Gradient Guidance for Diffusion Models: An Optimization Perspective [45.6080199096424]
This paper studies a form of gradient guidance for adapting a pre-trained diffusion model towards optimizing user-specified objectives.
We establish a mathematical framework for guided diffusion to systematically study its optimization theory and algorithmic design.
arXiv Detail & Related papers (2024-04-23T04:51:02Z) - Efficient adjustment for complex covariates: Gaining efficiency with
DOPE [56.537164957672715]
We propose a framework that accommodates adjustment for any subset of information expressed by the covariates.
Based on our theoretical results, we propose the Debiased Outcome-adapted Propensity Estimator (DOPE) for efficient estimation of the average treatment effect (ATE).
Our results show that the DOPE provides an efficient and robust methodology for ATE estimation in various observational settings.
arXiv Detail & Related papers (2024-02-20T13:02:51Z) - ELRA: Exponential learning rate adaption gradient descent optimization
method [83.88591755871734]
We present a novel, fast (exponential rate), ab initio (hyper-parameter-free) gradient-based adaptation method.
The main idea of the method is to adapt the learning rate $\alpha$ by situational awareness.
It can be applied to problems of any dimension $n$ and scales only linearly.
arXiv Detail & Related papers (2023-09-12T14:36:13Z) - Predictive Modeling through Hyper-Bayesian Optimization [60.586813904500595]
We propose a novel way of integrating model selection and BO for the single goal of reaching the function optima faster.
The algorithm moves back and forth between BO in the model space and BO in the function space, where the goodness of the recommended model is captured.
In addition to improved sample efficiency, the framework outputs information about the black-box function.
arXiv Detail & Related papers (2023-08-01T04:46:58Z) - Deep neural operators can serve as accurate surrogates for shape
optimization: A case study for airfoils [3.2996060586026354]
We investigate the use of DeepONets to infer flow fields around unseen airfoils with the aim of shape optimization.
We present results which display little to no degradation in prediction accuracy, while reducing the online optimization cost by orders of magnitude.
arXiv Detail & Related papers (2023-02-02T00:19:09Z) - EXACT: How to Train Your Accuracy [6.144680854063938]
We propose a new optimization framework by introducing stochasticity to a model's output and optimizing expected accuracy.
Experiments on linear models and deep image classification show that the proposed optimization method is a powerful alternative to widely used classification losses.
arXiv Detail & Related papers (2022-05-19T15:13:00Z) - Gravity Optimizer: a Kinematic Approach on Optimization in Deep Learning [0.0]
We introduce Gravity, another algorithm for gradient-based optimization.
In this paper, we explain how our novel idea changes parameters to reduce the deep learning model's loss.
Also, we propose an alternative to moving average.
arXiv Detail & Related papers (2021-01-22T16:27:34Z) - Self-Tuning Stochastic Optimization with Curvature-Aware Gradient
Filtering [53.523517926927894]
We explore the use of exact per-sample Hessian-vector products and gradients to construct self-tuning quadratics.
We prove that our model-based procedure converges in the noisy gradient setting.
This is an interesting step for constructing self-tuning quadratics.
arXiv Detail & Related papers (2020-11-09T22:07:30Z) - Advanced Dropout: A Model-free Methodology for Bayesian Dropout
Optimization [62.8384110757689]
Overfitting ubiquitously exists in real-world applications of deep neural networks (DNNs).
The advanced dropout technique applies a model-free and easily implemented distribution with parametric prior, and adaptively adjusts dropout rate.
We evaluate the effectiveness of the advanced dropout against nine dropout techniques on seven computer vision datasets.
arXiv Detail & Related papers (2020-10-11T13:19:58Z) - Interpreting Robust Optimization via Adversarial Influence Functions [24.937845875059928]
We introduce the Adversarial Influence Function (AIF) as a tool to investigate the solution produced by robust optimization.
To illustrate the usage of AIF, we apply it to study model sensitivity -- a quantity defined to capture the change of prediction losses on the natural data.
We use AIF to analyze how model complexity and randomized smoothing affect the model sensitivity with respect to specific models.
arXiv Detail & Related papers (2020-10-03T01:19:10Z) - Learnable Bernoulli Dropout for Bayesian Deep Learning [53.79615543862426]
Learnable Bernoulli dropout (LBD) is a new model-agnostic dropout scheme that considers the dropout rates as parameters jointly optimized with other model parameters.
LBD leads to improved accuracy and uncertainty estimates in image classification and semantic segmentation.
arXiv Detail & Related papers (2020-02-12T18:57:14Z)
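
As a closing illustration for the Learnable Bernoulli Dropout (LBD) entry above, here is a minimal PyTorch-style sketch of a dropout layer whose rate is a trainable parameter optimized jointly with the rest of the model. The sigmoid-relaxed (concrete) Bernoulli mask is an assumption made here so that the rate receives gradients through standard backpropagation; the LBD paper's own gradient estimator may differ.

```python
import torch
import torch.nn as nn

class LearnableDropout(nn.Module):
    """Dropout with a trainable rate (illustrative sketch)."""
    def __init__(self, init_drop=0.5, temperature=0.1):
        super().__init__()
        # store the rate as a logit so it stays in (0, 1) after a sigmoid
        self.logit = nn.Parameter(torch.logit(torch.tensor(init_drop)))
        self.temperature = temperature

    def forward(self, x):
        drop = torch.sigmoid(self.logit)          # current dropout rate
        if not self.training:
            return x                              # inverted dropout: identity at eval
        u = torch.rand_like(x).clamp(1e-6, 1 - 1e-6)
        # relaxed Bernoulli "keep" mask (keep probability = 1 - drop),
        # differentiable with respect to the rate parameter
        keep = torch.sigmoid(
            (torch.log(u) - torch.log(1 - u) + torch.logit(1 - drop)) / self.temperature
        )
        return x * keep / (1.0 - drop)            # inverted-dropout scaling
```

The layer can simply replace nn.Dropout in a model, so the rate logit is updated by the same optimizer as the weights.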