Effectiveness of Optimization Algorithms in Deep Image Classification
- URL: http://arxiv.org/abs/2110.01598v1
- Date: Mon, 4 Oct 2021 17:50:51 GMT
- Title: Effectiveness of Optimization Algorithms in Deep Image Classification
- Authors: Zhaoyang Zhu, Haozhe Sun, Chi Zhang
- Abstract summary: Two new Adam variants, AdaBelief and Padam, have recently been introduced in the community.
We analyze these two optimizers and compare them with conventional optimizers (Adam, SGD + Momentum) in the scenario of image classification.
We evaluate the performance of these optimization algorithms on AlexNet and on simplified versions of VGGNet and ResNet using the EMNIST dataset.
- Score: 6.368679897630892
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Adam is widely used to train neural networks, and many Adam variants
with different features have appeared. Recently, two new Adam-family optimizers,
AdaBelief and Padam, were introduced to the community. We analyze these two
optimizers and compare them with conventional optimizers (Adam, SGD + Momentum)
in the scenario of image classification. We evaluate the performance of these
optimization algorithms on AlexNet and on simplified versions of VGGNet and
ResNet using the EMNIST dataset. (Benchmark code is available at
https://github.com/chuiyunjun/projectCSC413.)
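A minimal sketch of the kind of comparison described above, assuming a PyTorch setup. AdaBelief and Padam are not part of torch.optim, so their imports below refer to third-party/reference implementations and are assumptions, and the tiny fully connected model merely stands in for the AlexNet/VGGNet/ResNet variants used in the paper:

```python
import torch
import torch.nn as nn
import torchvision

def make_optimizer(name, params, lr=1e-3):
    # Conventional optimizers ship with PyTorch; AdaBelief and Padam are
    # assumed to come from third-party / reference implementations.
    if name == "adam":
        return torch.optim.Adam(params, lr=lr)
    if name == "sgd_momentum":
        return torch.optim.SGD(params, lr=lr, momentum=0.9)
    if name == "adabelief":
        from adabelief_pytorch import AdaBelief  # third-party package (assumption)
        return AdaBelief(params, lr=lr)
    if name == "padam":
        from padam import Padam  # hypothetical import of the authors' implementation
        return Padam(params, lr=lr, partial=0.25)
    raise ValueError(name)

train_set = torchvision.datasets.EMNIST(
    root="./data", split="balanced", train=True, download=True,
    transform=torchvision.transforms.ToTensor())
loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

# Tiny stand-in model; EMNIST-balanced has 47 classes.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 256), nn.ReLU(),
                      nn.Linear(256, 47))
opt = make_optimizer("adam", model.parameters())  # swap in "adabelief" or "padam" to compare
loss_fn = nn.CrossEntropyLoss()

for images, labels in loader:
    opt.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    opt.step()
```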
Related papers
- Deconstructing What Makes a Good Optimizer for Language Models [7.9224468703944115]
We compare several optimization algorithms, including SGD, Adafactor, Adam, and Lion, in the context of autoregressive language modeling.
Our findings indicate that, except for SGD, these algorithms all perform comparably, both in their optimal performance and in how they fare across a wide range of hyperparameter choices.
arXiv Detail & Related papers (2024-07-10T18:11:40Z) - Neural Optimizer Equation, Decay Function, and Learning Rate Schedule Joint Evolution [0.0]
A major contributor to the quality of a deep learning model is the selection of the optimizer.
We propose a new dual-joint search space in the realm of neural optimizer search (NOS), along with an integrity check, to automate the process of finding deep learning optimizers.
We find multiple optimizers, learning rate schedules, and Adam variants that outperform Adam, as well as other standard deep learning optimizers, across image classification tasks.
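For intuition only, a toy sketch of jointly evaluating optimizer and learning-rate-schedule configurations; the paper actually evolves new optimizer equations, decay functions, and schedules rather than picking among existing ones, and validation_accuracy below is a placeholder:

```python
import itertools
import random

# Toy joint search over (optimizer, learning-rate schedule) configurations.
OPTIMIZERS = ["adam", "adamw", "sgd_momentum"]
SCHEDULES = ["cosine", "step_decay", "constant"]

def validation_accuracy(opt_name, sched_name):
    # Placeholder: a real search would build this optimizer/schedule pair,
    # train a model, and return held-out accuracy. A dummy score keeps the
    # sketch self-contained and runnable.
    random.seed(sum(map(ord, opt_name + sched_name)))
    return random.random()

best = max(itertools.product(OPTIMIZERS, SCHEDULES),
           key=lambda cfg: validation_accuracy(*cfg))
print("best (optimizer, schedule):", best)
```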
arXiv Detail & Related papers (2024-04-10T02:00:24Z) - Improving the Adaptive Moment Estimation (ADAM) stochastic optimizer through an Implicit-Explicit (IMEX) time-stepping approach [1.2233362977312945]
The classical Adam algorithm is a first-order implicit-explicit (IMEX) discretization of the underlying ODE.
We propose new extensions of the Adam scheme obtained by using higher-order IMEX methods to solve the ODE.
We derive a new optimization algorithm for neural network training that performs better than classical Adam on several regression and classification problems.
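For reference, the classical discrete Adam update that the paper reinterprets as a first-order IMEX step reads, in standard notation (a recap of the standard update, not the paper's ODE or higher-order IMEX formulation):

```latex
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1-\beta_1)\, g_t, &
v_t &= \beta_2 v_{t-1} + (1-\beta_2)\, g_t^{2},\\
\hat m_t &= \frac{m_t}{1-\beta_1^{t}}, &
\hat v_t &= \frac{v_t}{1-\beta_2^{t}},\\
\theta_t &= \theta_{t-1} - \alpha\, \frac{\hat m_t}{\sqrt{\hat v_t} + \epsilon}. &&
\end{aligned}
```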
arXiv Detail & Related papers (2024-03-20T16:08:27Z) - Ensemble Quadratic Assignment Network for Graph Matching [52.20001802006391]
Graph matching is a commonly used technique in computer vision and pattern recognition.
Recent data-driven approaches have improved the graph matching accuracy remarkably.
We propose a graph neural network (GNN) based approach to combine the advantages of data-driven and traditional methods.
arXiv Detail & Related papers (2024-03-11T06:34:05Z) - MADA: Meta-Adaptive Optimizers through hyper-gradient Descent [73.1383658672682]
We introduce Meta-Adaptive Optimizers (MADA), a unified framework that can generalize several known optimizers and dynamically learn the most suitable one during training.
We empirically compare MADA to other popular optimizers on vision and language tasks, and find that MADA consistently outperforms Adam and the other popular optimizers.
We also propose AVGrad, a modification of AMSGrad that replaces the maximum operator with averaging, which is more suitable for hyper-gradient optimization.
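A hedged sketch of the single difference described here: AMSGrad keeps a running maximum of the second-moment estimate, whereas AVGrad replaces that maximum with averaging. The exact AVGrad recursion follows the MADA paper; the averaging form below is an assumption for illustration:

```python
import torch

def second_moment_amsgrad(v_hat, v_t):
    # AMSGrad: element-wise running maximum, so the effective step size never grows.
    return torch.maximum(v_hat, v_t)

def second_moment_avgrad(v_bar, v_t, step):
    # Averaging variant: running mean of v_t (illustrative form, not the paper's exact update).
    return v_bar + (v_t - v_bar) / step
```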
arXiv Detail & Related papers (2024-01-17T00:16:46Z) - Bidirectional Looking with A Novel Double Exponential Moving Average to
Adaptive and Non-adaptive Momentum Optimizers [109.52244418498974]
We propose a novel Admeta (A Double exponential Moving averagE To Adaptive and non-adaptive momentum) framework.
We provide two implementations, AdmetaR and AdmetaS, the former based on RAdam and the latter based on SGDM.
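A minimal sketch of the double-exponential-moving-average idea: the gradient is smoothed once, and the result is smoothed again before it drives an SGDM-style step. The bidirectional-looking correction and the actual AdmetaR/AdmetaS updates are omitted; names and constants below are illustrative only:

```python
import torch

def double_ema_step(param, grad, state, lr=1e-2, beta1=0.9, beta2=0.99):
    state["ema1"] = beta1 * state["ema1"] + (1 - beta1) * grad            # first EMA of the gradient
    state["ema2"] = beta2 * state["ema2"] + (1 - beta2) * state["ema1"]   # EMA of that EMA
    param -= lr * state["ema2"]   # SGDM-like step driven by the doubly smoothed gradient
    return param, state

w = torch.zeros(3)
state = {"ema1": torch.zeros_like(w), "ema2": torch.zeros_like(w)}
for _ in range(100):
    g = 2 * (w - torch.tensor([1.0, -2.0, 0.5]))  # gradient of a toy quadratic
    w, state = double_ema_step(w, g, state)
```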
arXiv Detail & Related papers (2023-07-02T18:16:06Z) - Understanding the Generalization of Adam in Learning Neural Networks
with Proper Regularization [118.50301177912381]
We show that Adam can converge to different solutions of the objective with provably different errors, even with weight decay regularization.
We show that if the objective is convex and weight decay regularization is employed, any optimization algorithm, including Adam, will converge to the same solution.
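The convex claim can be summarized in one line: adding an L2 weight-decay term makes the regularized objective strongly convex, hence it has a unique minimizer that every convergent algorithm must reach (a recap of the standard argument, not the paper's proof):

```latex
F(w) = L(w) + \tfrac{\lambda}{2}\,\lVert w \rVert_2^{2}, \quad \lambda > 0,\ L \text{ convex}
\;\Longrightarrow\; F \text{ is } \lambda\text{-strongly convex, so } w^{*} = \arg\min_{w} F(w) \text{ is unique.}
```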
arXiv Detail & Related papers (2021-08-25T17:58:21Z) - Human Body Model Fitting by Learned Gradient Descent [48.79414884222403]
We propose a novel algorithm for the fitting of 3D human shape to images.
We show that this algorithm is fast (avg. 120ms convergence), robust across datasets, and achieves state-of-the-art results on public evaluation datasets.
arXiv Detail & Related papers (2020-08-19T14:26:47Z) - ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning [91.13797346047984]
We introduce ADAHESSIAN, a second-order optimization algorithm which dynamically incorporates the curvature of the loss function via ADAptive estimates of the HESSIAN.
We show that ADAHESSIAN achieves new state-of-the-art results by a large margin as compared to other adaptive optimization methods.
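A hedged sketch of the curvature signal involved: ADAHESSIAN is described as using Hutchinson-style randomized estimates of the Hessian diagonal, which in PyTorch can be obtained from Hessian-vector products. The helper name below is illustrative, and the actual optimizer applies further smoothing and averaging of this estimate:

```python
import torch

def hutchinson_hessian_diag(loss, params, n_samples=1):
    # Randomized estimate of diag(H) using E[z * (Hz)] with Rademacher probes z.
    grads = torch.autograd.grad(loss, params, create_graph=True)
    diag_estimates = [torch.zeros_like(p) for p in params]
    for _ in range(n_samples):
        zs = [torch.randint_like(p, high=2) * 2.0 - 1.0 for p in params]  # entries in {-1, +1}
        hvps = torch.autograd.grad(grads, params, grad_outputs=zs, retain_graph=True)
        for d, z, hv in zip(diag_estimates, zs, hvps):
            d += z * hv / n_samples
    return diag_estimates

# Example on a toy quadratic: the true Hessian diagonal is 6 everywhere.
params = [torch.randn(5, requires_grad=True)]
loss = (3.0 * params[0] ** 2).sum()
print(hutchinson_hessian_diag(loss, params, n_samples=10)[0])
```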
arXiv Detail & Related papers (2020-06-01T05:00:51Z) - TAdam: A Robust Stochastic Gradient Optimizer [6.973803123972298]
Machine learning algorithms aim to find patterns in observations, which may include some noise, especially in the robotics domain.
To perform well even with such noise, we expect them to be able to detect outliers and discard them when needed.
We propose a new gradient optimization method, whose robustness is directly built in the algorithm, using the robust student-t distribution as its core idea.
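As a purely illustrative sketch of the outlier-downweighting idea (not TAdam's exact recursion, which the paper derives from the Student-t distribution), a gradient sample far from the running first moment can be given a small weight before it updates that moment; the weighting formula and helper name below are assumptions chosen to mimic a Student-t style heavy-tail discount:

```python
import torch

def robust_first_moment(m, grad, v, nu=3.0, step_to_mean=0.1):
    # Illustrative only: a gradient far from the running mean m, relative to
    # the scale estimate v, gets a tiny weight w and barely moves the mean.
    d = grad.numel()
    deviation = ((grad - m) ** 2 / (v + 1e-8)).sum()
    w = (nu + d) / (nu + deviation)   # large near the mean, tiny for outliers
    return m + step_to_mean * w * (grad - m)

m, v = torch.zeros(3), torch.ones(3)
clean_grad = torch.tensor([0.1, -0.2, 0.05])
outlier_grad = torch.tensor([50.0, -80.0, 30.0])
print(robust_first_moment(m, clean_grad, v))    # moves noticeably toward the gradient
print(robust_first_moment(m, outlier_grad, v))  # barely moves: the outlier is discounted
```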
arXiv Detail & Related papers (2020-02-29T04:32:36Z)