Gravity Optimizer: a Kinematic Approach on Optimization in Deep Learning
- URL: http://arxiv.org/abs/2101.09192v1
- Date: Fri, 22 Jan 2021 16:27:34 GMT
- Title: Gravity Optimizer: a Kinematic Approach on Optimization in Deep Learning
- Authors: Dariush Bahrami, Sadegh Pouriyan Zadeh
- Abstract summary: We introduce Gravity, another algorithm for gradient-based optimization.
In this paper, we explain how our novel idea changes parameters to reduce the deep learning model's loss.
We also propose an alternative to the moving average.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce Gravity, another algorithm for gradient-based optimization. In
this paper, we explain how our novel idea changes parameters to reduce the deep
learning model's loss. It has three intuitive hyper-parameters, for which the
best values are proposed. We also propose an alternative to the moving average.
To compare the performance of the Gravity optimizer with two common optimizers,
Adam and RMSProp, two VGGNet models were trained on five standard datasets with
a batch size of 128 for 100 epochs. The Gravity hyper-parameters did not need to
be tuned for different models. As explained further in the paper, no
overfitting-prevention technique was used, so that the direct impact of the
optimizer itself on loss reduction could be investigated. The results show that
the Gravity optimizer performs more stably than Adam and RMSProp and achieves
higher validation accuracy on datasets with more output classes, such as
CIFAR-100 (Fine).
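The abstract describes the mechanism only at a high level. Below is a minimal,
hedged Python sketch of a kinematically inspired, bounded-step update in the
spirit of what the abstract describes: the raw gradient is passed through a
saturating transform before being smoothed, so a single very large gradient
cannot produce an arbitrarily large parameter update. The transform
g / (1 + (g / l)^2), the plain exponential moving average, and the
hyper-parameter values (alpha, beta, l) are illustrative assumptions, not the
exact update rule or the recommended defaults from the paper.

```python
import numpy as np


class GravityLikeOptimizer:
    """Illustrative sketch of a kinematically inspired, bounded-step optimizer.

    NOTE: this is not the exact update rule from the Gravity paper; the
    saturating transform, the plain exponential moving average, and the
    default hyper-parameter values are assumptions made for illustration.
    """

    def __init__(self, alpha=0.1, beta=0.9, l=1.0):
        self.alpha = alpha  # step size (learning rate)
        self.beta = beta    # smoothing factor for the accumulated direction
        self.l = l          # gradient scale beyond which steps are damped
        self.v = None       # smoothed update direction
        self.t = 0          # step counter

    def step(self, params, grads):
        """Return updated parameters given the current gradient (NumPy arrays)."""
        if self.v is None:
            self.v = np.zeros_like(params)
        self.t += 1

        # Saturating transform: close to the raw gradient when |g| << l, but
        # the resulting step shrinks again when |g| >> l, so huge gradients
        # cannot throw the parameters far away.
        bounded = grads / (1.0 + (grads / self.l) ** 2)

        # Plain exponential smoothing used as a placeholder; the paper
        # proposes its own alternative to the moving average.
        self.v = self.beta * self.v + (1.0 - self.beta) * bounded

        return params - self.alpha * self.v


# Usage sketch: minimize f(x) = x^2, whose gradient is 2x.
opt = GravityLikeOptimizer(alpha=0.1, beta=0.9, l=1.0)
x = np.array([2.0])
for _ in range(200):
    x = opt.step(x, 2.0 * x)
print(x)  # x moves toward the minimum at 0
```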
Related papers
- Adaptive Preference Scaling for Reinforcement Learning with Human Feedback [103.36048042664768]
Reinforcement learning from human feedback (RLHF) is a prevalent approach to align AI systems with human values.
We propose a novel adaptive preference loss, underpinned by distributionally robust optimization (DRO).
Our method is versatile and can be readily adapted to various preference optimization frameworks.
arXiv Detail & Related papers (2024-06-04T20:33:22Z) - Surge Phenomenon in Optimal Learning Rate and Batch Size Scaling [27.058009599819012]
We study the connection between optimal learning rates and batch sizes for Adam-style optimizers.
We prove that the optimal learning rate first rises and then falls as the batch size increases.
arXiv Detail & Related papers (2024-05-23T13:52:36Z) - Should I try multiple optimizers when fine-tuning pre-trained
Transformers for NLP tasks? Should I tune their hyperparameters? [14.349943044268471]
An optimizer such as Stochastic Gradient Descent (SGD) is employed to train the neural models.
Tuning just the learning rate is in most cases as good as tuning all the hyperparameters.
We recommend picking any of the best-behaved adaptive optimizers (e.g., Adam) and tuning its learning rate.
arXiv Detail & Related papers (2024-02-10T13:26:14Z) - MADA: Meta-Adaptive Optimizers through hyper-gradient Descent [73.1383658672682]
We introduce Meta-Adaptive Optimizers (MADA), a unified framework that can generalize several known optimizers and dynamically learn the most suitable one during training.
We empirically compare MADA to other popular optimizers on vision and language tasks, and find that MADA consistently outperforms Adam and other popular optimizers.
We also propose AVGrad, a modification of AMSGrad that replaces the maximum operator with averaging, which is more suitable for hyper-gradient optimization (a hedged sketch of this max-to-average swap appears at the end of this page).
arXiv Detail & Related papers (2024-01-17T00:16:46Z) - AdaLomo: Low-memory Optimization with Adaptive Learning Rate [59.64965955386855]
We introduce low-memory optimization with adaptive learning rate (AdaLomo) for large language models.
AdaLomo achieves results on par with AdamW, while significantly reducing memory requirements, thereby lowering the hardware barrier to training large language models.
arXiv Detail & Related papers (2023-10-16T09:04:28Z) - ELRA: Exponential learning rate adaption gradient descent optimization
method [83.88591755871734]
We present a novel, fast (exponential rate adaption), ab initio (hyper-parameter-free) gradient-based optimization method.
The main idea of the method is to adapt the learning rate $\alpha$ by situational awareness.
It can be applied to problems of any dimension n and scales only linearly with n.
arXiv Detail & Related papers (2023-09-12T14:36:13Z) - E^2VPT: An Effective and Efficient Approach for Visual Prompt Tuning [55.50908600818483]
Fine-tuning large-scale pretrained vision models for new tasks has become increasingly parameter-intensive.
We propose an Effective and Efficient Visual Prompt Tuning (E2VPT) approach for large-scale transformer-based model adaptation.
Our approach outperforms several state-of-the-art baselines on two benchmarks.
arXiv Detail & Related papers (2023-07-25T19:03:21Z) - XGrad: Boosting Gradient-Based Optimizers With Weight Prediction [20.068681423455057]
In this paper, we propose a general deep learning training framework XGrad.
XGrad introduces weight prediction into popular gradient-based optimizers to boost their convergence and generalization when training DNNs.
The experimental results validate that XGrad can attain higher model accuracy than the baselines when training the models.
arXiv Detail & Related papers (2023-05-26T10:34:00Z) - VeLO: Training Versatile Learned Optimizers by Scaling Up [67.90237498659397]
We leverage the same scaling approach behind the success of deep learning to learn versatile optimizers.
We train an optimizer for deep learning which is itself a small neural network that ingests gradients and outputs parameter updates.
We open source our learned optimizers, meta-training code, the associated training and test data, and an extensive benchmark suite with baselines at velo-code.io.
arXiv Detail & Related papers (2022-11-17T18:39:07Z) - Curvature Injected Adaptive Momentum Optimizer for Convolutional Neural
Networks [21.205976369691765]
We propose a new approach, hereafter referred to as AdaInject, for gradient descent optimizers.
The curvature information is used as a weight to inject the second order moment in the update rule.
The AdaInject approach boosts the parameter update by exploiting the curvature information.
arXiv Detail & Related papers (2021-09-26T06:24:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
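The AVGrad idea mentioned in the MADA entry above (replacing AMSGrad's maximum
operator over second-moment estimates with an average) is concrete enough to
sketch. The AMSGrad branch below is the standard update; the averaged branch is
a hedged reading of the one-line summary and may differ in detail from the
AVGrad defined in the MADA paper, and the hyper-parameter defaults are the
usual Adam values, not values taken from either paper.

```python
import numpy as np


def adam_family_step(param, grad, state, lr=1e-3, beta1=0.9, beta2=0.999,
                     eps=1e-8, variant="amsgrad"):
    """One Adam-family update.

    variant="amsgrad": the denominator uses a running element-wise MAXIMUM of
                       the second-moment estimate v_t (standard AMSGrad).
    variant="avgrad":  the maximum is replaced by a running AVERAGE of v_t
                       (sketch of the AVGrad idea; details may differ from
                       the MADA paper).
    """
    state["t"] += 1
    t = state["t"]

    # Standard Adam first- and second-moment estimates.
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad ** 2

    if variant == "amsgrad":
        # AMSGrad: keep the element-wise maximum of all v_t seen so far.
        state["v_hat"] = np.maximum(state["v_hat"], state["v"])
    else:
        # AVGrad-style: keep the element-wise running average of all v_t.
        state["v_hat"] = state["v_hat"] + (state["v"] - state["v_hat"]) / t

    # Bias-correct the first moment, as in Adam.
    m_hat = state["m"] / (1 - beta1 ** t)
    return param - lr * m_hat / (np.sqrt(state["v_hat"]) + eps)


# Usage sketch on f(x) = x^2, whose gradient is 2x.
x = np.array([3.0])
state = {"t": 0, "m": np.zeros_like(x), "v": np.zeros_like(x),
         "v_hat": np.zeros_like(x)}
for _ in range(2000):
    x = adam_family_step(x, 2.0 * x, state, lr=0.01, variant="avgrad")
print(x)  # x moves toward the minimum at 0
```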