Improving Gradient-Trend Identification: Fast-Adaptive Moment Estimation
with Finance-Inspired Triple Exponential Moving Average
- URL: http://arxiv.org/abs/2306.01423v2
- Date: Thu, 21 Dec 2023 08:39:17 GMT
- Title: Improving Gradient-Trend Identification: Fast-Adaptive Moment Estimation
with Finance-Inspired Triple Exponential Moving Average
- Authors: Roi Peleg, Teddy Lazebnik, Assaf Hoogi
- Abstract summary: We introduce a novel optimizer called fast-adaptive moment estimation (FAME).
Inspired by the triple exponential moving average (TEMA) used in the financial domain, FAME improves the precision of identifying gradient trends.
Because of the introduction of TEMA into the optimization process, FAME can identify gradient trends with higher accuracy and fewer lag issues.
- Score: 2.480023305418
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The performance improvement of deep networks significantly depends on their
optimizers. With existing optimizers, precise and efficient recognition of
gradient trends remains a challenge. Existing optimizers predominantly adopt
techniques based on the first-order exponential moving average (EMA), which
results in noticeable delays that impede real-time tracking of gradient trends
and consequently yield sub-optimal performance. To overcome this
limitation, we introduce a novel optimizer called fast-adaptive moment
estimation (FAME). Inspired by the triple exponential moving average (TEMA)
used in the financial domain, FAME leverages the potency of higher-order TEMA
to improve the precision of identifying gradient trends. TEMA plays a central
role in the learning process as it actively influences optimization dynamics;
this role differs from its conventional passive role as a technical indicator
in financial contexts. Because of the introduction of TEMA into the
optimization process, FAME can identify gradient trends with higher accuracy
and fewer lag issues, thereby offering smoother and more consistent responses
to gradient fluctuations compared to conventional first-order EMA. To study the
effectiveness of our novel FAME optimizer, we conducted comprehensive
experiments encompassing six diverse computer-vision benchmarks and tasks,
spanning detection, classification, and semantic comprehension. We integrated
FAME into 15 learning architectures and compared its performance with those of
six popular optimizers. Results clearly showed that FAME is more robust and
accurate and provides superior performance stability by minimizing noise (i.e.,
trend fluctuations). Notably, FAME achieves higher accuracy levels in
remarkably fewer training epochs than its counterparts, clearly indicating its
significance for optimizing deep networks in computer-vision tasks.
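
As a reading aid for the abstract, recall the standard financial-indicator definition of TEMA: TEMA_t = 3*EMA1_t - 3*EMA2_t + EMA3_t, where EMA1 is an EMA of the signal, EMA2 is an EMA of EMA1, and EMA3 is an EMA of EMA2; the combination cancels much of the lag of a single EMA. The PyTorch class below is a minimal illustrative sketch that swaps Adam's first-order EMA of the gradient for that standard TEMA. It is not the authors' reference FAME implementation: the bias-correction choice, default hyperparameters, and second-moment handling are assumptions.

```python
import torch
from torch.optim import Optimizer


class TEMAAdamSketch(Optimizer):
    """Adam-style update whose first moment is a triple exponential
    moving average (TEMA) of the gradient. Illustrative sketch only;
    bias correction, default hyperparameters, and the second-moment
    handling are assumptions, not the paper's exact FAME rule."""

    def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
        super().__init__(params, dict(lr=lr, betas=betas, eps=eps))

    @torch.no_grad()
    def step(self):
        for group in self.param_groups:
            b1, b2 = group["betas"]
            for p in group["params"]:
                if p.grad is None:
                    continue
                g = p.grad
                s = self.state[p]
                if not s:  # lazily allocate the cascaded EMAs and the 2nd moment
                    for key in ("e1", "e2", "e3", "v"):
                        s[key] = torch.zeros_like(p)
                # Cascaded EMAs of the gradient: EMA, EMA-of-EMA, EMA-of-EMA-of-EMA.
                s["e1"].mul_(b1).add_(g, alpha=1 - b1)
                s["e2"].mul_(b1).add_(s["e1"], alpha=1 - b1)
                s["e3"].mul_(b1).add_(s["e2"], alpha=1 - b1)
                # Finance-style TEMA combination: reduces the lag of a plain EMA.
                tema = 3 * s["e1"] - 3 * s["e2"] + s["e3"]
                # Adam-style second moment for the per-parameter step size.
                s["v"].mul_(b2).addcmul_(g, g, value=1 - b2)
                p.addcdiv_(tema, s["v"].sqrt().add_(group["eps"]), value=-group["lr"])
```

Under these assumptions it would be used like any torch.optim optimizer, e.g. opt = TEMAAdamSketch(model.parameters(), lr=1e-3), followed by loss.backward() and opt.step().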
Related papers
- Understanding Optimization in Deep Learning with Central Flows [53.66160508990508]
We show that RMSProp's implicit behavior can be explicitly captured by a "central flow": a differential equation.
We show that these flows can empirically predict long-term optimization trajectories of generic neural networks.
arXiv Detail & Related papers (2024-10-31T17:58:13Z) - Unlearning as multi-task optimization: A normalized gradient difference approach with an adaptive learning rate [105.86576388991713]
We introduce a normalized gradient difference (NGDiff) algorithm, enabling us to have better control over the trade-off between the objectives.
We provide a theoretical analysis and empirically demonstrate the superior performance of NGDiff among state-of-the-art unlearning methods on the TOFU and MUSE datasets.
arXiv Detail & Related papers (2024-10-29T14:41:44Z) - Adaptive Friction in Deep Learning: Enhancing Optimizers with Sigmoid and Tanh Function [0.0]
We introduce sigSignGrad and tanhSignGrad, two novel optimizers that integrate adaptive friction coefficients.
Our theoretical analysis demonstrates the wide-ranging adjustment capability of the friction coefficient S.
Experiments on CIFAR-10 and Mini-ImageNet using ResNet50 and ViT architectures confirm the superior performance of our proposed optimizers.
arXiv Detail & Related papers (2024-08-07T03:20:46Z) - FADAS: Towards Federated Adaptive Asynchronous Optimization [56.09666452175333]
Federated learning (FL) has emerged as a widely adopted training paradigm for privacy-preserving machine learning.
This paper introduces federated adaptive asynchronous optimization, named FADAS, a novel method that incorporates asynchronous updates into adaptive federated optimization with provable guarantees.
We rigorously establish the convergence rate of the proposed algorithms and empirical results demonstrate the superior performance of FADAS over other asynchronous FL baselines.
arXiv Detail & Related papers (2024-07-25T20:02:57Z) - Variational Stochastic Gradient Descent for Deep Neural Networks [16.96187187108041]
Current state-of-the-art optimizers are adaptive gradient-based optimization methods such as Adam.
Here, we propose to combine both approaches, resulting in Variational Stochastic Gradient Descent (VSGD).
We show how our VSGD method relates to other adaptive gradient-based methods like Adam.
arXiv Detail & Related papers (2024-04-09T18:02:01Z) - Online Adaptive Disparity Estimation for Dynamic Scenes in Structured Light Systems [17.53719804060679]
Self-supervised online adaptation has been proposed as a solution to bridge this performance gap.
We propose an unsupervised loss function based on long sequential inputs. It ensures better gradient directions and faster convergence.
Our proposed framework significantly improves the online adaptation speed and achieves superior performance on unseen data.
arXiv Detail & Related papers (2023-10-13T08:00:33Z) - Improving Multi-fidelity Optimization with a Recurring Learning Rate for Hyperparameter Tuning [7.591442522626255]
We propose Multi-fidelity Optimization with a Recurring Learning rate (MORL)
MORL incorporates CNNs' optimization process into multi-fidelity optimization.
It alleviates the problem of slow-starter and achieves a more precise low-fidelity approximation.
arXiv Detail & Related papers (2022-09-26T08:16:31Z) - Adaptive Gradient Method with Resilience and Momentum [120.83046824742455]
We propose an Adaptive Gradient Method with Resilience and Momentum (AdaRem)
AdaRem adjusts the parameter-wise learning rate according to whether the direction in which a parameter changed in the past is aligned with the direction of the current gradient.
Our method outperforms previous adaptive learning rate-based algorithms in terms of the training speed and the test error.
arXiv Detail & Related papers (2020-10-21T14:49:00Z) - Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z) - Large Batch Training Does Not Need Warmup [111.07680619360528]
Training deep neural networks using a large batch size has shown promising results and benefits many real-world applications.
In this paper, we propose a novel Complete Layer-wise Adaptive Rate Scaling (CLARS) algorithm for large-batch training.
Based on our analysis, we bridge the gap and illustrate the theoretical insights for three popular large-batch training techniques.
arXiv Detail & Related papers (2020-02-04T23:03:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.