Improving Gradient-Trend Identification: Fast-Adaptive Moment Estimation
with Finance-Inspired Triple Exponential Moving Average
- URL: http://arxiv.org/abs/2306.01423v2
- Date: Thu, 21 Dec 2023 08:39:17 GMT
- Title: Improving Gradient-Trend Identification: Fast-Adaptive Moment Estimation
with Finance-Inspired Triple Exponential Moving Average
- Authors: Roi Peleg, Teddy Lazebnik, Assaf Hoogi
- Abstract summary: We introduce a novel optimizer called fast-adaptive moment estimation (FAME).
Inspired by the triple exponential moving average (TEMA) used in the financial domain, FAME improves the precision of identifying gradient trends.
Because TEMA is introduced into the optimization process, FAME can identify gradient trends with higher accuracy and fewer lag issues.
- Score: 2.480023305418
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The performance improvement of deep networks significantly depends on their
optimizers. With existing optimizers, precise and efficient recognition of the
gradient trend remains a challenge. Most optimizers predominantly adopt
techniques based on the first-order exponential moving average (EMA), which
results in noticeable delays that impede real-time tracking of the gradient
trend and consequently yield sub-optimal performance. To overcome this
limitation, we introduce a novel optimizer called fast-adaptive moment
estimation (FAME). Inspired by the triple exponential moving average (TEMA)
used in the financial domain, FAME leverages the potency of higher-order TEMA
to improve the precision of identifying gradient trends. TEMA plays a central
role in the learning process as it actively influences optimization dynamics;
this role differs from its conventional passive role as a technical indicator
in financial contexts. Because of the introduction of TEMA into the
optimization process, FAME can identify gradient trends with higher accuracy
and fewer lag issues, thereby offering smoother and more consistent responses
to gradient fluctuations compared to conventional first-order EMA. To study the
effectiveness of our novel FAME optimizer, we conducted comprehensive
experiments encompassing six diverse computer-vision benchmarks and tasks,
spanning detection, classification, and semantic comprehension. We integrated
FAME into 15 learning architectures and compared its performance with those of
six popular optimizers. Results clearly showed that FAME is more robust and
accurate and provides superior performance stability by minimizing noise (i.e.,
trend fluctuations). Notably, FAME achieves higher accuracy levels in
remarkably fewer training epochs than its counterparts, clearly indicating its
significance for optimizing deep networks in computer-vision tasks.
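The abstract does not spell out FAME's exact update rule, but the standard finance definition of TEMA is 3·EMA1 − 3·EMA2 + EMA3, where EMA2 is an EMA of EMA1 and EMA3 is an EMA of EMA2. Below is a minimal Python sketch of how such a TEMA could replace Adam's first-order EMA of the gradient; the function name fame_step, the Adam-like second moment, and all hyperparameter values are illustrative assumptions, not the authors' implementation.
```python
import numpy as np

def fame_step(param, grad, state, lr=1e-3, beta=0.9, beta2=0.999, eps=1e-8):
    """One illustrative FAME-style update (hypothetical sketch, not the paper's exact rule).

    The first moment is a triple EMA (TEMA) of the gradient instead of Adam's
    single EMA; the second moment is kept Adam-like for this sketch.
    """
    # Three cascaded EMAs of the gradient.
    state["e1"] = beta * state["e1"] + (1 - beta) * grad
    state["e2"] = beta * state["e2"] + (1 - beta) * state["e1"]
    state["e3"] = beta * state["e3"] + (1 - beta) * state["e2"]
    # Finance-style TEMA combines them to cancel most of the smoothing lag.
    tema = 3 * state["e1"] - 3 * state["e2"] + state["e3"]
    # Adam-like second moment for scaling (an assumption, not stated in the abstract).
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad ** 2
    return param - lr * tema / (np.sqrt(state["v"]) + eps)

# Toy usage: minimize f(w) = w**2, assuming zero-initialized optimizer state.
state = {k: 0.0 for k in ("e1", "e2", "e3", "v")}
w = 5.0
for _ in range(200):
    w = fame_step(w, 2 * w, state, lr=0.1)  # gradient of w**2 is 2w
print(round(w, 4))
```
The 3·EMA1 − 3·EMA2 + EMA3 combination offsets most of the delay that each individual smoothing stage introduces, which is the property the abstract attributes to TEMA when tracking gradient trends.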
Related papers
- Variational Stochastic Gradient Descent for Deep Neural Networks [16.96187187108041]
Current state-of-the-art optimizers are adaptive gradient-based methods such as Adam.
Here, we propose to combine both approaches, resulting in Variational Stochastic Gradient Descent (VSGD).
We show how our VSGD method relates to other adaptive gradient-based optimizers like Adam.
arXiv Detail & Related papers (2024-04-09T18:02:01Z) - Online Adaptive Disparity Estimation for Dynamic Scenes in Structured
Light Systems [17.53719804060679]
Self-supervised online adaptation has been proposed as a solution to bridge this performance gap.
We propose an unsupervised loss function based on long sequential inputs. It ensures better gradient directions and faster convergence.
Our proposed framework significantly improves the online adaptation speed and achieves superior performance on unseen data.
arXiv Detail & Related papers (2023-10-13T08:00:33Z) - We Don't Need No Adam, All We Need Is EVE: On The Variance of Dual
Learning Rate And Beyond [0.0]
This paper introduces a novel method, Enhanced Velocity Estimation (EVE), which innovatively applies different learning rates to distinct components of the gradients.
By bifurcating the learning rate, EVE enables more nuanced control and faster convergence, addressing the challenges associated with traditional single learning rate approaches.
arXiv Detail & Related papers (2023-08-21T14:08:42Z) - Multiplicative update rules for accelerating deep learning training and
increasing robustness [69.90473612073767]
We propose an optimization framework that fits a wide range of machine learning algorithms and enables one to apply alternative update rules.
We claim that the proposed framework accelerates training while leading to more robust models than the traditionally used additive update rule.
arXiv Detail & Related papers (2023-07-14T06:44:43Z) - Learning Large-scale Neural Fields via Context Pruned Meta-Learning [60.93679437452872]
We introduce an efficient optimization-based meta-learning technique for large-scale neural field training.
We show how gradient re-scaling at meta-test time allows the learning of extremely high-quality neural fields.
Our framework is model-agnostic, intuitive, straightforward to implement, and shows significant reconstruction improvements for a wide range of signals.
arXiv Detail & Related papers (2023-02-01T17:32:16Z) - Improving Multi-fidelity Optimization with a Recurring Learning Rate for
Hyperparameter Tuning [7.591442522626255]
We propose Multi-fidelity Optimization with a Recurring Learning rate (MORL)
MORL incorporates CNNs' optimization process into multi-fidelity optimization.
It alleviates the slow-starter problem and achieves a more precise low-fidelity approximation.
arXiv Detail & Related papers (2022-09-26T08:16:31Z) - Accelerating Federated Learning with a Global Biased Optimiser [16.69005478209394]
Federated Learning (FL) is a recent development in the field of machine learning that collaboratively trains models without the training data leaving client devices.
We propose a novel, generalised approach for applying adaptive optimisation techniques to FL with the Federated Global Biased Optimiser (FedGBO) algorithm.
FedGBO accelerates FL by applying a set of global biased optimiser values during the local training phase of FL, which helps to reduce 'client-drift' from non-IID data.
arXiv Detail & Related papers (2021-08-20T12:08:44Z) - Adaptive Gradient Method with Resilience and Momentum [120.83046824742455]
We propose an Adaptive Gradient Method with Resilience and Momentum (AdaRem)
AdaRem adjusts the parameter-wise learning rate according to whether the direction in which a parameter has changed in the past is aligned with the direction of the current gradient.
Our method outperforms previous adaptive learning rate-based algorithms in terms of the training speed and the test error.
arXiv Detail & Related papers (2020-10-21T14:49:00Z) - Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z) - The Break-Even Point on Optimization Trajectories of Deep Neural
Networks [64.7563588124004]
We argue for the existence of the "break-even" point on this trajectory.
We show that using a large learning rate in the initial phase of training reduces the variance of the gradient.
We also show that using a low learning rate results in bad conditioning of the loss surface even for a neural network with batch normalization layers.
arXiv Detail & Related papers (2020-02-21T22:55:51Z) - Large Batch Training Does Not Need Warmup [111.07680619360528]
Training deep neural networks using a large batch size has shown promising results and benefits many real-world applications.
In this paper, we propose a novel Complete Layer-wise Adaptive Rate Scaling (CLARS) algorithm for large-batch training.
Based on our analysis, we bridge the gap and illustrate the theoretical insights for three popular large-batch training techniques.
arXiv Detail & Related papers (2020-02-04T23:03:12Z)