An Energy-Based Self-Adaptive Learning Rate for Stochastic Gradient Descent: Enhancing Unconstrained Optimization with VAV method
- URL: http://arxiv.org/abs/2411.06573v1
- Date: Sun, 10 Nov 2024 19:39:40 GMT
- Title: An Energy-Based Self-Adaptive Learning Rate for Stochastic Gradient Descent: Enhancing Unconstrained Optimization with VAV method
- Authors: Jiahao Zhang, Christian Moya, Guang Lin
- Abstract summary: This paper introduces a novel energy-based self-adjustable learning rate optimization method, the Vector Auxiliary Variable (VAV) algorithm, designed for unconstrained optimization problems.
It incorporates an auxiliary variable $r$ to facilitate efficient energy approximation without backtracking while adhering to an unconditional energy dissipation law. Notably, VAV demonstrates superior stability with larger learning rates and achieves faster convergence in the early stage of the training process.
- Score: 9.298950359150092
- Abstract: Optimizing the learning rate remains a critical challenge in machine learning, essential for achieving model stability and efficient convergence. The Vector Auxiliary Variable (VAV) algorithm introduces a novel energy-based self-adjustable learning rate optimization method designed for unconstrained optimization problems. It incorporates an auxiliary variable $r$ to facilitate efficient energy approximation without backtracking while adhering to the unconditional energy dissipation law. Notably, VAV demonstrates superior stability with larger learning rates and achieves faster convergence in the early stage of the training process. Comparative analyses demonstrate that VAV outperforms Stochastic Gradient Descent (SGD) across various tasks. This paper also provides rigorous proof of the energy dissipation law and establishes the convergence of the algorithm under reasonable assumptions. Additionally, $r$ acts as an empirical lower bound of the training loss in practice, offering a novel scheduling approach that further enhances algorithm performance.
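The auxiliary-variable mechanism can be illustrated with a minimal sketch in the spirit of SAV/AEGD-type energy methods. This is an illustrative assumption, not the paper's exact VAV update; the function name, constants, and test problem below are all hypothetical:

```python
import numpy as np

def energy_sgd_step(theta, grad, f_val, r, eta=0.1, c=1.0):
    """One element-wise energy-based update in the spirit of SAV/AEGD-type
    methods (illustrative sketch, not the paper's exact VAV rule).
    The auxiliary variable r approximates sqrt(f(theta) + c); dividing it
    by (1 + 2*eta*v**2) >= 1 makes r nonincreasing at every step, which is
    the unconditional energy dissipation property -- no backtracking needed."""
    v = grad / (2.0 * np.sqrt(f_val + c))   # modified gradient
    r = r / (1.0 + 2.0 * eta * v * v)       # auxiliary-variable update
    theta = theta - 2.0 * eta * r * v       # parameter update, scaled by r
    return theta, r

# Usage: minimize f(x) = x^2, starting from x = 3.
theta = np.array([3.0])
r = np.sqrt(np.array([9.0]) + 1.0)          # r_0 = sqrt(f(theta_0) + c)
for _ in range(200):
    f_val = float(theta[0] ** 2)
    theta, r = energy_sgd_step(theta, 2.0 * theta, f_val, r)
```

Because the divisor in the `r` update is always at least 1, `r` decreases monotonically regardless of the learning rate, which is the stability-at-large-step-sizes property the abstract highlights; `r` also tracks a quantity tied to the loss, matching the scheduling remark.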
Related papers
- Unlearning as multi-task optimization: A normalized gradient difference approach with an adaptive learning rate [105.86576388991713]
We introduce a normalized gradient difference (NGDiff) algorithm, which enables better control over the trade-off between the objectives.
We provide a theoretical analysis and empirically demonstrate the superior performance of NGDiff among state-of-the-art unlearning methods on the TOFU and MUSE datasets.
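A generic normalized-gradient-difference direction for a two-objective trade-off can be sketched as follows. This is an assumed, simplified reading of the idea, not necessarily NGDiff's published rule, and it omits the paper's adaptive learning rate:

```python
import numpy as np

def normalized_grad_difference(g_forget, g_retain, eps=1e-12):
    """Illustrative two-objective direction for unlearning-style problems
    (a generic sketch, not NGDiff's exact formula). Normalizing each
    gradient first keeps one objective from dominating purely by scale.
    Stepping along -(gr - gf) descends the retain loss while ascending
    the forget loss."""
    gf = g_forget / (np.linalg.norm(g_forget) + eps)
    gr = g_retain / (np.linalg.norm(g_retain) + eps)
    return gr - gf

# Usage: combine a forget gradient and a retain gradient of different scales.
d = normalized_grad_difference(np.array([3.0, 4.0]), np.array([0.0, 2.0]))
```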
arXiv Detail & Related papers (2024-10-29T14:41:44Z) - Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Self-Regularization [77.62516752323207]
We introduce an orthogonal fine-tuning method, dubbed OrthSR, for efficiently fine-tuning pretrained weights and enabling enhanced robustness and generalization.
A self-regularization strategy is further exploited to maintain stability in terms of the zero-shot generalization of VLMs.
For the first time, we revisit CLIP and CoOp with our method to effectively improve the models in few-shot image classification scenarios.
arXiv Detail & Related papers (2024-07-11T10:35:53Z) - An Automatic Learning Rate Schedule Algorithm for Achieving Faster Convergence and Steeper Descent [10.061799286306163]
We investigate the convergence behavior of the delta-bar-delta algorithm in real-world neural network optimization.
To address any potential convergence challenges, we propose a novel approach called RDBD (Regrettable Delta-Bar-Delta).
Our approach allows for prompt correction of biased learning rate adjustments and ensures the convergence of the optimization process.
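The underlying delta-bar-delta rule (Jacobs, 1988) keeps a per-parameter learning rate that grows additively when successive gradients agree in sign and shrinks multiplicatively when they disagree. A minimal sketch of that classic rule follows; the constants are illustrative, and RDBD's correction of biased adjustments is not shown:

```python
import numpy as np

def delta_bar_delta_step(theta, grad, lr, bar, beta=0.7, kappa=0.01, phi=0.1):
    """One step of the classic delta-bar-delta rule (not RDBD itself):
    compare the current gradient with an exponential average of past
    gradients (bar); additive increase on sign agreement, multiplicative
    decrease on a sign flip."""
    agree = grad * bar
    lr = np.where(agree > 0, lr + kappa,             # consistent sign: grow
         np.where(agree < 0, lr * (1.0 - phi), lr))  # sign flip: shrink
    bar = (1.0 - beta) * grad + beta * bar           # update the average
    theta = theta - lr * grad                        # per-parameter step
    return theta, lr, bar

# Usage: minimize f(x) = x^2 from x = 5 with a small initial rate.
theta, lr, bar = np.array([5.0]), np.array([0.05]), np.array([0.0])
for _ in range(100):
    theta, lr, bar = delta_bar_delta_step(theta, 2.0 * theta, lr, bar)
```

The multiplicative decrease reacts faster than the additive increase, so the rate backs off quickly once the iterate starts oscillating around a minimum.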
arXiv Detail & Related papers (2023-10-17T14:15:57Z) - Federated Conditional Stochastic Optimization [110.513884892319]
Conditional stochastic optimization has found applications in a wide range of machine learning tasks, such as invariant learning, AUPRC maximization, and MAML.
This paper proposes conditional stochastic optimization algorithms for federated learning.
arXiv Detail & Related papers (2023-10-04T01:47:37Z) - An Element-wise RSAV Algorithm for Unconstrained Optimization Problems [13.975774245256561]
We present a novel optimization algorithm, element-wise relaxed scalar auxiliary variable (E-RSAV)
Our algorithm features rigorous proofs of linear convergence in the convex setting.
We also propose an adaptive version of E-RSAV with Steffensen step size.
arXiv Detail & Related papers (2023-09-07T20:37:23Z) - Enhanced Teaching-Learning-based Optimization for 3D Path Planning of Multicopter UAVs [2.0305676256390934]
This paper introduces a new path planning algorithm for unmanned aerial vehicles (UAVs) based on the teaching-learning-based optimization technique.
We first define an objective function that incorporates requirements on the path length and constraints on the movement and safe operation of UAVs.
The algorithm named Multi-subject TLBO is then proposed to minimize the formulated objective function.
arXiv Detail & Related papers (2022-05-31T16:00:32Z) - An Adaptive Gradient Method with Energy and Momentum [0.0]
We introduce a novel algorithm for gradient-based optimization of objective functions.
The method is simple to implement, computationally efficient, and well suited for large-scale machine learning problems.
arXiv Detail & Related papers (2022-03-23T04:48:38Z) - Adaptive Gradient Method with Resilience and Momentum [120.83046824742455]
We propose an Adaptive Gradient Method with Resilience and Momentum (AdaRem)
AdaRem adjusts the parameter-wise learning rate according to whether a parameter's past update direction is aligned with the direction of its current gradient.
Our method outperforms previous adaptive learning rate-based algorithms in terms of the training speed and the test error.
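The direction-alignment idea can be sketched with a generic sign-agreement scheme. This is an assumed illustration with made-up constants, not AdaRem's published update:

```python
import numpy as np

def alignment_scaled_lrs(past_dir, grad, base_lr=0.1, beta=0.9, strength=0.5):
    """Parameter-wise learning rates from direction agreement
    (illustrative sketch, not AdaRem's exact formula). past_dir is an
    exponential average of previous update directions; where it aligns
    with the current descent direction -grad the rate is scaled up,
    and where it conflicts the rate is scaled down."""
    align = np.sign(past_dir) * np.sign(-grad)       # +1 aligned, -1 opposed
    lrs = base_lr * (1.0 + strength * align)         # per-parameter rates
    past_dir = beta * past_dir + (1.0 - beta) * (-grad)  # update the memory
    return lrs, past_dir

# Usage: one parameter keeps moving the same way, the other reverses.
lrs, past = alignment_scaled_lrs(np.array([1.0, -1.0]), np.array([-2.0, -3.0]))
```

Parameters that keep moving in a consistent direction get larger steps, while parameters whose direction keeps flipping are damped, which is one way to trade training speed against oscillation.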
arXiv Detail & Related papers (2020-10-21T14:49:00Z) - Bilevel Optimization: Convergence Analysis and Enhanced Design [63.64636047748605]
Bilevel optimization is a tool for many machine learning problems.
We propose a novel sample-efficient stochastic gradient estimator named stocBiO.
arXiv Detail & Related papers (2020-10-15T18:09:48Z) - Adaptivity of Stochastic Gradient Methods for Nonconvex Optimization [71.03797261151605]
Adaptivity is an important yet under-studied property in modern optimization theory.
Our algorithm is proved to achieve the best-available convergence rate for non-PL objectives while simultaneously outperforming existing algorithms for PL objectives.
arXiv Detail & Related papers (2020-02-13T05:42:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.