BFE and AdaBFE: A New Approach in Learning Rate Automation for
Stochastic Optimization
- URL: http://arxiv.org/abs/2207.02763v1
- Date: Wed, 6 Jul 2022 15:55:53 GMT
- Title: BFE and AdaBFE: A New Approach in Learning Rate Automation for
Stochastic Optimization
- Authors: Xin Cao
- Abstract summary: A new gradient-based optimization approach that automatically adjusts the learning rate is proposed.
This approach could serve as an alternative way to optimize the learning rate on top of the stochastic gradient descent (SGD) algorithm.
- Score: 3.541406632811038
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, a new gradient-based optimization approach that automatically
adjusts the learning rate is proposed. This approach can be applied to design both
non-adaptive and adaptive learning rate methods. I first introduce the non-adaptive
learning rate optimization method, Binary Forward Exploration (BFE), and then develop
the corresponding adaptive per-parameter learning rate method, Adaptive BFE (AdaBFE).
This approach could be an alternative way to optimize the learning rate within the
stochastic gradient descent (SGD) framework, alongside the current non-adaptive
learning rate methods (e.g. SGD, momentum, Nesterov) and the adaptive learning
rate methods (e.g. AdaGrad, AdaDelta, Adam, etc.). The purpose of developing this
approach is not to beat the benchmarks of other methods, but to provide a
different perspective on optimizing gradient descent, although some
comparative studies with previous methods are given in the following sections.
This approach is expected to be heuristic and to inspire researchers to improve
gradient-based optimization in combination with previous methods.
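The abstract does not spell out the BFE procedure here, so the sketch below is only one plausible illustration of a "binary forward exploration" of the step size: probe forward along the negative gradient, doubling the trial learning rate while the loss keeps improving and halving (bisecting) it once the step overshoots. The function name `bfe_step`, the doubling/halving rule, and all constants are assumptions made for this example, not the authors' exact algorithm.

```python
import numpy as np

def bfe_step(params, grad, loss_fn, lr_init=0.1, max_probes=10):
    """Illustrative 'binary forward exploration' of the step size.

    Probes forward along the negative gradient: the trial learning rate is
    doubled while the loss keeps improving, then halved once it stops
    improving. A sketch of the general idea only, not the exact BFE
    algorithm from the paper.
    """
    base_loss = loss_fn(params)
    lr = lr_init
    best_lr, best_loss = 0.0, base_loss
    for _ in range(max_probes):
        trial_loss = loss_fn(params - lr * grad)
        if trial_loss < best_loss:   # forward exploration: keep doubling
            best_lr, best_loss = lr, trial_loss
            lr *= 2.0
        else:                        # overshot: bisect back toward the best step
            lr *= 0.5
            if lr < 1e-12:
                break
    return params - best_lr * grad, best_lr

# Usage on a toy quadratic: loss(w) = ||w||^2, gradient = 2w
w = np.array([3.0, -2.0])
for _ in range(5):
    g = 2.0 * w
    w, lr = bfe_step(w, g, lambda p: float(np.sum(p ** 2)))
```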
Related papers
- Learning rate adaptive stochastic gradient descent optimization methods: numerical simulations for deep learning methods for partial differential equations and convergence analyses [5.052293146674794]
It is known that the standard stochastic gradient descent (SGD) optimization method and accelerated and adaptive SGD optimization methods such as Adam fail to converge if the learning rates do not converge to zero.
In this work we propose and study a learning-rate-adaptive approach for SGD optimization methods in which the learning rate is adjusted based on empirical estimates.
arXiv Detail & Related papers (2024-06-20T14:07:39Z) - Interpreting Adaptive Gradient Methods by Parameter Scaling for
Learning-Rate-Free Optimization [14.009179786857802]
We address the challenge of estimating the learning rate for adaptive gradient methods used in training deep neural networks.
While several learning-rate-free approaches have been proposed, they are typically tailored for steepest descent.
In this paper, we interpret adaptive gradient methods as steepest descent applied on parameter-scaled networks.
arXiv Detail & Related papers (2024-01-06T15:45:29Z) - Accelerated Federated Learning with Decoupled Adaptive Optimization [53.230515878096426]
The federated learning (FL) framework enables clients to collaboratively learn a shared model while keeping the training data private on each client.
Recently, many efforts have been made to generalize centralized adaptive optimization methods, such as SGDM, Adam, AdaGrad, etc., to federated settings.
This work aims to develop novel adaptive optimization methods for FL from the perspective of the dynamics of ordinary differential equations (ODEs).
arXiv Detail & Related papers (2022-07-14T22:46:43Z) - Improved Binary Forward Exploration: Learning Rate Scheduling Method for
Stochastic Optimization [3.541406632811038]
A new gradient-based optimization approach that automatically schedules the learning rate, called Binary Forward Exploration (BFE), has recently been proposed.
In this paper, improved algorithms based on this methodology are investigated in order to improve the efficiency and robustness of the approach.
The goal of this method is not to beat others but to provide a different viewpoint on optimizing the gradient descent process.
arXiv Detail & Related papers (2022-07-09T05:28:44Z) - Adaptive Gradient Methods with Local Guarantees [48.980206926987606]
We propose an adaptive gradient method that has provable adaptive regret guarantees vs. the best local preconditioner.
We demonstrate the robustness of our method in automatically choosing the optimal learning rate schedule for popular benchmarking tasks in vision and language domains.
arXiv Detail & Related papers (2022-03-02T20:45:14Z) - Adaptive Differentially Private Empirical Risk Minimization [95.04948014513226]
We propose an adaptive (stochastic) gradient perturbation method for differentially private empirical risk minimization.
We prove that the ADP method considerably improves the utility guarantee compared to the standard differentially private method in which vanilla random noise is added.
arXiv Detail & Related papers (2021-10-14T15:02:20Z) - SUPER-ADAM: Faster and Universal Framework of Adaptive Gradients [99.13839450032408]
It is desired to design a universal framework for adaptive algorithms to solve general problems.
In particular, our novel framework provides convergence analysis support for adaptive methods under the nonconvex setting.
arXiv Detail & Related papers (2021-06-15T15:16:28Z) - Meta-Regularization: An Approach to Adaptive Choice of the Learning Rate
in Gradient Descent [20.47598828422897]
We propose Meta-Regularization, a novel approach for the adaptive choice of the learning rate in first-order gradient descent methods.
Our approach modifies the objective function by adding a regularization term on the learning rate, and casts the updating of the parameters and the learning rate as a joint process.
arXiv Detail & Related papers (2021-04-12T13:13:34Z) - Learning Sampling Policy for Faster Derivative Free Optimization [100.27518340593284]
We propose a new reinforcement-learning-based ZO algorithm (ZO-RL) that learns the sampling policy for generating the perturbations in ZO optimization, instead of using random sampling.
Our results show that ZO-RL can effectively reduce the variance of the ZO gradient estimate by learning a sampling policy, and converges faster than existing ZO algorithms in different scenarios (a sketch of the baseline random-perturbation estimator follows this list).
arXiv Detail & Related papers (2021-04-09T14:50:59Z) - Adaptive Gradient Method with Resilience and Momentum [120.83046824742455]
We propose an Adaptive Gradient Method with Resilience and Momentum (AdaRem).
AdaRem adjusts the parameter-wise learning rate according to whether the direction in which a parameter has changed in the past is aligned with the direction of the current gradient (a schematic sketch of this rule also follows the list below).
Our method outperforms previous adaptive learning rate-based algorithms in terms of the training speed and the test error.
arXiv Detail & Related papers (2020-10-21T14:49:00Z) - Adaptive Stochastic Optimization [1.7945141391585486]
Adaptive optimization methods have the potential to offer significant computational savings when training large-scale systems.
Modern approaches based on the gradient method are non-adaptive in the sense that their implementation employs prescribed parameter values that need to be tuned for each application.
arXiv Detail & Related papers (2020-01-18T16:30:19Z)
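The ZO-RL entry above builds on the standard random-perturbation (two-point) zeroth-order gradient estimate; the learned sampling policy that is the paper's actual contribution is not reproduced here. The sketch below shows only that baseline estimator, and the names `zo_gradient`, `mu`, and `num_samples` are illustrative choices rather than names taken from the paper.

```python
import numpy as np

def zo_gradient(loss_fn, x, mu=1e-3, num_samples=20, rng=None):
    """Two-point zeroth-order gradient estimate with random Gaussian directions.

    grad_hat = mean over u of (f(x + mu*u) - f(x)) / mu * u, with u ~ N(0, I).
    ZO-RL replaces the random draw of u with a learned sampling policy to
    reduce the variance of this estimator (not shown here).
    """
    rng = np.random.default_rng() if rng is None else rng
    f0 = loss_fn(x)
    grad_hat = np.zeros_like(x)
    for _ in range(num_samples):
        u = rng.standard_normal(x.shape)
        grad_hat += (loss_fn(x + mu * u) - f0) / mu * u
    return grad_hat / num_samples

# Usage: gradient-free descent on a toy quadratic
x = np.array([2.0, -1.0])
for _ in range(100):
    x -= 0.05 * zo_gradient(lambda p: float(np.sum(p ** 2)), x)
```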
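The direction-alignment rule described in the AdaRem entry can likewise be sketched schematically: keep a per-parameter moving average of past updates and enlarge the step where that past direction agrees with the current descent direction, shrinking it where it does not. This is a hedged interpretation of the idea as summarized above, not the published AdaRem update; `ema_decay`, `scale_up`, and `scale_down` are assumed names and values.

```python
import numpy as np

def adarem_like_step(w, grad, state, base_lr=0.01,
                     ema_decay=0.9, scale_up=1.1, scale_down=0.9):
    """Schematic parameter-wise learning-rate adjustment by direction alignment.

    An exponential moving average of past updates tracks each parameter's
    recent direction of change; coordinates where that direction agrees with
    the current descent direction (-grad) get a larger step, disagreeing ones
    a smaller step. Illustrative only -- not the exact AdaRem update rule.
    """
    ema = state.setdefault("ema", np.zeros_like(w))
    aligned = ema * (-grad) > 0                       # past change agrees with descent direction
    lr = base_lr * np.where(aligned, scale_up, scale_down)
    update = -lr * grad
    state["ema"] = ema_decay * ema + (1.0 - ema_decay) * update
    return w + update

# Usage on a toy quadratic
w, state = np.array([3.0, -2.0]), {}
for _ in range(50):
    w = adarem_like_step(w, 2.0 * w, state)
```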
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.