BFE and AdaBFE: A New Approach in Learning Rate Automation for
Stochastic Optimization
- URL: http://arxiv.org/abs/2207.02763v1
- Date: Wed, 6 Jul 2022 15:55:53 GMT
- Title: BFE and AdaBFE: A New Approach in Learning Rate Automation for
Stochastic Optimization
- Authors: Xin Cao
- Abstract summary: A new gradient-based optimization approach that automatically adjusts the learning rate is proposed.
This approach could serve as an alternative way to optimize the learning rate on top of the stochastic gradient descent (SGD) algorithm.
- Score: 3.541406632811038
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, a new gradient-based optimization approach that automatically
adjusts the learning rate is proposed. This approach can be applied to design both
non-adaptive and adaptive learning rate methods. I first introduce the non-adaptive
learning rate optimization method, Binary Forward Exploration (BFE), and then develop
the corresponding adaptive per-parameter learning rate method, Adaptive BFE (AdaBFE).
This approach could be an alternative way to optimize the learning rate within the
stochastic gradient descent (SGD) framework, alongside the current non-adaptive
learning rate methods (e.g. SGD, momentum, Nesterov) and the adaptive learning
rate methods (e.g. AdaGrad, AdaDelta, Adam, etc.). The purpose of developing this
approach is not to beat the benchmarks of other methods, but to provide a
different perspective on optimizing gradient descent, although some
comparative studies with previous methods are given in the following sections.
This approach is expected to be heuristic and to inspire researchers to improve
gradient-based optimization in combination with previous methods.
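The abstract does not spell out the BFE procedure here, so the sketch below is only one plausible illustration of a "binary forward exploration" of the step size: probe forward along the negative gradient, doubling the trial learning rate while the loss keeps improving and halving (bisecting) it once the step overshoots. The function name `bfe_step`, the doubling/halving rule, and all constants are assumptions made for this example, not the authors' exact algorithm.

```python
import numpy as np

def bfe_step(params, grad, loss_fn, lr_init=0.1, max_probes=10):
    """Illustrative 'binary forward exploration' of the step size.

    Probes forward along the negative gradient: the trial learning rate is
    doubled while the loss keeps improving, then halved once it stops
    improving. A sketch of the general idea only, not the exact BFE
    algorithm from the paper.
    """
    base_loss = loss_fn(params)
    lr = lr_init
    best_lr, best_loss = 0.0, base_loss
    for _ in range(max_probes):
        trial_loss = loss_fn(params - lr * grad)
        if trial_loss < best_loss:   # forward exploration: keep doubling
            best_lr, best_loss = lr, trial_loss
            lr *= 2.0
        else:                        # overshot: bisect back toward the best step
            lr *= 0.5
            if lr < 1e-12:
                break
    return params - best_lr * grad, best_lr

# Usage on a toy quadratic: loss(w) = ||w||^2, gradient = 2w
w = np.array([3.0, -2.0])
for _ in range(5):
    g = 2.0 * w
    w, lr = bfe_step(w, g, lambda p: float(np.sum(p ** 2)))
```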
Related papers
- Learning rate adaptive stochastic gradient descent optimization methods: numerical simulations for deep learning methods for partial differential equations and convergence analyses [5.052293146674794]
It is known that the standard stochastic gradient descent (SGD) optimization method and accelerated and adaptive SGD optimization methods such as Adam fail to converge if the learning rates do not converge to zero.
In this work we propose and study a learning-rate-adaptive approach for SGD optimization methods in which the learning rate is adjusted based on empirical estimates.
arXiv Detail & Related papers (2024-06-20T14:07:39Z) - Interpreting Adaptive Gradient Methods by Parameter Scaling for
Learning-Rate-Free Optimization [14.009179786857802]
We address the challenge of estimating the learning rate for adaptive gradient methods used in training deep neural networks.
While several learning-rate-free approaches have been proposed, they are typically tailored for steepest descent.
In this paper, we interpret adaptive gradient methods as steepest descent applied on parameter-scaled networks.
arXiv Detail & Related papers (2024-01-06T15:45:29Z) - Accelerated Federated Learning with Decoupled Adaptive Optimization [53.230515878096426]
The federated learning (FL) framework enables clients to collaboratively learn a shared model while keeping the training data private on each client.
Recently, many efforts have been made to generalize centralized adaptive optimization methods, such as SGDM, Adam, AdaGrad, etc., to federated settings.
This work aims to develop novel adaptive optimization methods for FL from the perspective of the dynamics of ordinary differential equations (ODEs).
arXiv Detail & Related papers (2022-07-14T22:46:43Z) - Improved Binary Forward Exploration: Learning Rate Scheduling Method for
Stochastic Optimization [3.541406632811038]
A new gradient-based optimization approach that automatically schedules the learning rate, called Binary Forward Exploration (BFE), has recently been proposed.
In this paper, improved algorithms based on this methodology are investigated in order to improve the efficiency and robustness of the approach.
The goal of this method is not to beat others but to provide a different viewpoint on optimizing the gradient descent process.
arXiv Detail & Related papers (2022-07-09T05:28:44Z) - Adaptive Gradient Methods with Local Guarantees [48.980206926987606]
We propose an adaptive gradient method that has provable adaptive regret guarantees vs. the best local preconditioner.
We demonstrate the robustness of our method in automatically choosing the optimal learning rate schedule for popular benchmarking tasks in vision and language domains.
arXiv Detail & Related papers (2022-03-02T20:45:14Z) - Adaptive Differentially Private Empirical Risk Minimization [95.04948014513226]
We propose an adaptive (stochastic) gradient perturbation method for differentially private empirical risk minimization.
We prove that the ADP method considerably improves the utility guarantee compared to the standard differentially private method in which vanilla random noise is added.
arXiv Detail & Related papers (2021-10-14T15:02:20Z) - SUPER-ADAM: Faster and Universal Framework of Adaptive Gradients [99.13839450032408]
It is desired to design a universal framework for adaptive algorithms to solve general problems.
In particular, our novel framework provides convergence analysis support for adaptive methods under the nonconvex setting.
arXiv Detail & Related papers (2021-06-15T15:16:28Z) - Meta-Regularization: An Approach to Adaptive Choice of the Learning Rate
in Gradient Descent [20.47598828422897]
We propose Meta-Regularization, a novel approach for the adaptive choice of the learning rate in first-order gradient descent methods.
Our approach modifies the objective function by adding a regularization term on the learning rate, and casts the updating of the parameters and the learning rate as a joint process.
arXiv Detail & Related papers (2021-04-12T13:13:34Z) - Learning Sampling Policy for Faster Derivative Free Optimization [100.27518340593284]
We propose a new reinforcement-learning-based ZO algorithm (ZO-RL) that learns the sampling policy for generating the perturbations in ZO optimization, instead of using random sampling.
Our results show that ZO-RL can effectively reduce the variance of the ZO gradient estimate by learning a sampling policy, and converges faster than existing ZO algorithms in different scenarios (a sketch of the baseline random-perturbation estimator follows this list).
arXiv Detail & Related papers (2021-04-09T14:50:59Z) - Adaptive Gradient Method with Resilience and Momentum [120.83046824742455]
We propose an Adaptive Gradient Method with Resilience and Momentum (AdaRem).
AdaRem adjusts the parameter-wise learning rate according to whether the direction in which a parameter has changed in the past is aligned with the direction of the current gradient (a schematic sketch of this rule also follows the list below).
Our method outperforms previous adaptive learning rate-based algorithms in terms of the training speed and the test error.
arXiv Detail & Related papers (2020-10-21T14:49:00Z) - Adaptive Stochastic Optimization [1.7945141391585486]
Adaptive optimization methods have the potential to offer significant computational savings when training large-scale systems.
Modern approaches based on the gradient method are non-adaptive in the sense that their implementation employs prescribed parameter values that need to be tuned for each application.
arXiv Detail & Related papers (2020-01-18T16:30:19Z)
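The ZO-RL entry above builds on the standard random-perturbation (two-point) zeroth-order gradient estimate; the learned sampling policy that is the paper's actual contribution is not reproduced here. The sketch below shows only that baseline estimator, and the names `zo_gradient`, `mu`, and `num_samples` are illustrative choices rather than names taken from the paper.

```python
import numpy as np

def zo_gradient(loss_fn, x, mu=1e-3, num_samples=20, rng=None):
    """Two-point zeroth-order gradient estimate with random Gaussian directions.

    grad_hat = mean over u of (f(x + mu*u) - f(x)) / mu * u, with u ~ N(0, I).
    ZO-RL replaces the random draw of u with a learned sampling policy to
    reduce the variance of this estimator (not shown here).
    """
    rng = np.random.default_rng() if rng is None else rng
    f0 = loss_fn(x)
    grad_hat = np.zeros_like(x)
    for _ in range(num_samples):
        u = rng.standard_normal(x.shape)
        grad_hat += (loss_fn(x + mu * u) - f0) / mu * u
    return grad_hat / num_samples

# Usage: gradient-free descent on a toy quadratic
x = np.array([2.0, -1.0])
for _ in range(100):
    x -= 0.05 * zo_gradient(lambda p: float(np.sum(p ** 2)), x)
```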
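The direction-alignment rule described in the AdaRem entry can likewise be sketched schematically: keep a per-parameter moving average of past updates and enlarge the step where that past direction agrees with the current descent direction, shrinking it where it does not. This is a hedged interpretation of the idea as summarized above, not the published AdaRem update; `ema_decay`, `scale_up`, and `scale_down` are assumed names and values.

```python
import numpy as np

def adarem_like_step(w, grad, state, base_lr=0.01,
                     ema_decay=0.9, scale_up=1.1, scale_down=0.9):
    """Schematic parameter-wise learning-rate adjustment by direction alignment.

    An exponential moving average of past updates tracks each parameter's
    recent direction of change; coordinates where that direction agrees with
    the current descent direction (-grad) get a larger step, disagreeing ones
    a smaller step. Illustrative only -- not the exact AdaRem update rule.
    """
    ema = state.setdefault("ema", np.zeros_like(w))
    aligned = ema * (-grad) > 0                       # past change agrees with descent direction
    lr = base_lr * np.where(aligned, scale_up, scale_down)
    update = -lr * grad
    state["ema"] = ema_decay * ema + (1.0 - ema_decay) * update
    return w + update

# Usage on a toy quadratic
w, state = np.array([3.0, -2.0]), {}
for _ in range(50):
    w = adarem_like_step(w, 2.0 * w, state)
```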
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.