Improved Binary Forward Exploration: Learning Rate Scheduling Method for
Stochastic Optimization
- URL: http://arxiv.org/abs/2207.04198v1
- Date: Sat, 9 Jul 2022 05:28:44 GMT
- Title: Improved Binary Forward Exploration: Learning Rate Scheduling Method for
Stochastic Optimization
- Authors: Xin Cao
- Abstract summary: A new gradient-based optimization approach that automatically schedules the learning rate, called Binary Forward Exploration (BFE), was proposed recently.
In this paper, improved algorithms based on BFE and its adaptive version (AdaBFE) are investigated in order to improve the efficiency and robustness of the methodology.
The goal is not to beat other methods but to provide a different viewpoint on optimizing the gradient descent process.
- Score: 3.541406632811038
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A new gradient-based optimization approach that automatically schedules the
learning rate, called Binary Forward Exploration (BFE), was proposed recently, and an
adaptive version of BFE has been discussed thereafter. In this paper, improved
algorithms based on both are investigated in order to improve the efficiency and
robustness of the methodology. The improved approach provides a new perspective on
scheduling the learning rate update and is compared with the stochastic gradient
descent (SGD) algorithm with momentum or Nesterov momentum and with the most
successful adaptive learning rate algorithm, e.g. Adam. The goal is not to beat other
methods but to provide a different viewpoint on optimizing the gradient descent
process. The approach combines the advantages of first-order and second-order
optimization in terms of speed and efficiency.
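The exact BFE and AdaBFE update rules are defined in the cited papers and are not reproduced here. As a rough, non-authoritative illustration of the general forward-exploration idea of probing trial steps and growing or shrinking the step size in powers of two, a toy sketch around plain gradient descent on a quadratic might look like the following; the objective, the doubling/halving loops, and the lower bound on the step size are assumptions made for this example only, not the authors' algorithm.

```python
import numpy as np

def quadratic_loss(w, A, b):
    """Toy convex objective f(w) = 0.5 * w^T A w - b^T w (a stand-in for a real loss)."""
    return 0.5 * w @ A @ w - b @ w

def gradient(w, A, b):
    """Gradient of the toy objective: A w - b."""
    return A @ w - b

def forward_exploration_gd(w, A, b, lr=1e-3, steps=100, lr_floor=1e-12):
    """Gradient descent with a doubling/halving forward exploration of the step size.

    At every iteration a trial step is probed: the learning rate is doubled while the
    probe keeps improving, then halved until the accepted step actually lowers the loss.
    This is an illustrative reading of the general idea only, not the BFE/AdaBFE update.
    """
    for _ in range(steps):
        g = gradient(w, A, b)
        f0 = quadratic_loss(w, A, b)
        # Explore forward: keep doubling while a larger trial step is still better.
        while quadratic_loss(w - 2.0 * lr * g, A, b) < quadratic_loss(w - lr * g, A, b):
            lr *= 2.0
        # Back off: halve until the trial step improves on the current loss.
        while quadratic_loss(w - lr * g, A, b) >= f0 and lr > lr_floor:
            lr *= 0.5
        w = w - lr * g
    return w, lr

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = np.diag([1.0, 10.0, 100.0])          # ill-conditioned quadratic
    b = rng.normal(size=3)
    w_hat, final_lr = forward_exploration_gd(np.zeros(3), A, b)
    print("estimate:", w_hat, "final lr:", final_lr)
    print("optimum :", np.linalg.solve(A, b))
```

The point of the sketch is only that the step size is chosen by cheap forward probes of the loss rather than by a hand-tuned decay schedule; the speed and robustness claims of the abstract belong to the paper's own experiments.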
Related papers
- Minimizing UCB: a Better Local Search Strategy in Local Bayesian Optimization [9.120912236055544]
We develop the relationship between the steps of the gradient descent method and those of a method that minimizes the Upper Confidence Bound (UCB).
We propose a new local Bayesian optimization algorithm, MinUCB, which replaces the gradient descent step in GIBO with a step that minimizes the UCB (a toy sketch of UCB-based point selection appears after this list).
We apply our algorithms to different synthetic and real-world functions, and the results show the effectiveness of our method.
arXiv Detail & Related papers (2024-05-24T07:17:24Z)
- Variational Stochastic Gradient Descent for Deep Neural Networks [16.96187187108041]
The current state of the art consists of adaptive gradient-based optimization methods such as Adam.
Here, we propose to combine both approaches, resulting in the Variational Stochastic Gradient Descent (VSGD) optimizer.
We show how our VSGD method relates to other adaptive gradient-based optimizers such as Adam.
arXiv Detail & Related papers (2024-04-09T18:02:01Z)
- Unleashing the Potential of Large Language Models as Prompt Optimizers: An Analogical Analysis with Gradient-based Model Optimizers [108.72225067368592]
We propose a novel perspective for investigating the design of large language model (LLM)-based prompt optimizers.
We identify two pivotal factors in model parameter learning: update direction and update method.
In particular, we borrow the theoretical framework and learning methods from gradient-based optimization to design improved strategies.
arXiv Detail & Related papers (2024-02-27T15:05:32Z)
- An Automatic Learning Rate Schedule Algorithm for Achieving Faster Convergence and Steeper Descent [10.061799286306163]
We investigate the convergence behavior of the delta-bar-delta algorithm in real-world neural network optimization.
To address potential convergence issues, we propose a novel approach called RDBD (Regrettable Delta-Bar-Delta).
Our approach allows prompt correction of biased learning rate adjustments and ensures convergence of the optimization process; a sketch of the classic delta-bar-delta rule that RDBD builds on appears after this list.
arXiv Detail & Related papers (2023-10-17T14:15:57Z)
- ELRA: Exponential learning rate adaption gradient descent optimization method [83.88591755871734]
We present a novel, fast (exponential rate), ab initio (hyperparameter-free) gradient-based adaptation method.
The main idea of the method is to adapt the learning rate $\alpha$ by situational awareness.
It can be applied to problems of any dimension $n$ and scales only linearly with it.
arXiv Detail & Related papers (2023-09-12T14:36:13Z)
- Improved Algorithms for Neural Active Learning [74.89097665112621]
We improve the theoretical and empirical performance of neural-network(NN)-based active learning algorithms for the non-parametric streaming setting.
We introduce two regret metrics, defined by minimizing the population loss, that are more suitable for active learning than the one used in state-of-the-art (SOTA) related work.
arXiv Detail & Related papers (2022-10-02T05:03:38Z)
- BFE and AdaBFE: A New Approach in Learning Rate Automation for Stochastic Optimization [3.541406632811038]
A new gradient-based optimization approach that automatically adjusts the learning rate is proposed.
This approach could be an alternative method of optimizing the learning rate based on the stochastic gradient descent (SGD) algorithm.
arXiv Detail & Related papers (2022-07-06T15:55:53Z)
- Momentum Accelerates the Convergence of Stochastic AUPRC Maximization [80.8226518642952]
We study optimization of areas under precision-recall curves (AUPRC), which is widely used for imbalanced tasks.
We develop novel momentum methods with a better iteration complexity of $O(1/\epsilon^4)$ for finding an $\epsilon$-stationary solution.
We also design a novel family of adaptive methods with the same $O(1/\epsilon^4)$ complexity, which enjoy faster convergence in practice.
arXiv Detail & Related papers (2021-07-02T16:21:52Z)
- Meta-Regularization: An Approach to Adaptive Choice of the Learning Rate in Gradient Descent [20.47598828422897]
We propose Meta-Regularization, a novel approach for the adaptive choice of the learning rate in first-order gradient descent methods.
Our approach modifies the objective function by adding a regularization term, and casts the updates of the parameters and the learning rate as a joint process.
arXiv Detail & Related papers (2021-04-12T13:13:34Z)
- Learning Sampling Policy for Faster Derivative Free Optimization [100.27518340593284]
We propose a new reinforcement learning based zeroth-order (ZO) algorithm, ZO-RL, which learns the sampling policy for generating the perturbations in ZO optimization instead of using random sampling.
Our results show that the ZO-RL algorithm can effectively reduce the variance of the ZO gradient estimate by learning a sampling policy, and converges faster than existing ZO algorithms in different scenarios.
arXiv Detail & Related papers (2021-04-09T14:50:59Z)
- Adaptive Gradient Method with Resilience and Momentum [120.83046824742455]
We propose an Adaptive Gradient Method with Resilience and Momentum (AdaRem).
AdaRem adjusts the parameter-wise learning rate according to whether the direction in which a parameter changed in the past is aligned with the direction of the current gradient.
Our method outperforms previous adaptive learning rate algorithms in terms of training speed and test error.
arXiv Detail & Related papers (2020-10-21T14:49:00Z)
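As referenced in the MinUCB entry above, the following is a minimal, self-contained sketch of choosing the next query point by minimizing a Gaussian-process upper confidence bound on a 1-D toy function. The RBF kernel, the noise level, the trade-off parameter beta, and the grid search are placeholder choices for illustration and are not taken from the MinUCB or GIBO papers.

```python
import numpy as np

def rbf_kernel(x1, x2, length_scale=0.3):
    """Squared-exponential kernel k(x, x') = exp(-(x - x')^2 / (2 l^2))."""
    d = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_train, y_train, x_query, noise=1e-4):
    """Standard Gaussian-process posterior mean and standard deviation at x_query."""
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_query)
    mu = K_s.T @ np.linalg.solve(K, y_train)
    v = np.linalg.solve(K, K_s)
    var = np.diag(rbf_kernel(x_query, x_query)) - np.einsum("ij,ij->j", K_s, v)
    return mu, np.sqrt(np.maximum(var, 1e-12))

def next_point_by_min_ucb(x_train, y_train, grid, beta=2.0):
    """Return the grid point minimizing the upper confidence bound mu + beta * sigma.

    For a minimization problem, the point with the lowest surrogate upper bound is a
    safe local-search candidate -- the intuition behind a MinUCB-style step.
    """
    mu, sigma = gp_posterior(x_train, y_train, grid)
    return grid[np.argmin(mu + beta * sigma)]

if __name__ == "__main__":
    f = lambda x: np.sin(3.0 * x) + 0.5 * x ** 2      # toy objective to minimize
    x_train = np.array([-1.0, -0.3, 0.4, 1.2])        # points evaluated so far
    grid = np.linspace(-1.5, 1.5, 301)                # local search region
    print("next query point:", next_point_by_min_ucb(x_train, f(x_train), grid))
```

Minimizing mu + beta * sigma selects the point whose surrogate-certified upper bound on the objective is lowest, which is the local-search intuition the entry describes.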
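As referenced in the RDBD entry above, RDBD itself is not reproduced here; the sketch below shows the classic delta-bar-delta rule (Jacobs, 1988) that it builds on: each coordinate keeps its own learning rate, increased additively when the current gradient agrees in sign with an exponentially smoothed past gradient and decreased multiplicatively when it disagrees. The constants kappa, phi, and theta and the toy quadratic are arbitrary illustration choices.

```python
import numpy as np

def delta_bar_delta_step(w, grad, lr, delta_bar, kappa=1e-3, phi=0.1, theta=0.7):
    """One update of the classic delta-bar-delta rule (Jacobs, 1988).

    Each coordinate keeps its own learning rate: it is increased additively by kappa
    when the current gradient agrees in sign with the smoothed past gradient delta_bar,
    and multiplied by (1 - phi) when it disagrees.
    """
    agree = grad * delta_bar > 0
    disagree = grad * delta_bar < 0
    lr = lr + kappa * agree                                 # additive increase
    lr = lr * np.where(disagree, 1.0 - phi, 1.0)            # multiplicative decrease
    w = w - lr * grad                                       # per-coordinate gradient step
    delta_bar = (1.0 - theta) * grad + theta * delta_bar    # update the gradient memory
    return w, lr, delta_bar

if __name__ == "__main__":
    # Toy quadratic f(w) = 0.5 * sum(c_i * w_i^2) with very different curvatures.
    c = np.array([1.0, 10.0])
    w = np.array([1.0, 1.0])
    lr = np.full_like(w, 1e-3)
    delta_bar = np.zeros_like(w)
    for _ in range(200):
        w, lr, delta_bar = delta_bar_delta_step(w, c * w, lr, delta_bar)
    print("w after 200 steps:", w)
    print("per-coordinate learning rates:", lr)
```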