Improved Binary Forward Exploration: Learning Rate Scheduling Method for
Stochastic Optimization
- URL: http://arxiv.org/abs/2207.04198v1
- Date: Sat, 9 Jul 2022 05:28:44 GMT
- Title: Improved Binary Forward Exploration: Learning Rate Scheduling Method for
Stochastic Optimization
- Authors: Xin Cao
- Abstract summary: A new gradient-based optimization approach that automatically schedules the learning rate, called Binary Forward Exploration (BFE), was proposed recently.
In this paper, improved algorithms based on BFE and its adaptive version (AdaBFE) are investigated in order to improve the efficiency and robustness of the methodology.
The goal is not to beat other methods but to provide a different viewpoint on optimizing the gradient descent process.
- Score: 3.541406632811038
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A new gradient-based optimization approach that automatically schedules the
learning rate, called Binary Forward Exploration (BFE), was proposed recently, and an
adaptive version of BFE has been discussed thereafter. In this paper, improved
algorithms based on both are investigated in order to improve the efficiency and
robustness of the methodology. The improved approach provides a new perspective on
scheduling the learning rate update and is compared with the stochastic gradient
descent (SGD) algorithm with momentum or Nesterov momentum and with the most
successful adaptive learning rate algorithm, e.g. Adam. The goal is not to beat other
methods but to provide a different viewpoint on optimizing the gradient descent
process. The approach combines the advantages of first-order and second-order
optimization in terms of speed and efficiency.
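The exact BFE and AdaBFE update rules are defined in the cited papers and are not reproduced here. As a rough, non-authoritative illustration of the general forward-exploration idea of probing trial steps and growing or shrinking the step size in powers of two, a toy sketch around plain gradient descent on a quadratic might look like the following; the objective, the doubling/halving loops, and the lower bound on the step size are assumptions made for this example only, not the authors' algorithm.

```python
import numpy as np

def quadratic_loss(w, A, b):
    """Toy convex objective f(w) = 0.5 * w^T A w - b^T w (a stand-in for a real loss)."""
    return 0.5 * w @ A @ w - b @ w

def gradient(w, A, b):
    """Gradient of the toy objective: A w - b."""
    return A @ w - b

def forward_exploration_gd(w, A, b, lr=1e-3, steps=100, lr_floor=1e-12):
    """Gradient descent with a doubling/halving forward exploration of the step size.

    At every iteration a trial step is probed: the learning rate is doubled while the
    probe keeps improving, then halved until the accepted step actually lowers the loss.
    This is an illustrative reading of the general idea only, not the BFE/AdaBFE update.
    """
    for _ in range(steps):
        g = gradient(w, A, b)
        f0 = quadratic_loss(w, A, b)
        # Explore forward: keep doubling while a larger trial step is still better.
        while quadratic_loss(w - 2.0 * lr * g, A, b) < quadratic_loss(w - lr * g, A, b):
            lr *= 2.0
        # Back off: halve until the trial step improves on the current loss.
        while quadratic_loss(w - lr * g, A, b) >= f0 and lr > lr_floor:
            lr *= 0.5
        w = w - lr * g
    return w, lr

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = np.diag([1.0, 10.0, 100.0])          # ill-conditioned quadratic
    b = rng.normal(size=3)
    w_hat, final_lr = forward_exploration_gd(np.zeros(3), A, b)
    print("estimate:", w_hat, "final lr:", final_lr)
    print("optimum :", np.linalg.solve(A, b))
```

The point of the sketch is only that the step size is chosen by cheap forward probes of the loss rather than by a hand-tuned decay schedule; the speed and robustness claims of the abstract belong to the paper's own experiments.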
Related papers
- Minimizing UCB: a Better Local Search Strategy in Local Bayesian Optimization [9.120912236055544]
We develop the relationship between the steps of the gradient descent method and those of a method that minimizes the Upper Confidence Bound (UCB).
We propose a new local Bayesian optimization algorithm, MinUCB, which replaces the gradient descent step in GIBO with a step that minimizes the UCB (a toy sketch of UCB-based point selection appears after this list).
We apply our algorithms to different synthetic and real-world functions, and the results show the effectiveness of our method.
arXiv Detail & Related papers (2024-05-24T07:17:24Z)
- Variational Stochastic Gradient Descent for Deep Neural Networks [16.96187187108041]
The current state of the art consists of adaptive gradient-based optimization methods such as Adam.
Here, we propose to combine both approaches, resulting in the Variational Stochastic Gradient Descent (VSGD) optimizer.
We show how our VSGD method relates to other adaptive gradient-based optimizers such as Adam.
arXiv Detail & Related papers (2024-04-09T18:02:01Z)
- Unleashing the Potential of Large Language Models as Prompt Optimizers: An Analogical Analysis with Gradient-based Model Optimizers [108.72225067368592]
We propose a novel perspective for investigating the design of large language model (LLM)-based prompt optimizers.
We identify two pivotal factors in model parameter learning: update direction and update method.
In particular, we borrow the theoretical framework and learning methods from gradient-based optimization to design improved strategies.
arXiv Detail & Related papers (2024-02-27T15:05:32Z)
- An Automatic Learning Rate Schedule Algorithm for Achieving Faster Convergence and Steeper Descent [10.061799286306163]
We investigate the convergence behavior of the delta-bar-delta algorithm in real-world neural network optimization.
To address potential convergence issues, we propose a novel approach called RDBD (Regrettable Delta-Bar-Delta).
Our approach allows prompt correction of biased learning rate adjustments and ensures convergence of the optimization process; a sketch of the classic delta-bar-delta rule that RDBD builds on appears after this list.
arXiv Detail & Related papers (2023-10-17T14:15:57Z)
- ELRA: Exponential learning rate adaption gradient descent optimization method [83.88591755871734]
We present a novel, fast (exponential rate), ab initio (hyperparameter-free) gradient-based adaptation method.
The main idea of the method is to adapt the learning rate $\alpha$ by situational awareness.
It can be applied to problems of any dimension $n$ and scales only linearly with it.
arXiv Detail & Related papers (2023-09-12T14:36:13Z)
- Improved Algorithms for Neural Active Learning [74.89097665112621]
We improve the theoretical and empirical performance of neural-network(NN)-based active learning algorithms for the non-parametric streaming setting.
We introduce two regret metrics, defined by minimizing the population loss, that are more suitable for active learning than the one used in state-of-the-art (SOTA) related work.
arXiv Detail & Related papers (2022-10-02T05:03:38Z)
- BFE and AdaBFE: A New Approach in Learning Rate Automation for Stochastic Optimization [3.541406632811038]
A new gradient-based optimization approach that automatically adjusts the learning rate is proposed.
This approach could be an alternative method of optimizing the learning rate based on the stochastic gradient descent (SGD) algorithm.
arXiv Detail & Related papers (2022-07-06T15:55:53Z)
- Momentum Accelerates the Convergence of Stochastic AUPRC Maximization [80.8226518642952]
We study optimization of areas under precision-recall curves (AUPRC), which is widely used for imbalanced tasks.
We develop novel momentum methods with a better iteration complexity of $O(1/\epsilon^4)$ for finding an $\epsilon$-stationary solution.
We also design a novel family of adaptive methods with the same $O(1/\epsilon^4)$ complexity, which enjoy faster convergence in practice.
arXiv Detail & Related papers (2021-07-02T16:21:52Z)
- Meta-Regularization: An Approach to Adaptive Choice of the Learning Rate in Gradient Descent [20.47598828422897]
We propose Meta-Regularization, a novel approach for the adaptive choice of the learning rate in first-order gradient descent methods.
Our approach modifies the objective function by adding a regularization term, and casts the updates of the parameters and the learning rate as a joint process.
arXiv Detail & Related papers (2021-04-12T13:13:34Z)
- Learning Sampling Policy for Faster Derivative Free Optimization [100.27518340593284]
We propose a new reinforcement learning based zeroth-order (ZO) algorithm, ZO-RL, which learns the sampling policy for generating the perturbations in ZO optimization instead of using random sampling.
Our results show that the ZO-RL algorithm can effectively reduce the variance of the ZO gradient estimate by learning a sampling policy, and converges faster than existing ZO algorithms in different scenarios.
arXiv Detail & Related papers (2021-04-09T14:50:59Z)
- Adaptive Gradient Method with Resilience and Momentum [120.83046824742455]
We propose an Adaptive Gradient Method with Resilience and Momentum (AdaRem).
AdaRem adjusts the parameter-wise learning rate according to whether the direction in which a parameter changed in the past is aligned with the direction of the current gradient.
Our method outperforms previous adaptive learning rate algorithms in terms of training speed and test error.
arXiv Detail & Related papers (2020-10-21T14:49:00Z)
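As referenced in the MinUCB entry above, the following is a minimal, self-contained sketch of choosing the next query point by minimizing a Gaussian-process upper confidence bound on a 1-D toy function. The RBF kernel, the noise level, the trade-off parameter beta, and the grid search are placeholder choices for illustration and are not taken from the MinUCB or GIBO papers.

```python
import numpy as np

def rbf_kernel(x1, x2, length_scale=0.3):
    """Squared-exponential kernel k(x, x') = exp(-(x - x')^2 / (2 l^2))."""
    d = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_train, y_train, x_query, noise=1e-4):
    """Standard Gaussian-process posterior mean and standard deviation at x_query."""
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_query)
    mu = K_s.T @ np.linalg.solve(K, y_train)
    v = np.linalg.solve(K, K_s)
    var = np.diag(rbf_kernel(x_query, x_query)) - np.einsum("ij,ij->j", K_s, v)
    return mu, np.sqrt(np.maximum(var, 1e-12))

def next_point_by_min_ucb(x_train, y_train, grid, beta=2.0):
    """Return the grid point minimizing the upper confidence bound mu + beta * sigma.

    For a minimization problem, the point with the lowest surrogate upper bound is a
    safe local-search candidate -- the intuition behind a MinUCB-style step.
    """
    mu, sigma = gp_posterior(x_train, y_train, grid)
    return grid[np.argmin(mu + beta * sigma)]

if __name__ == "__main__":
    f = lambda x: np.sin(3.0 * x) + 0.5 * x ** 2      # toy objective to minimize
    x_train = np.array([-1.0, -0.3, 0.4, 1.2])        # points evaluated so far
    grid = np.linspace(-1.5, 1.5, 301)                # local search region
    print("next query point:", next_point_by_min_ucb(x_train, f(x_train), grid))
```

Minimizing mu + beta * sigma selects the point whose surrogate-certified upper bound on the objective is lowest, which is the local-search intuition the entry describes.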
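As referenced in the RDBD entry above, RDBD itself is not reproduced here; the sketch below shows the classic delta-bar-delta rule (Jacobs, 1988) that it builds on: each coordinate keeps its own learning rate, increased additively when the current gradient agrees in sign with an exponentially smoothed past gradient and decreased multiplicatively when it disagrees. The constants kappa, phi, and theta and the toy quadratic are arbitrary illustration choices.

```python
import numpy as np

def delta_bar_delta_step(w, grad, lr, delta_bar, kappa=1e-3, phi=0.1, theta=0.7):
    """One update of the classic delta-bar-delta rule (Jacobs, 1988).

    Each coordinate keeps its own learning rate: it is increased additively by kappa
    when the current gradient agrees in sign with the smoothed past gradient delta_bar,
    and multiplied by (1 - phi) when it disagrees.
    """
    agree = grad * delta_bar > 0
    disagree = grad * delta_bar < 0
    lr = lr + kappa * agree                                 # additive increase
    lr = lr * np.where(disagree, 1.0 - phi, 1.0)            # multiplicative decrease
    w = w - lr * grad                                       # per-coordinate gradient step
    delta_bar = (1.0 - theta) * grad + theta * delta_bar    # update the gradient memory
    return w, lr, delta_bar

if __name__ == "__main__":
    # Toy quadratic f(w) = 0.5 * sum(c_i * w_i^2) with very different curvatures.
    c = np.array([1.0, 10.0])
    w = np.array([1.0, 1.0])
    lr = np.full_like(w, 1e-3)
    delta_bar = np.zeros_like(w)
    for _ in range(200):
        w, lr, delta_bar = delta_bar_delta_step(w, c * w, lr, delta_bar)
    print("w after 200 steps:", w)
    print("per-coordinate learning rates:", lr)
```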