Automatic, Dynamic, and Nearly Optimal Learning Rate Specification by
Local Quadratic Approximation
- URL: http://arxiv.org/abs/2004.03260v1
- Date: Tue, 7 Apr 2020 10:55:12 GMT
- Title: Automatic, Dynamic, and Nearly Optimal Learning Rate Specification by
Local Quadratic Approximation
- Authors: Yingqiu Zhu, Yu Chen, Danyang Huang, Bo Zhang and Hansheng Wang
- Abstract summary: In deep learning tasks, the learning rate determines the update step size in each iteration.
We propose a novel optimization method based on local quadratic approximation (LQA)
- Score: 7.386152866234369
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In deep learning tasks, the learning rate determines the update step size in
each iteration, which plays a critical role in gradient-based optimization.
However, the determination of the appropriate learning rate in practice
typically relies on subjective judgement. In this work, we propose a novel
optimization method based on local quadratic approximation (LQA). In each
update step, given the gradient direction, we locally approximate the loss
function by a standard quadratic function of the learning rate. Then, we
propose an approximation step to obtain a nearly optimal learning rate in a
computationally efficient way. The proposed LQA method has three important
features. First, the learning rate is automatically determined in each update
step. Second, it is dynamically adjusted according to the current loss function
value and the parameter estimates. Third, with the gradient direction fixed,
the proposed method leads to nearly the greatest reduction in terms of the loss
function. Extensive experiments have been conducted to prove the strengths of
the proposed LQA method.
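
For intuition, the core step described in the abstract can be prototyped as follows: along the current gradient direction, the loss is probed at a few trial step sizes, a parabola in the learning rate is fitted through these values, and its minimizer is used as the learning rate for that iteration. The sketch below is a minimal NumPy illustration of that idea, not the authors' reference implementation; the helper names (`loss_fn`, `grad_fn`, the probe width `delta`, the cap `eta_max`) are illustrative assumptions.

```python
import numpy as np

def lqa_step(params, loss_fn, grad_fn, delta=1e-3, eta_max=1.0):
    """One gradient step whose learning rate is chosen by a local
    quadratic approximation of the loss along the gradient direction.

    Illustrative sketch only: `loss_fn(w)` returns the scalar loss at
    parameter vector `w`, `grad_fn(w)` returns its gradient, and
    `delta` is a small probe step used to fit the parabola.
    """
    g = grad_fn(params)

    # Probe the loss at three points along the gradient ray
    # (eta = -delta, 0, +delta for the update params - eta * g).
    l_minus = loss_fn(params + delta * g)
    l_zero = loss_fn(params)
    l_plus = loss_fn(params - delta * g)

    # Fit q(eta) = a*eta^2 + b*eta + c through the three points and take
    # its minimizer; fall back to `delta` if the curvature is not positive.
    curvature = l_plus + l_minus - 2.0 * l_zero
    if curvature <= 0.0:
        eta = delta
    else:
        eta = delta * (l_minus - l_plus) / (2.0 * curvature)
    eta = float(np.clip(eta, 0.0, eta_max))

    return params - eta * g, eta
```

For an exactly quadratic loss the fitted parabola is exact, so the recovered step size is optimal; for general losses it is only nearly optimal because the quadratic model holds locally around the current iterate.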
Related papers
- Learning rate adaptive stochastic gradient descent optimization methods: numerical simulations for deep learning methods for partial differential equations and convergence analyses [5.052293146674794]
It is known that the standard stochastic gradient descent (SGD) optimization method, as well as accelerated and adaptive SGD optimization methods such as the Adam optimizer, fail to converge if the learning rates do not converge to zero.
In this work we propose and study a learning-rate-adaptive approach for SGD optimization methods in which the learning rate is adjusted based on empirical estimates.
arXiv Detail & Related papers (2024-06-20T14:07:39Z) - ELRA: Exponential learning rate adaption gradient descent optimization
method [83.88591755871734]
We present a novel, fast (exponential rate), ab initio (hyper-parameter-free) gradient-based adaptation method.
The main idea of the method is to adapt the learning rate $\alpha$ by situational awareness.
It can be applied to problems of any dimension $n$ and scales only linearly with $n$.
arXiv Detail & Related papers (2023-09-12T14:36:13Z) - Nearly Minimax Optimal Reinforcement Learning for Linear Markov Decision
Processes [80.89852729380425]
We propose the first computationally efficient algorithm that achieves the nearly minimax optimal regret $\tilde{O}(d\sqrt{H^3K})$.
Our work provides a complete answer to optimal RL with linear MDPs, and the developed algorithm and theoretical tools may be of independent interest.
arXiv Detail & Related papers (2022-12-12T18:58:59Z) - Continuous-Time Meta-Learning with Forward Mode Differentiation [65.26189016950343]
We introduce Continuous-Time Meta-Learning (COMLN), a meta-learning algorithm where adaptation follows the dynamics of a gradient vector field.
Treating the learning process as an ODE offers the notable advantage that the length of the trajectory is now continuous, as opposed to a fixed and discrete number of gradient steps.
We show empirically its efficiency in terms of runtime and memory usage, and we illustrate its effectiveness on a range of few-shot image classification problems.
arXiv Detail & Related papers (2022-03-02T22:35:58Z) - Towards a Principled Learning Rate Adaptation for Natural Evolution
Strategies [0.0]
We propose a new learning rate adaptation mechanism for Natural Evolution Strategies (NES)
The proposed mechanism makes it possible to set a high learning rate for problems that are relatively easy to optimize.
The experimental evaluations on unimodal and multimodal functions demonstrate that the proposed mechanism works properly depending on a search situation.
arXiv Detail & Related papers (2021-11-22T13:20:12Z) - Average-Reward Off-Policy Policy Evaluation with Function Approximation [66.67075551933438]
We consider off-policy policy evaluation with function approximation in average-reward MDPs.
Bootstrapping is necessary and, along with off-policy learning and function approximation, results in the deadly triad.
We propose two novel algorithms, reproducing the celebrated success of Gradient TD algorithms in the average-reward setting.
arXiv Detail & Related papers (2021-01-08T00:43:04Z) - Logistic Q-Learning [87.00813469969167]
We propose a new reinforcement learning algorithm derived from a regularized linear-programming formulation of optimal control in MDPs.
The main feature of our algorithm is a convex loss function for policy evaluation that serves as a theoretically sound alternative to the widely used squared Bellman error.
arXiv Detail & Related papers (2020-10-21T17:14:31Z) - An adaptive stochastic gradient-free approach for high-dimensional
blackbox optimization [0.0]
We propose an adaptive stochastic gradient-free (ASGF) approach for high-dimensional non-smooth blackbox optimization problems.
We illustrate the performance of this method on benchmark global optimization problems and learning tasks.
arXiv Detail & Related papers (2020-06-18T22:47:58Z) - A Primer on Zeroth-Order Optimization in Signal Processing and Machine
Learning [95.85269649177336]
ZO optimization iteratively performs three major steps: gradient estimation, descent direction computation, and solution update.
We demonstrate promising applications of ZO optimization, such as evaluating and generating explanations from black-box deep learning models, and efficient online sensor management (a generic sketch of these three steps is given after this list).
arXiv Detail & Related papers (2020-06-11T06:50:35Z) - Learning to be Global Optimizer [28.88646928299302]
We learn an optimization algorithm with a local-optimum escaping capability for some benchmark functions.
We show that the learned algorithm significantly outperforms some well-known classical optimization algorithms.
arXiv Detail & Related papers (2020-03-10T03:46:25Z) - Statistical Adaptive Stochastic Gradient Methods [34.859895010071234]
We propose a statistical adaptive procedure called SALSA for automatically scheduling the learning rate (step size) in gradient methods.
SALSA first uses a smoothed line-search procedure to gradually increase the learning rate, then automatically decreases the learning rate.
The method for decreasing the learning rate is based on a new statistical test for detecting stationarity when using a constant step size (a simplified sketch of this two-phase schedule follows below).
arXiv Detail & Related papers (2020-02-25T00:04:16Z)
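
The SALSA entry directly above describes a two-phase schedule: a line search that gradually raises the learning rate, followed by an automatic decrease triggered by a stationarity test. The toy scheduler below only mirrors that control flow; the plateau check is a crude moving-average comparison, not the statistical test proposed in the paper, and all class and parameter names are invented for illustration.

```python
from collections import deque
import statistics

class TwoPhaseLRScheduler:
    """Toy two-phase learning-rate schedule in the spirit of the SALSA
    entry above: ramp the rate up while progress continues, then cut it
    when the loss looks stationary. The stationarity check here is a
    simple moving-average comparison, NOT the statistical test from the
    paper; it only illustrates the overall control flow.
    """

    def __init__(self, lr=1e-3, grow=1.1, shrink=0.5, window=50):
        self.lr = lr
        self.grow = grow          # multiplicative increase during warm-up
        self.shrink = shrink      # multiplicative decrease after a plateau
        self.window = window
        self.history = deque(maxlen=2 * window)
        self.warmup = True

    def step(self, loss_value):
        """Record the latest training loss and return the learning rate."""
        self.history.append(float(loss_value))
        if len(self.history) < 2 * self.window:
            return self.lr

        old = list(self.history)[: self.window]
        new = list(self.history)[self.window:]
        improving = statistics.mean(new) < statistics.mean(old) - 1e-8

        if self.warmup and improving:
            self.lr *= self.grow      # keep increasing while it helps
        elif improving:
            pass                      # constant step still making progress
        else:
            self.lr *= self.shrink    # plateau detected: decrease
            self.warmup = False
            self.history.clear()
        return self.lr
```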
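
The zeroth-order optimization primer listed above names three recurring steps: gradient estimation, descent direction computation, and solution update (this is the sketch referenced after that entry). The loop below illustrates them with a random-direction finite-difference gradient estimator, which is one standard choice rather than the primer's only or recommended estimator; the function and parameter names are illustrative assumptions.

```python
import numpy as np

def zo_gradient_descent(f, x0, steps=200, lr=0.05, mu=1e-2, n_dirs=10, seed=0):
    """Minimal zeroth-order (ZO) optimization loop illustrating the three
    steps named in the primer: (1) estimate the gradient from function
    values only, (2) form a descent direction, (3) update the iterate.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        # (1) Gradient estimation from queries of f only, averaged over
        # several random probe directions.
        g_hat = np.zeros_like(x)
        for _ in range(n_dirs):
            u = rng.standard_normal(x.shape)
            g_hat += (f(x + mu * u) - f(x - mu * u)) / (2.0 * mu) * u
        g_hat /= n_dirs
        # (2) Descent direction and (3) solution update.
        x = x - lr * g_hat
    return x

# Example: minimize a black-box quadratic without ever calling its gradient.
# x_star = zo_gradient_descent(lambda v: np.sum((v - 3.0) ** 2), np.zeros(5))
```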