Resetting the Optimizer in Deep RL: An Empirical Study
- URL: http://arxiv.org/abs/2306.17833v2
- Date: Wed, 15 Nov 2023 00:47:58 GMT
- Title: Resetting the Optimizer in Deep RL: An Empirical Study
- Authors: Kavosh Asadi, Rasool Fakoor, Shoham Sabach
- Abstract summary: We focus on the task of approximating the optimal value function in deep reinforcement learning.
We demonstrate that resetting the optimizer's internal parameters at the start of each iteration significantly improves the performance of deep RL on the Atari benchmark.
- Score: 10.907980864371213
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We focus on the task of approximating the optimal value function in deep
reinforcement learning. This iterative process consists of solving a
sequence of optimization problems where the loss function changes per
iteration. The common approach to solving this sequence of problems is to
employ modern variants of the stochastic gradient descent algorithm such as
Adam. These optimizers maintain their own internal parameters such as estimates
of the first-order and the second-order moments of the gradient, and update
them over time. Therefore, information obtained in previous iterations is used
to solve the optimization problem in the current iteration. We demonstrate that
this can contaminate the moment estimates because the optimization landscape
can change arbitrarily from one iteration to the next. To hedge against
this negative effect, a simple idea is to reset the internal parameters of the
optimizer when starting a new iteration. We empirically investigate this
resetting idea by employing various optimizers in conjunction with the Rainbow
algorithm. We demonstrate that this simple modification significantly improves
the performance of deep RL on the Atari benchmark.
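The resetting idea is simple to apply in practice. Below is a minimal PyTorch-style sketch, not the authors' implementation: a toy DQN-like loop in which Adam is re-instantiated, discarding its first- and second-moment estimates, every time the target network is synchronized, i.e. whenever a new iteration begins. The network, dummy data, learning rate, and update period are illustrative placeholders rather than values taken from the paper.

```python
import copy
import torch
import torch.nn as nn

# Toy stand-ins for the Rainbow components (network, data, update period);
# only the optimizer reset at the bottom is the point of this sketch.
online_net = nn.Linear(4, 2)                      # placeholder Q-network
target_net = copy.deepcopy(online_net)
optimizer = torch.optim.Adam(online_net.parameters(), lr=6.25e-5)

target_update_period = 2_000
for step in range(10_000):
    obs = torch.randn(32, 4)                      # dummy batch in place of replay data
    with torch.no_grad():
        target = target_net(obs).max(dim=1).values        # frozen regression target
    loss = nn.functional.mse_loss(online_net(obs).max(dim=1).values, target)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if (step + 1) % target_update_period == 0:
        # A new iteration begins: the regression target (and hence the loss
        # landscape) changes, so reset Adam's internal moment estimates by
        # re-creating the optimizer instead of carrying the old state over.
        target_net.load_state_dict(online_net.state_dict())
        optimizer = torch.optim.Adam(online_net.parameters(), lr=6.25e-5)
```

Re-creating the optimizer is the most transparent way to drop its accumulated state; clearing `optimizer.state` in place has the same effect for Adam. The learning rate shown is only an example setting.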
Related papers
- Optimizing Tensor Computation Graphs with Equality Saturation and Monte Carlo Tree Search [0.0]
We present a tensor graph rewriting approach that uses Monte Carlo tree search to build a superior representation.
Our approach improves the inference speedup of neural networks by up to 11% compared to existing methods.
arXiv Detail & Related papers (2024-10-07T22:22:02Z)
- Reducing measurement costs by recycling the Hessian in adaptive variational quantum algorithms [0.0]
We propose an improved quasi-Newton optimization protocol specifically tailored to adaptive VQAs.
We implement a quasi-Newton algorithm where an approximation to the inverse Hessian matrix is continuously built and grown across the iterations of an adaptive VQA.
arXiv Detail & Related papers (2024-01-10T14:08:04Z)
- ELRA: Exponential learning rate adaption gradient descent optimization method [83.88591755871734]
We present a novel, fast (exponential rate), ab initio (hyper-free) gradient based adaption.
The main idea of the method is to adapt the learning rate $\alpha$ by situational awareness.
It can be applied to problems of any dimension $n$ and scales only linearly.
arXiv Detail & Related papers (2023-09-12T14:36:13Z)
- A Particle-based Sparse Gaussian Process Optimizer [5.672919245950197]
We present a new swarm-based framework utilizing the underlying dynamical process of descent.
The biggest advantage of this approach is greater exploration around the current state before deciding on a descent direction.
arXiv Detail & Related papers (2022-11-26T09:06:15Z)
- An Accelerated Variance-Reduced Conditional Gradient Sliding Algorithm for First-order and Zeroth-order Optimization [111.24899593052851]
The conditional gradient algorithm (also known as the Frank-Wolfe algorithm) has recently regained popularity in the machine learning community.
ARCS is the first zeroth-order conditional gradient sliding type algorithm for solving convex problems in zeroth-order optimization.
In first-order optimization, the convergence results of ARCS substantially outperform those of previous algorithms in terms of the number of gradient oracle queries.
arXiv Detail & Related papers (2021-09-18T07:08:11Z)
- SHINE: SHaring the INverse Estimate from the forward pass for bi-level optimization and implicit models [15.541264326378366]
In recent years, implicit deep learning has emerged as a method to increase the depth of deep neural networks.
The training is performed as a bi-level problem, and its computational complexity is partially driven by the iterative inversion of a huge Jacobian matrix.
We propose a novel strategy to tackle this computational bottleneck from which many bi-level problems suffer.
arXiv Detail & Related papers (2021-06-01T15:07:34Z)
- Adaptive Importance Sampling for Finite-Sum Optimization and Sampling with Decreasing Step-Sizes [4.355567556995855]
We propose Avare, a simple and efficient algorithm for adaptive importance sampling for finite-sum optimization and sampling with decreasing step-sizes.
Under standard technical conditions, we show that Avare achieves $\mathcal{O}(T^{2/3})$ and $\mathcal{O}(T^{5/6})$ dynamic regret for SGD and SGLD respectively when run with $\mathcal{O}(T^{5/6})$ step sizes.
arXiv Detail & Related papers (2021-03-23T00:28:15Z)
- Divide and Learn: A Divide and Conquer Approach for Predict+Optimize [50.03608569227359]
The predict+optimize problem combines machine learning of problem coefficients with an optimization problem that uses the predicted coefficients.
We show how to directly express the loss of the optimization problem in terms of the predicted coefficients as a piece-wise linear function.
We propose a novel divide and conquer algorithm to tackle optimization problems without this restriction and predict its coefficients using the optimization loss.
arXiv Detail & Related papers (2020-12-04T00:26:56Z)
- Self-Tuning Stochastic Optimization with Curvature-Aware Gradient Filtering [53.523517926927894]
We explore the use of exact per-sample Hessian-vector products and gradients to construct self-tuning quadratics.
We prove that our model-based procedure converges in the noisy gradient setting.
This is an interesting step for constructing self-tuning quadratics.
arXiv Detail & Related papers (2020-11-09T22:07:30Z) - Convergence of adaptive algorithms for weakly convex constrained
optimization [59.36386973876765]
We prove the $\tilde{\mathcal{O}}(t^{-1/4})$ rate of convergence for the norm of the gradient of the Moreau envelope.
Our analysis works with mini-batch size of $1$, constant first and second order moment parameters, and possibly smooth optimization domains.
arXiv Detail & Related papers (2020-06-11T17:43:19Z)
- A Primer on Zeroth-Order Optimization in Signal Processing and Machine Learning [95.85269649177336]
ZO optimization iteratively performs three major steps: gradient estimation, descent direction computation, and solution update (see the sketch after this list).
We demonstrate promising applications of ZO optimization, such as evaluating and generating explanations from black-box deep learning models, and efficient online sensor management.
arXiv Detail & Related papers (2020-06-11T06:50:35Z)
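As a companion to the ZO primer entry above, here is a minimal sketch of those three steps using a two-point random gradient estimator on a toy quadratic. The objective, dimension, smoothing radius, and step size are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x: np.ndarray) -> float:
    # Hypothetical black-box objective: only function values are available.
    return float(np.sum((x - 1.0) ** 2))

x = np.zeros(10)
mu, lr, d = 1e-4, 0.05, x.size
for _ in range(2_000):
    u = rng.standard_normal(d)
    # 1) Gradient estimation: two function evaluations, no true gradient.
    g = (f(x + mu * u) - f(x)) / mu * u
    # 2) Descent direction: the negative of the estimated gradient.
    # 3) Solution update.
    x = x - lr * g

print(np.round(x[:3], 2))  # approaches 1.0 in each coordinate
```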