Adaptive and Multiple Time-scale Eligibility Traces for Online Deep
Reinforcement Learning
- URL: http://arxiv.org/abs/2008.10040v2
- Date: Tue, 4 Jan 2022 00:51:09 GMT
- Title: Adaptive and Multiple Time-scale Eligibility Traces for Online Deep
Reinforcement Learning
- Authors: Taisuke Kobayashi
- Abstract summary: The eligibility traces method is well known as an online learning technique for improving sample efficiency.
The dependency between parameters of deep neural networks would destroy the eligibility traces, which is why they are not integrated with DRL.
This study proposes a new eligibility traces method that can be used even in DRL while maintaining high sample efficiency.
- Score: 8.071506311915396
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep reinforcement learning (DRL) is a promising approach to teaching
robots to perform complex tasks. In robotic problems with time-varying
environments, methods that directly reuse stored experience data cannot follow
changes in the environment, so online DRL is required. The eligibility traces
method is well known as an online learning technique for improving sample
efficiency in traditional reinforcement learning with linear regressors, but it
has not been integrated with DRL: the dependency between the parameters of deep
neural networks would destroy the eligibility traces. Although replacing the
gradient with the most influential one, rather than accumulating gradients as
the eligibility traces, can alleviate this problem, the replacement operation
reduces how often previous experiences are reused. To address these issues,
this study proposes a new eligibility traces method that can be used in DRL
while maintaining high sample efficiency. When the accumulated gradients differ
from those computed with the latest parameters, the proposed method adaptively
decays the eligibility traces according to the divergence between the past and
latest parameters. Because computing this divergence directly over the
parameters is computationally infeasible, Bregman divergences between the
outputs produced by the past and latest parameters are used instead. In
addition, a generalized method with multiple time-scale traces is designed for
the first time. This design allows the most influential of the adaptively
accumulated (decayed) eligibility traces to be replaced.
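As a rough illustration of the adaptive decay idea, the sketch below applies it to TD(λ) with a linear value function. It is not the authors' implementation: the feature map, the squared-error Bregman divergence over a small window of recent outputs, and all hyperparameters (gamma, lambda, alpha, beta) are illustrative assumptions.

```python
import numpy as np
from collections import deque

# Minimal sketch (not the authors' implementation) of the adaptive decay idea,
# applied to TD(lambda) with a linear value function.  On top of the usual
# gamma * lambda factor, the eligibility trace is decayed by exp(-beta * D),
# where D is a Bregman divergence between the outputs computed with the past
# parameters (at the time each gradient entered the trace) and the latest
# parameters.  The feature map, the squared-error Bregman divergence, the
# window size, and all hyperparameters are illustrative assumptions.

rng = np.random.default_rng(0)

def features(s):
    return np.array([1.0, s, s ** 2])      # toy feature map for a scalar state

w = np.zeros(3)                             # value parameters, V(s) = w @ features(s)
z = np.zeros(3)                             # eligibility trace
gamma, lam, alpha, beta = 0.99, 0.9, 0.05, 1.0
window = deque(maxlen=20)                   # (features, prediction when accumulated)

for t in range(500):
    s = rng.uniform(-1.0, 1.0)              # dummy transition for illustration
    s_next = float(np.clip(s + rng.normal(0.0, 0.1), -1.0, 1.0))
    r = -s ** 2

    phi, phi_next = features(s), features(s_next)
    v, v_next = w @ phi, w @ phi_next
    td_error = r + gamma * v_next - v

    # Divergence between past and latest outputs on the remembered states.
    if window:
        past = np.array([p for _, p in window])
        latest = np.array([w @ f for f, _ in window])
        D = 0.5 * np.mean((latest - past) ** 2)
    else:
        D = 0.0

    z = gamma * lam * np.exp(-beta * D) * z + phi     # adaptively decayed trace
    window.append((phi, v))

    w = w + alpha * td_error * z                      # TD(lambda) update

print("learned value weights:", w)
```

The multiple time-scale generalization described above would maintain several such traces with different lambda settings and use or replace the most influential one; the single-trace sketch only illustrates the adaptive decay.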
Related papers
- MelissaDL x Breed: Towards Data-Efficient On-line Supervised Training of Multi-parametric Surrogates with Active Learning [0.0]
We introduce a new active learning method to enhance data-efficiency for on-line surrogate training.
The surrogate is trained to predict a given timestep directly for different initial and boundary condition parameters.
Preliminary results for a 2D heat PDE demonstrate the potential of this method, called Breed, to improve the generalization capabilities of surrogates.
arXiv Detail & Related papers (2024-10-08T09:52:15Z)
- Tractable Offline Learning of Regular Decision Processes [50.11277112628193]
This work studies offline Reinforcement Learning (RL) in a class of non-Markovian environments called Regular Decision Processes (RDPs).
In RDPs, the unknown dependency of future observations and rewards on past interactions can be captured by a finite automaton.
Many algorithms first reconstruct this unknown dependency using automata learning techniques.
arXiv Detail & Related papers (2024-09-04T14:26:58Z)
- SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning [63.93193829913252]
We propose an innovative METL strategy called SHERL for resource-limited scenarios.
In the early route, intermediate outputs are consolidated via an anti-redundancy operation.
In the late route, utilizing minimal late pre-trained layers could alleviate the peak demand on memory overhead.
arXiv Detail & Related papers (2024-07-10T10:22:35Z)
- Learning Diverse Policies with Soft Self-Generated Guidance [2.9602904918952695]
Reinforcement learning with sparse and deceptive rewards is challenging because non-zero rewards are rarely obtained.
This paper develops an approach that uses diverse past trajectories for faster and more efficient online RL.
arXiv Detail & Related papers (2024-02-07T02:53:50Z)
- Solving Continual Offline Reinforcement Learning with Decision Transformer [78.59473797783673]
Continual offline reinforcement learning (CORL) combines continual learning with offline reinforcement learning.
Existing methods, employing Actor-Critic structures and experience replay (ER), suffer from distribution shifts, low efficiency, and weak knowledge-sharing.
We introduce multi-head DT (MH-DT) and low-rank adaptation DT (LoRA-DT) to mitigate DT's forgetting problem.
arXiv Detail & Related papers (2024-01-16T16:28:32Z)
- One-Pass Learning via Bridging Orthogonal Gradient Descent and Recursive Least-Squares [8.443742714362521]
We develop an algorithm for one-pass learning which seeks to perfectly fit every new datapoint while changing the parameters in a direction that causes the least change to the predictions on previous datapoints.
Our algorithm uses memory efficiently by exploiting the structure of the streaming data via incremental principal component analysis (IPCA).
Our experiments show the effectiveness of the proposed method compared to the baselines.
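A rough sketch of that mechanism (an interpretation of the summary above, not the paper's algorithm): for a linear model, each new datapoint can be fit exactly while predictions along remembered past input directions stay fixed, by updating only along the component of the new input orthogonal to a small stored basis. The basis stands in for the incremental-PCA memory; the dimension, budget, and toy data below are assumptions.

```python
import numpy as np

# Sketch (an interpretation, not the paper's algorithm): one-pass fitting of a
# linear model y ~ w @ x.  Each new point is fit exactly using only the part of
# x orthogonal to a small orthonormal basis of past inputs, so predictions along
# the remembered directions do not move.  The basis is a stand-in for the
# incremental-PCA memory mentioned in the summary.

rng = np.random.default_rng(0)
d, budget = 10, 6
w = np.zeros(d)
basis = []                                  # orthonormal directions of past inputs
w_true = rng.normal(size=d)                 # ground truth for the toy stream

for t in range(50):
    x = rng.normal(size=d)
    y = w_true @ x

    x_perp = x.copy()                       # component of x orthogonal to the memory
    for u in basis:
        x_perp -= (u @ x) * u

    residual = y - w @ x
    if np.linalg.norm(x_perp) > 1e-8:
        w += residual * x_perp / (x_perp @ x_perp)    # exactly fits (x, y)
        if len(basis) < budget:                       # grow the memory up to a budget
            basis.append(x_perp / np.linalg.norm(x_perp))

print("fit error on the last point:", abs(y - w @ x))            # ~ 0 by construction
print("parameter error (limited memory):", np.linalg.norm(w - w_true))
```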
arXiv Detail & Related papers (2022-07-28T02:01:31Z)
- Streaming Linear System Identification with Reverse Experience Replay [45.17023170054112]
We consider the problem of estimating a linear time-invariant (LTI) dynamical system from a single trajectory via streaming algorithms.
In many problems of interest, such as those encountered in reinforcement learning (RL), it is important to estimate the parameters on the go using a gradient oracle.
We propose a novel method, SGD with Reverse Experience Replay (SGD-RER), inspired by the experience replay (ER) technique popular in the RL literature.
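A minimal sketch of the reverse-replay idea for streaming linear system identification (an interpretation of the summary, not the paper's exact procedure): transitions are buffered and the SGD updates within each buffer are applied in reverse temporal order. The toy system, buffer size, and learning rate are assumptions, and the paper's inter-buffer gaps and tail-averaging are omitted.

```python
import numpy as np

# Sketch: estimate A in x_{t+1} = A x_t + noise from a single streaming
# trajectory.  Transitions are collected into small buffers, and the SGD
# updates within each buffer are applied in reverse temporal order, which
# weakens the coupling between a sample and the iterate it is applied to.

rng = np.random.default_rng(0)
d, T, B, lr = 3, 20000, 20, 0.1
A_true = np.array([[0.9, 0.1, 0.0],
                   [0.0, 0.8, 0.1],
                   [0.0, 0.0, 0.7]])

A_hat = np.zeros((d, d))
x = rng.normal(size=d)
buffer = []

for t in range(T):
    x_next = A_true @ x + 0.1 * rng.normal(size=d)
    buffer.append((x, x_next))
    x = x_next

    if len(buffer) == B:
        for s, s_next in reversed(buffer):            # replay most-recent-first
            grad = np.outer(A_hat @ s - s_next, s)    # grad of 0.5 * ||A s - s'||^2
            A_hat -= lr * grad
        buffer.clear()

print("estimation error:", np.linalg.norm(A_hat - A_true))
```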
arXiv Detail & Related papers (2021-03-10T06:51:55Z)
- Adaptive Gradient Method with Resilience and Momentum [120.83046824742455]
We propose an Adaptive Gradient Method with Resilience and Momentum (AdaRem).
AdaRem adjusts the parameter-wise learning rate according to whether the direction in which a parameter changed in the past is aligned with the direction of the current gradient.
Our method outperforms previous adaptive learning rate-based algorithms in terms of the training speed and the test error.
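A minimal sketch of such a parameter-wise, alignment-based learning-rate adjustment (an interpretation of the summary, not the authors' released method): each parameter's rate is boosted when the current gradient agrees in sign with an exponential average of its recent gradients and damped otherwise. The averaging form and the scaling factor eta are assumptions.

```python
import numpy as np

# Sketch: each parameter keeps an exponential average of its recent gradients;
# the learning rate for that parameter is boosted when the current gradient
# agrees in sign with that history and damped when it points the opposite way.

def adarem_like_step(params, grad, avg, lr=0.05, beta=0.9, eta=0.5):
    avg = beta * avg + (1.0 - beta) * grad
    alignment = np.sign(avg) * np.sign(grad)          # per-parameter, in {-1, 0, 1}
    per_param_lr = lr * (1.0 + eta * alignment)       # boost or damp per parameter
    return params - per_param_lr * grad, avg

# Toy quadratic objective f(w) = 0.5 * ||w - target||^2.
rng = np.random.default_rng(0)
target = rng.normal(size=5)
w, avg = np.zeros(5), np.zeros(5)
for _ in range(300):
    w, avg = adarem_like_step(w, w - target, avg)

print("distance to optimum:", np.linalg.norm(w - target))
```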
arXiv Detail & Related papers (2020-10-21T14:49:00Z)
- AdaS: Adaptive Scheduling of Stochastic Gradients [50.80697760166045]
We introduce the notions of "knowledge gain" and "mapping condition" and propose a new algorithm called Adaptive Scheduling (AdaS).
Experimentation reveals that, using the derived metrics, AdaS exhibits: (a) faster convergence and superior generalization over existing adaptive learning methods; and (b) lack of dependence on a validation set to determine when to stop training.
arXiv Detail & Related papers (2020-06-11T16:36:31Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)