Adaptive and Multiple Time-scale Eligibility Traces for Online Deep
Reinforcement Learning
- URL: http://arxiv.org/abs/2008.10040v2
- Date: Tue, 4 Jan 2022 00:51:09 GMT
- Title: Adaptive and Multiple Time-scale Eligibility Traces for Online Deep
Reinforcement Learning
- Authors: Taisuke Kobayashi
- Abstract summary: The eligibility traces method is well known as an online learning technique for improving sample efficiency.
The dependency between parameters of deep neural networks would destroy the eligibility traces, which is why they are not integrated with DRL.
This study proposes a new eligibility traces method that can be used even in DRL while maintaining high sample efficiency.
- Score: 8.071506311915396
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep reinforcement learning (DRL) is a promising approach to teaching
robots to perform complex tasks. In robotic problems with time-varying
environments, methods that directly reuse stored experience data cannot follow
changes in the environment, so online DRL is required. The eligibility traces
method is well known as an online learning technique for improving sample
efficiency in traditional reinforcement learning with linear regressors, but it
has not been integrated with DRL: the dependency between the parameters of deep
neural networks would destroy the eligibility traces. Although replacing the
gradient with the most influential one, rather than accumulating gradients as
the eligibility traces, can alleviate this problem, the replacement operation
reduces how often previous experiences are reused. To address these issues,
this study proposes a new eligibility traces method that can be used in DRL
while maintaining high sample efficiency. When the accumulated gradients differ
from those computed with the latest parameters, the proposed method adaptively
decays the eligibility traces according to the divergence between the past and
latest parameters. Because computing this divergence directly over the
parameters is computationally infeasible, Bregman divergences between the
outputs produced by the past and latest parameters are used instead. In
addition, a generalized method with multiple time-scale traces is designed for
the first time. This design allows the most influential of the adaptively
accumulated (decayed) eligibility traces to be replaced.
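As a rough illustration of the adaptive decay idea, the sketch below applies it to TD(λ) with a linear value function. It is not the authors' implementation: the feature map, the squared-error Bregman divergence over a small window of recent outputs, and all hyperparameters (gamma, lambda, alpha, beta) are illustrative assumptions.

```python
import numpy as np
from collections import deque

# Minimal sketch (not the authors' implementation) of the adaptive decay idea,
# applied to TD(lambda) with a linear value function.  On top of the usual
# gamma * lambda factor, the eligibility trace is decayed by exp(-beta * D),
# where D is a Bregman divergence between the outputs computed with the past
# parameters (at the time each gradient entered the trace) and the latest
# parameters.  The feature map, the squared-error Bregman divergence, the
# window size, and all hyperparameters are illustrative assumptions.

rng = np.random.default_rng(0)

def features(s):
    return np.array([1.0, s, s ** 2])      # toy feature map for a scalar state

w = np.zeros(3)                             # value parameters, V(s) = w @ features(s)
z = np.zeros(3)                             # eligibility trace
gamma, lam, alpha, beta = 0.99, 0.9, 0.05, 1.0
window = deque(maxlen=20)                   # (features, prediction when accumulated)

for t in range(500):
    s = rng.uniform(-1.0, 1.0)              # dummy transition for illustration
    s_next = float(np.clip(s + rng.normal(0.0, 0.1), -1.0, 1.0))
    r = -s ** 2

    phi, phi_next = features(s), features(s_next)
    v, v_next = w @ phi, w @ phi_next
    td_error = r + gamma * v_next - v

    # Divergence between past and latest outputs on the remembered states.
    if window:
        past = np.array([p for _, p in window])
        latest = np.array([w @ f for f, _ in window])
        D = 0.5 * np.mean((latest - past) ** 2)
    else:
        D = 0.0

    z = gamma * lam * np.exp(-beta * D) * z + phi     # adaptively decayed trace
    window.append((phi, v))

    w = w + alpha * td_error * z                      # TD(lambda) update

print("learned value weights:", w)
```

The multiple time-scale generalization described above would maintain several such traces with different lambda settings and use or replace the most influential one; the single-trace sketch only illustrates the adaptive decay.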
Related papers
- MelissaDL x Breed: Towards Data-Efficient On-line Supervised Training of Multi-parametric Surrogates with Active Learning [0.0]
We introduce a new active learning method to enhance data-efficiency for on-line surrogate training.
The surrogate is trained to predict a given timestep directly for different initial and boundary condition parameters.
Preliminary results for a 2D heat PDE demonstrate the potential of this method, called Breed, to improve the generalization capabilities of surrogates.
arXiv Detail & Related papers (2024-10-08T09:52:15Z)
- Tractable Offline Learning of Regular Decision Processes [50.11277112628193]
This work studies offline Reinforcement Learning (RL) in a class of non-Markovian environments called Regular Decision Processes (RDPs).
In RDPs, the unknown dependency of future observations and rewards on past interactions can be captured by a finite automaton.
Many algorithms first reconstruct this unknown dependency using automata learning techniques.
arXiv Detail & Related papers (2024-09-04T14:26:58Z)
- SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning [63.93193829913252]
We propose an innovative METL strategy called SHERL for resource-limited scenarios.
In the early route, intermediate outputs are consolidated via an anti-redundancy operation.
In the late route, utilizing minimal late pre-trained layers could alleviate the peak demand on memory overhead.
arXiv Detail & Related papers (2024-07-10T10:22:35Z)
- Learning Diverse Policies with Soft Self-Generated Guidance [2.9602904918952695]
Reinforcement learning with sparse and deceptive rewards is challenging because non-zero rewards are rarely obtained.
This paper develops an approach that uses diverse past trajectories for faster and more efficient online RL.
arXiv Detail & Related papers (2024-02-07T02:53:50Z)
- Solving Continual Offline Reinforcement Learning with Decision Transformer [78.59473797783673]
Continual offline reinforcement learning (CORL) combines continual learning with offline reinforcement learning.
Existing methods, employing Actor-Critic structures and experience replay (ER), suffer from distribution shifts, low efficiency, and weak knowledge-sharing.
We introduce multi-head DT (MH-DT) and low-rank adaptation DT (LoRA-DT) to mitigate DT's forgetting problem.
arXiv Detail & Related papers (2024-01-16T16:28:32Z)
- One-Pass Learning via Bridging Orthogonal Gradient Descent and Recursive Least-Squares [8.443742714362521]
We develop an algorithm for one-pass learning which seeks to perfectly fit every new datapoint while changing the parameters in a direction that causes the least change to the predictions on previous datapoints.
Our algorithm uses memory efficiently by exploiting the structure of the streaming data via incremental principal component analysis (IPCA).
Our experiments show the effectiveness of the proposed method compared to the baselines.
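A rough sketch of that mechanism (an interpretation of the summary above, not the paper's algorithm): for a linear model, each new datapoint can be fit exactly while predictions along remembered past input directions stay fixed, by updating only along the component of the new input orthogonal to a small stored basis. The basis stands in for the incremental-PCA memory; the dimension, budget, and toy data below are assumptions.

```python
import numpy as np

# Sketch (an interpretation, not the paper's algorithm): one-pass fitting of a
# linear model y ~ w @ x.  Each new point is fit exactly using only the part of
# x orthogonal to a small orthonormal basis of past inputs, so predictions along
# the remembered directions do not move.  The basis is a stand-in for the
# incremental-PCA memory mentioned in the summary.

rng = np.random.default_rng(0)
d, budget = 10, 6
w = np.zeros(d)
basis = []                                  # orthonormal directions of past inputs
w_true = rng.normal(size=d)                 # ground truth for the toy stream

for t in range(50):
    x = rng.normal(size=d)
    y = w_true @ x

    x_perp = x.copy()                       # component of x orthogonal to the memory
    for u in basis:
        x_perp -= (u @ x) * u

    residual = y - w @ x
    if np.linalg.norm(x_perp) > 1e-8:
        w += residual * x_perp / (x_perp @ x_perp)    # exactly fits (x, y)
        if len(basis) < budget:                       # grow the memory up to a budget
            basis.append(x_perp / np.linalg.norm(x_perp))

print("fit error on the last point:", abs(y - w @ x))            # ~ 0 by construction
print("parameter error (limited memory):", np.linalg.norm(w - w_true))
```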
arXiv Detail & Related papers (2022-07-28T02:01:31Z)
- Streaming Linear System Identification with Reverse Experience Replay [45.17023170054112]
We consider the problem of estimating a linear time-invariant (LTI) dynamical system from a single trajectory via streaming algorithms.
In many problems of interest, such as those encountered in reinforcement learning (RL), it is important to estimate the parameters on the go using a gradient oracle.
We propose a novel method, SGD with Reverse Experience Replay (SGD-RER), inspired by the experience replay (ER) technique popular in the RL literature.
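A minimal sketch of the reverse-replay idea for streaming linear system identification (an interpretation of the summary, not the paper's exact procedure): transitions are buffered and the SGD updates within each buffer are applied in reverse temporal order. The toy system, buffer size, and learning rate are assumptions, and the paper's inter-buffer gaps and tail-averaging are omitted.

```python
import numpy as np

# Sketch: estimate A in x_{t+1} = A x_t + noise from a single streaming
# trajectory.  Transitions are collected into small buffers, and the SGD
# updates within each buffer are applied in reverse temporal order, which
# weakens the coupling between a sample and the iterate it is applied to.

rng = np.random.default_rng(0)
d, T, B, lr = 3, 20000, 20, 0.1
A_true = np.array([[0.9, 0.1, 0.0],
                   [0.0, 0.8, 0.1],
                   [0.0, 0.0, 0.7]])

A_hat = np.zeros((d, d))
x = rng.normal(size=d)
buffer = []

for t in range(T):
    x_next = A_true @ x + 0.1 * rng.normal(size=d)
    buffer.append((x, x_next))
    x = x_next

    if len(buffer) == B:
        for s, s_next in reversed(buffer):            # replay most-recent-first
            grad = np.outer(A_hat @ s - s_next, s)    # grad of 0.5 * ||A s - s'||^2
            A_hat -= lr * grad
        buffer.clear()

print("estimation error:", np.linalg.norm(A_hat - A_true))
```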
arXiv Detail & Related papers (2021-03-10T06:51:55Z)
- Adaptive Gradient Method with Resilience and Momentum [120.83046824742455]
We propose an Adaptive Gradient Method with Resilience and Momentum (AdaRem).
AdaRem adjusts the parameter-wise learning rate according to whether the direction in which a parameter changed in the past is aligned with the direction of the current gradient.
Our method outperforms previous adaptive learning rate-based algorithms in terms of the training speed and the test error.
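A minimal sketch of such a parameter-wise, alignment-based learning-rate adjustment (an interpretation of the summary, not the authors' released method): each parameter's rate is boosted when the current gradient agrees in sign with an exponential average of its recent gradients and damped otherwise. The averaging form and the scaling factor eta are assumptions.

```python
import numpy as np

# Sketch: each parameter keeps an exponential average of its recent gradients;
# the learning rate for that parameter is boosted when the current gradient
# agrees in sign with that history and damped when it points the opposite way.

def adarem_like_step(params, grad, avg, lr=0.05, beta=0.9, eta=0.5):
    avg = beta * avg + (1.0 - beta) * grad
    alignment = np.sign(avg) * np.sign(grad)          # per-parameter, in {-1, 0, 1}
    per_param_lr = lr * (1.0 + eta * alignment)       # boost or damp per parameter
    return params - per_param_lr * grad, avg

# Toy quadratic objective f(w) = 0.5 * ||w - target||^2.
rng = np.random.default_rng(0)
target = rng.normal(size=5)
w, avg = np.zeros(5), np.zeros(5)
for _ in range(300):
    w, avg = adarem_like_step(w, w - target, avg)

print("distance to optimum:", np.linalg.norm(w - target))
```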
arXiv Detail & Related papers (2020-10-21T14:49:00Z)
- AdaS: Adaptive Scheduling of Stochastic Gradients [50.80697760166045]
We introduce the notions of "knowledge gain" and "mapping condition" and propose a new algorithm called Adaptive Scheduling (AdaS).
Experimentation reveals that, using the derived metrics, AdaS exhibits: (a) faster convergence and superior generalization over existing adaptive learning methods; and (b) lack of dependence on a validation set to determine when to stop training.
arXiv Detail & Related papers (2020-06-11T16:36:31Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)