Hindsight Goal Ranking on Replay Buffer for Sparse Reward Environment
- URL: http://arxiv.org/abs/2110.15043v1
- Date: Thu, 28 Oct 2021 12:09:10 GMT
- Title: Hindsight Goal Ranking on Replay Buffer for Sparse Reward Environment
- Authors: Tung M. Luu, Chang D. Yoo
- Abstract summary: The paper proposes a method for prioritizing the replay experience referred to as Hindsight Goal Ranking (HGR)
HGR samples with higher probability the states visited in an episode with larger temporal difference (TD) error.
The proposed method, combined with Deep Deterministic Policy Gradient (DDPG), an off-policy model-free actor-critic algorithm, accelerates learning significantly compared to the same algorithm without prioritization.
- Score: 16.422215672356167
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper proposes a method for prioritizing the replay experience, referred
to as Hindsight Goal Ranking (HGR), to overcome the limitation of Hindsight
Experience Replay (HER), which generates hindsight goals based on uniform
sampling. HGR samples with higher probability the states visited in an
episode with larger temporal difference (TD) error, which is considered a
proxy measure of how much the RL agent can learn from an experience.
The actual sampling for large TD error is performed in two steps: first, an
episode is sampled from the replay buffer according to the average TD error of
its experiences, and then, for the sampled episode, the hindsight goal leading
to larger TD error is sampled with higher probability from future visited
states. The proposed method, combined with Deep Deterministic Policy Gradient
(DDPG), an off-policy model-free actor-critic algorithm, accelerates learning
significantly compared to the same algorithm without any prioritization on four challenging
simulated robotic manipulation tasks. The empirical results show that HGR uses
samples more efficiently than previous methods across all tasks.
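To make the two-step sampling above concrete, below is a minimal Python sketch of an episodic replay buffer that first draws an episode in proportion to its average TD error and then draws a hindsight goal among the future visited states of that episode in proportion to their TD errors. The class and method names, the proportional (rather than rank-based) weighting, the uniform choice of the transition index, and the priority initialization are illustrative assumptions rather than the authors' exact implementation.

```python
import numpy as np

class HGRBuffer:
    """Illustrative two-level prioritized replay, loosely following the abstract:
    episodes are drawn in proportion to their average TD error, and the hindsight
    goal is then drawn from the episode's future states in proportion to the TD
    error last observed when that state served as the goal."""

    def __init__(self, eps=1e-6):
        self.episodes = []      # each episode is a list of (s, a, r, s') tuples
        self.priorities = []    # per-episode array of per-state |TD error|
        self.eps = eps          # keeps every sampling probability strictly positive

    def add_episode(self, transitions):
        self.episodes.append(transitions)
        # Unseen goals get the current maximum priority so they are tried at least once.
        init = max((p.max() for p in self.priorities), default=1.0)
        self.priorities.append(np.full(len(transitions), init))

    def sample(self):
        # Step 1: sample an episode proportional to its average TD error.
        ep_score = np.array([p.mean() for p in self.priorities]) + self.eps
        e = np.random.choice(len(self.episodes), p=ep_score / ep_score.sum())

        # Step 2: pick a transition, then a hindsight goal among the states visited
        # after it, proportional to their stored TD errors.
        T = len(self.episodes[e])
        t = np.random.randint(T)
        goal_p = self.priorities[e][t:] + self.eps
        g = t + np.random.choice(T - t, p=goal_p / goal_p.sum())

        s, a, _, s_next = self.episodes[e][t]
        goal = self.episodes[e][g][3]   # achieved state relabelled as the goal
        # (recomputing the sparse reward against the new goal is omitted for brevity)
        return (e, g), (s, a, goal, s_next)

    def update(self, idx, td_error):
        # After the DDPG critic update, refresh the priority of the sampled goal.
        e, g = idx
        self.priorities[e][g] = abs(td_error)
```

Initializing new episodes at the running maximum priority mirrors standard prioritized replay practice, so fresh experience is not starved before its TD error has ever been measured.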
Related papers
- MRHER: Model-based Relay Hindsight Experience Replay for Sequential Object Manipulation Tasks with Sparse Rewards [11.79027801942033]
We propose a novel model-based RL framework called Model-based Relay Hindsight Experience Replay (MRHER)
MRHER breaks down a continuous task into subtasks with increasing complexity and utilizes the previous subtask to guide the learning of the subsequent one.
We show that MRHER exhibits state-of-the-art sample efficiency in benchmark tasks, outperforming RHER by 13.79% and 14.29%.
arXiv Detail & Related papers (2023-06-28T09:51:25Z) - Sample Dropout: A Simple yet Effective Variance Reduction Technique in
Deep Policy Optimization [18.627233013208834]
We show that the use of importance sampling could introduce high variance in the objective estimate.
We propose a technique called sample dropout to bound the estimation variance by dropping out samples when their ratio deviation is too high.
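As a rough sketch of that idea, the snippet below drops samples whose importance ratio deviates from 1 by more than a fixed threshold before averaging an importance-sampled surrogate objective. The function name, the 0.3 threshold, and the fallback for an all-dropped batch are assumptions made for illustration, not the paper's exact rule.

```python
import numpy as np

def sample_dropout_surrogate(log_prob_new, log_prob_old, advantages, max_dev=0.3):
    """Importance-sampled policy objective with 'sample dropout' (illustrative):
    samples whose probability ratio strays too far from 1 are dropped before
    averaging, which bounds the variance of the estimate."""
    ratio = np.exp(log_prob_new - log_prob_old)      # pi_new(a|s) / pi_old(a|s)
    keep = np.abs(ratio - 1.0) <= max_dev            # discard high-variance samples
    if not keep.any():                               # degenerate batch: fall back to all
        keep = np.ones_like(keep, dtype=bool)
    return (ratio[keep] * advantages[keep]).mean()   # surrogate to maximize
```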
arXiv Detail & Related papers (2023-02-05T04:44:35Z) - ReDi: Efficient Learning-Free Diffusion Inference via Trajectory
Retrieval [68.7008281316644]
ReDi is a learning-free Retrieval-based Diffusion sampling framework.
We show that ReDi improves model inference efficiency with a 2x speedup.
arXiv Detail & Related papers (2023-02-05T03:01:28Z) - Post-Processing Temporal Action Detection [134.26292288193298]
Temporal Action Detection (TAD) methods typically take a pre-processing step in converting an input varying-length video into a fixed-length snippet representation sequence.
This pre-processing step would temporally downsample the video, reducing the inference resolution and hampering the detection performance in the original temporal resolution.
We introduce a novel model-agnostic post-processing method that requires no model redesign or retraining.
arXiv Detail & Related papers (2022-11-27T19:50:37Z) - Adaptive Sketches for Robust Regression with Importance Sampling [64.75899469557272]
We introduce data structures for solving robust regression through stochastic gradient descent (SGD)
Our algorithm effectively runs $T$ steps of SGD with importance sampling while using sublinear space and just making a single pass over the data.
arXiv Detail & Related papers (2022-07-16T03:09:30Z) - USHER: Unbiased Sampling for Hindsight Experience Replay [12.660090786323067]
Dealing with sparse rewards is a long-standing challenge in reinforcement learning (RL)
Hindsight Experience Replay (HER) addresses this problem by reusing failed trajectories for one goal as successful trajectories for another.
This strategy is known to result in a biased value function, as the update rule underestimates the likelihood of bad outcomes in a stochastic environment.
We propose an asymptotically unbiased importance-sampling-based algorithm to address this problem without sacrificing performance on deterministic environments.
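For context, here is a minimal sketch of the plain HER "final-state" relabelling that this entry refers to, i.e. the biased baseline that USHER corrects; the dictionary keys and the `compute_reward` callback are hypothetical, and USHER's importance-sampling correction itself is not shown.

```python
def her_relabel(trajectory, compute_reward):
    """Store a failed trajectory again as if the finally achieved state had been
    the intended goal all along (plain HER relabelling, illustrative only)."""
    achieved_final = trajectory[-1]["achieved_goal"]          # state actually reached
    relabelled = []
    for step in trajectory:
        # Recompute the sparse reward against the substituted goal.
        new_reward = compute_reward(step["achieved_goal"], achieved_final)
        relabelled.append({**step, "goal": achieved_final, "reward": new_reward})
    return relabelled
```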
arXiv Detail & Related papers (2022-07-03T20:25:06Z) - Stratified Experience Replay: Correcting Multiplicity Bias in Off-Policy
Reinforcement Learning [17.3794999533024]
We show that deep RL appears to struggle in the presence of extraneous data.
Recent works have shown that the performance of Deep Q-Network (DQN) degrades when its replay memory becomes too large.
We re-examine the motivation for sampling uniformly over a replay memory, and find that it may be flawed when using function approximation.
arXiv Detail & Related papers (2021-02-22T19:29:18Z) - Understanding and Mitigating the Limitations of Prioritized Experience
Replay [46.663239542920984]
Prioritized Experience Replay (ER) has been empirically shown to improve sample efficiency across many domains.
We show an equivalence between error-based prioritized sampling for the mean squared error and uniform sampling for the cubic power loss.
We then provide theoretical insight into why prioritized sampling improves the convergence rate over uniform sampling during early learning.
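That equivalence can be checked numerically: with per-transition TD errors delta, sampling in proportion to |delta| under the squared error yields, up to a positive constant that a learning rate absorbs, the same expected gradient as uniform sampling under a cubic power loss. The snippet below is a toy check under that reading, not code from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
delta = rng.normal(size=1000)                          # per-transition TD errors

p = np.abs(delta) / np.abs(delta).sum()                # prioritized probabilities ~ |delta|
grad_prio_mse = np.sum(p * (-delta))                   # E_p[ d/dQ (1/2) delta^2 ]
grad_unif_cubic = np.mean(-np.abs(delta) * delta)      # E_U[ d/dQ (1/3) |delta|^3 ]

# The two expected gradients agree up to the constant 1 / mean(|delta|).
print(np.isclose(grad_prio_mse, grad_unif_cubic / np.abs(delta).mean()))   # True
```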
arXiv Detail & Related papers (2020-07-19T03:10:02Z) - Experience Replay with Likelihood-free Importance Weights [123.52005591531194]
We propose to reweight experiences based on their likelihood under the stationary distribution of the current policy.
We apply the proposed approach empirically on two competitive methods, Soft Actor Critic (SAC) and Twin Delayed Deep Deterministic policy gradient (TD3)
arXiv Detail & Related papers (2020-06-23T17:17:44Z) - SUOD: Accelerating Large-Scale Unsupervised Heterogeneous Outlier
Detection [63.253850875265115]
Outlier detection (OD) is a key machine learning (ML) task for identifying abnormal objects from general samples.
We propose a modular acceleration system, called SUOD, to accelerate training and prediction with a large number of unsupervised, heterogeneous OD models.
arXiv Detail & Related papers (2020-03-11T00:22:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.