USHER: Unbiased Sampling for Hindsight Experience Replay
- URL: http://arxiv.org/abs/2207.01115v1
- Date: Sun, 3 Jul 2022 20:25:06 GMT
- Title: USHER: Unbiased Sampling for Hindsight Experience Replay
- Authors: Liam Schramm, Yunfu Deng, Edgar Granados, Abdeslam Boularias
- Abstract summary: Dealing with sparse rewards is a long-standing challenge in reinforcement learning (RL).
Hindsight Experience Replay (HER) addresses this problem by reusing failed trajectories for one goal as successful trajectories for another.
This strategy is known to result in a biased value function, as the update rule underestimates the likelihood of bad outcomes in a stochastic environment.
We propose an asymptotically unbiased importance-sampling-based algorithm to address this problem without sacrificing performance on deterministic environments.
- Score: 12.660090786323067
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Dealing with sparse rewards is a long-standing challenge in reinforcement
learning (RL). Hindsight Experience Replay (HER) addresses this problem by
reusing failed trajectories for one goal as successful trajectories for
another. This allows for both a minimum density of reward and for
generalization across multiple goals. However, this strategy is known to result
in a biased value function, as the update rule underestimates the likelihood of
bad outcomes in a stochastic environment. We propose an asymptotically unbiased
importance-sampling-based algorithm to address this problem without sacrificing
performance on deterministic environments. We show its effectiveness on a range
of robotic systems, including challenging high dimensional stochastic
environments.
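The abstract's key idea, hindsight relabeling plus an importance-sampling correction, can be sketched as a replay buffer that stores a weight alongside every relabeled transition. This is a minimal illustration only: the HindsightBuffer class, the relabel_prob parameter, and the constant returned by importance_weight are assumptions made for the example, not the estimator defined in the paper.

import random
from collections import deque

# Minimal sketch of hindsight relabeling with an importance-style weight.
# HindsightBuffer, relabel_prob, and importance_weight are illustrative
# assumptions; the unbiased estimator itself is defined in the paper.

class HindsightBuffer:
    def __init__(self, capacity=10_000, relabel_prob=0.8):
        self.buffer = deque(maxlen=capacity)
        self.relabel_prob = relabel_prob

    def importance_weight(self, original_goal, relabeled_goal):
        # Placeholder value: in an unbiased scheme this would be the ratio
        # between the probability of the goal under the target goal
        # distribution and its probability under the relabeling distribution.
        return 0.5

    def store_episode(self, episode):
        # episode: list of (state, action, next_state, desired_goal) tuples.
        achieved_final = episode[-1][2]  # outcome the trajectory actually reached
        for state, action, next_state, goal in episode:
            # Original transition: weight 1, drawn from the true goal distribution.
            self.buffer.append((state, action, next_state, goal, 1.0))
            if random.random() < self.relabel_prob:
                # Hindsight transition: pretend the achieved outcome was the goal.
                # Without a correcting weight, these transitions make chancy
                # successes look systematic and bias the value function.
                w = self.importance_weight(goal, achieved_final)
                self.buffer.append((state, action, next_state, achieved_final, w))

    def sample(self, batch_size=32):
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))

During training, the stored weight would scale each relabeled transition's TD error, so that relabeled and original goals contribute to the value update in unbiased proportion.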
Related papers
- No Regrets: Investigating and Improving Regret Approximations for Curriculum Discovery [53.08822154199948]
Unsupervised Environment Design (UED) methods have gained recent attention as their adaptive curricula promise to enable agents to be robust to in- and out-of-distribution tasks.
This work investigates how existing UED methods select training environments, focusing on task prioritisation metrics.
We develop a method that directly trains on scenarios with high learnability.
arXiv Detail & Related papers (2024-08-27T14:31:54Z) - The Perils of Optimizing Learned Reward Functions: Low Training Error Does Not Guarantee Low Regret [64.04721528586747]
In reinforcement learning, specifying reward functions that capture the intended task can be very challenging.
In this paper, we mathematically show that a sufficiently low expected test error of the reward model guarantees low worst-case regret, whereas low training error alone does not.
We then show that similar problems persist even when using policy regularization techniques, commonly employed in methods such as RLHF.
arXiv Detail & Related papers (2024-06-22T06:43:51Z) - REBEL: A Regularization-Based Solution for Reward Overoptimization in Robotic Reinforcement Learning from Human Feedback [61.54791065013767]
A misalignment between the reward function and user intentions, values, or social norms can be catastrophic in the real world.
Current methods to mitigate this misalignment work by learning reward functions from human preferences.
We propose a novel concept of reward regularization within the robotic RLHF framework.
arXiv Detail & Related papers (2023-12-22T04:56:37Z) - MRHER: Model-based Relay Hindsight Experience Replay for Sequential Object Manipulation Tasks with Sparse Rewards [11.79027801942033]
We propose a novel model-based RL framework called Model-based Relay Hindsight Experience Replay (MRHER).
MRHER breaks down a continuous task into subtasks with increasing complexity and utilizes the previous subtask to guide the learning of the subsequent one.
We show that MRHER exhibits state-of-the-art sample efficiency in benchmark tasks, outperforming RHER by 13.79% and 14.29%.
arXiv Detail & Related papers (2023-06-28T09:51:25Z) - You Can't Count on Luck: Why Decision Transformers Fail in Stochastic Environments [31.117949189062895]
Methods such as Decision Transformer that reduce reinforcement learning to a prediction task and solve it via supervised learning (RvS) have become popular due to their simplicity, robustness to hyperparameters, and strong overall performance on offline tasks.
However, simply conditioning a model on a desired return and taking the predicted action can fail dramatically in stochastic environments, where a trajectory may achieve its return purely due to luck.
In this work, we describe the limitations of RvS approaches in stochastic environments and propose a solution.
Rather than simply conditioning on the return of a single trajectory as is standard practice, our proposed method, ESPER, learns to cluster trajectories and to condition on the average return of each cluster.
arXiv Detail & Related papers (2022-05-31T17:15:44Z) - Unbiased Methods for Multi-Goal Reinforcement Learning [13.807859854345834]
In multi-goal reinforcement learning, the reward for each goal is sparse, and located in a small neighborhood of the goal.
We show that Hindsight Experience Replay (HER) can converge to low-return policies by overestimating chancy outcomes; a toy illustration of this bias appears after this list.
We introduce unbiased deep Q-learning and actor-critic algorithms that can handle such infinitely sparse rewards, and test them in toy environments.
arXiv Detail & Related papers (2021-06-16T15:31:51Z) - Regressive Domain Adaptation for Unsupervised Keypoint Detection [67.2950306888855]
Domain adaptation (DA) aims at transferring knowledge from a labeled source domain to an unlabeled target domain.
We present a method of regressive domain adaptation (RegDA) for unsupervised keypoint detection.
Our method brings large improvements of 8% to 11% in terms of PCK on different datasets.
arXiv Detail & Related papers (2021-03-10T16:45:22Z) - Learning a Unified Sample Weighting Network for Object Detection [113.98404690619982]
Region sampling or weighting is critically important to the success of modern region-based object detectors.
We argue that sample weighting should be data-dependent and task-dependent.
We propose a unified sample weighting network to predict a sample's task weights.
arXiv Detail & Related papers (2020-06-11T16:19:16Z) - Soft Hindsight Experience Replay [77.99182201815763]
Soft Hindsight Experience Replay (SHER) is a novel approach based on HER and Maximum Entropy Reinforcement Learning (MERL).
We evaluate SHER on OpenAI robotic manipulation tasks with sparse rewards.
arXiv Detail & Related papers (2020-02-06T03:57:04Z)
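The bias that both the USHER abstract and the Unbiased Methods summary describe can be made concrete with a short, self-contained toy example. The one-step "windy" environment and all numbers below are invented for illustration: aiming at a goal succeeds only half the time, yet naive hindsight relabeling records every stored transition as a success.

import random

# Toy illustration of hindsight-relabeling bias in a stochastic task.
# Hypothetical environment: the agent aims at goal 0 or 1, and with
# probability `noise` the "wind" pushes it to the other outcome.

def rollout(goal, noise=0.5):
    achieved = goal if random.random() > noise else 1 - goal
    reward = 1.0 if achieved == goal else 0.0
    return achieved, reward

random.seed(0)
true_rewards, relabeled_rewards = [], []
for _ in range(10_000):
    goal = random.randint(0, 1)
    achieved, reward = rollout(goal)
    true_rewards.append(reward)        # reward w.r.t. the goal we actually wanted
    relabeled_rewards.append(1.0)      # reward w.r.t. the achieved goal: always 1

print("true success rate:       ", sum(true_rewards) / len(true_rewards))          # ~0.5
print("naive relabeled estimate:", sum(relabeled_rewards) / len(relabeled_rewards))  # 1.0

The gap between the two printed numbers is the kind of overestimation that importance-sampling corrections such as USHER are designed to remove.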