Soft Hindsight Experience Replay
- URL: http://arxiv.org/abs/2002.02089v1
- Date: Thu, 6 Feb 2020 03:57:04 GMT
- Title: Soft Hindsight Experience Replay
- Authors: Qiwei He, Liansheng Zhuang, Houqiang Li
- Abstract summary: Soft Hindsight Experience Replay (SHER) is a novel approach based on HER and Maximum Entropy Reinforcement Learning (MERL).
We evaluate SHER on OpenAI robotic manipulation tasks with sparse rewards.
- Score: 77.99182201815763
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Efficient learning in environments with sparse rewards is one of the most
important challenges in Deep Reinforcement Learning (DRL). In continuous DRL
environments such as robotic arm control, Hindsight Experience Replay (HER)
has been shown to be an effective solution. However, due to the brittleness of
deterministic methods, HER and its variants typically suffer from stability
and convergence problems that significantly affect final performance and
severely limit the applicability of such methods to complex real-world
domains. To tackle this challenge, we propose Soft Hindsight Experience
Replay (SHER), a novel approach based on HER and Maximum Entropy
Reinforcement Learning (MERL) that combines the reuse of failed experiences
with a maximum entropy probabilistic inference model. We evaluate SHER on
OpenAI robotic manipulation tasks with sparse rewards. Experimental results
show that, in contrast to HER and its variants, SHER achieves
state-of-the-art performance, especially on the difficult HandManipulation
tasks. Furthermore, SHER is more stable, achieving very similar performance
across different random seeds.
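To make the combination concrete, the following is a minimal sketch, not the authors' implementation: it pairs HER-style hindsight relabeling of failed episodes with a SAC-style soft Bellman target that adds an entropy bonus, which is the maximum entropy ingredient SHER draws from MERL. All names (sparse_reward, relabel_episode, soft_target) and the constants tol, alpha, and gamma are illustrative assumptions.

```python
# Minimal sketch of the two ingredients SHER combines (illustrative, not the
# authors' code): HER relabeling + a maximum-entropy (SAC-style) target.
import numpy as np

def sparse_reward(achieved_goal, desired_goal, tol=0.05):
    # Sparse reward typical of the OpenAI robotics tasks:
    # 0 when the achieved goal is within tol of the desired goal, else -1.
    return 0.0 if np.linalg.norm(achieved_goal - desired_goal) < tol else -1.0

def relabel_episode(episode, k=4, seed=0):
    # HER "future" strategy: keep each transition with its original goal and
    # add k copies whose goal is replaced by a goal achieved later on, so a
    # failed episode still produces informative (successful) transitions.
    # `episode` is a list of (obs, action, desired_goal, next_achieved_goal).
    rng = np.random.default_rng(seed)
    out = []
    for t, (obs, action, desired, next_achieved) in enumerate(episode):
        out.append((obs, action, desired, sparse_reward(next_achieved, desired)))
        for ft in rng.integers(t, len(episode), size=k):
            goal = episode[ft][3]  # a goal the agent actually achieved
            out.append((obs, action, goal, sparse_reward(next_achieved, goal)))
    return out

def soft_target(reward, q_next, log_prob_next, alpha=0.2, gamma=0.98):
    # Maximum-entropy Bellman target applied to the relabeled batch:
    # y = r + gamma * (Q(s', a') - alpha * log pi(a' | s')).
    return reward + gamma * (q_next - alpha * log_prob_next)
```

In a full agent, the relabeled transitions would fill the replay buffer of an off-policy maximum entropy learner such as SAC; the entropy term keeps the policy stochastic rather than deterministic, which is the property the abstract credits for SHER's improved stability across random seeds.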
Related papers
- Efficient Diversity-based Experience Replay for Deep Reinforcement Learning [14.96744975805832]
This paper proposes a novel approach, diversity-based experience replay (DBER), which leverages determinantal point processes to prioritize diverse samples based on state realizations.
We conducted extensive experiments on robotic manipulation tasks in MuJoCo, Atari games, and realistic indoor environments in Habitat.
arXiv Detail & Related papers (2024-10-27T15:51:27Z)
- Random Latent Exploration for Deep Reinforcement Learning [71.88709402926415]
This paper introduces a new exploration technique called Random Latent Exploration (RLE).
RLE combines the strengths of bonus-based and noise-based exploration, two popular approaches for effective exploration in deep RL.
We evaluate it on the challenging Atari and IsaacGym benchmarks and show that RLE achieves higher overall scores than other approaches across all tasks.
arXiv Detail & Related papers (2024-07-18T17:55:22Z) - Never Explore Repeatedly in Multi-Agent Reinforcement Learning [40.35950679063337]
We propose a dynamic reward scaling approach to combat "revisitation".
We show enhanced performance in demanding environments like Google Research Football and StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2023-08-19T05:27:48Z) - Planning for Sample Efficient Imitation Learning [52.44953015011569]
Current imitation algorithms struggle to achieve high performance and high in-environment sample efficiency simultaneously.
We propose EfficientImitate (EI), a planning-based imitation learning method that can achieve high in-environment sample efficiency and performance simultaneously.
Experimental results show that EI achieves state-of-the-art results in both performance and sample efficiency.
arXiv Detail & Related papers (2022-10-18T05:19:26Z) - USHER: Unbiased Sampling for Hindsight Experience Replay [12.660090786323067]
Dealing with sparse rewards is a long-standing challenge in reinforcement learning (RL).
Hindsight Experience Replay (HER) addresses this problem by reusing failed trajectories for one goal as successful trajectories for another.
This strategy is known to result in a biased value function, as the update rule underestimates the likelihood of bad outcomes in a stochastic environment (a toy illustration of this bias appears after this list).
We propose an asymptotically unbiased importance-sampling-based algorithm to address this problem without sacrificing performance on deterministic environments.
arXiv Detail & Related papers (2022-07-03T20:25:06Z) - Autonomous Reinforcement Learning: Formalism and Benchmarking [106.25788536376007]
Real-world embodied learning, such as that performed by humans and animals, is situated in a continual, non-episodic world.
Common benchmark tasks in RL are episodic, with the environment resetting between trials to provide the agent with multiple attempts.
This discrepancy presents a major challenge when attempting to take RL algorithms developed for episodic simulated environments and run them on real-world platforms.
arXiv Detail & Related papers (2021-12-17T16:28:06Z) - MHER: Model-based Hindsight Experience Replay [33.00149668905828]
We propose Model-based Hindsight Experience Replay (MHER) to solve multi-goal reinforcement learning problems.
Replacing original goals with virtual goals generated from interactions with a trained dynamics model yields a novel relabeling method (a minimal sketch follows this entry).
MHER exploits experiences more efficiently by leveraging environmental dynamics to generate virtual achieved goals.
arXiv Detail & Related papers (2021-07-01T08:52:45Z) - Forgetful Experience Replay in Hierarchical Reinforcement Learning from
- Forgetful Experience Replay in Hierarchical Reinforcement Learning from Demonstrations [55.41644538483948]
In this paper, we propose a combination of approaches that allow the agent to use low-quality demonstrations in complex vision-based environments.
Our proposed goal-oriented structuring of the replay buffer allows the agent to automatically highlight sub-goals for solving complex hierarchical tasks in demonstrations.
The solution based on our algorithm beats all other solutions to the well-known MineRL competition and allows the agent to mine a diamond in the Minecraft environment.
arXiv Detail & Related papers (2020-06-17T15:38:40Z) - Learning Sparse Rewarded Tasks from Sub-Optimal Demonstrations [78.94386823185724]
Imitation learning is effective in sparse-reward tasks because it leverages existing expert demonstrations.
In practice, collecting a sufficient amount of expert demonstrations can be prohibitively expensive.
We propose Self-Adaptive Imitation Learning (SAIL), which can achieve (near) optimal performance given only a limited number of sub-optimal demonstrations.
arXiv Detail & Related papers (2020-04-01T15:57:15Z)
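As promised in the USHER entry above, here is a toy Monte-Carlo illustration (our construction, not taken from any of the listed papers) of the hindsight-relabeling bias: in a one-step stochastic task where the desired goal g1 is reached with probability 0.5, pooling relabeled transitions with the originals inflates the value estimate for g1, because relabeled transitions are successes by construction.

```python
# Toy demonstration of HER's value bias in a stochastic environment.
import numpy as np

rng = np.random.default_rng(0)
p_success, n_episodes = 0.5, 100_000

rewards_for_g1 = []  # rewards of all transitions labeled with goal g1
for _ in range(n_episodes):
    reached_g1 = rng.random() < p_success
    # Original transition: desired goal g1, sparse reward.
    rewards_for_g1.append(0.0 if reached_g1 else -1.0)
    # Hindsight relabel: the achieved outcome becomes the goal, always a
    # "success"; it counts toward g1 only when the outcome was g1.
    if reached_g1:
        rewards_for_g1.append(0.0)

print("true Q(s, a, g1):", -(1 - p_success))         # -0.5
print("pooled estimate: ", np.mean(rewards_for_g1))  # about -0.33, biased up
```

The estimator underweights failures exactly as the USHER summary states; USHER's importance-sampling correction reweights relabeled transitions to remove this bias.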