Dynamic Experience Replay
- URL: http://arxiv.org/abs/2003.02372v1
- Date: Wed, 4 Mar 2020 23:46:45 GMT
- Title: Dynamic Experience Replay
- Authors: Jieliang Luo and Hui Li
- Abstract summary: We build upon Ape-X DDPG and demonstrate our approach on robotic tight-fitting joint assembly tasks.
In particular, we run experiments on two different tasks: peg-in-hole and lap-joint.
- Abstract summary: Our ablation studies show that Dynamic Experience Replay is a crucial ingredient that either largely shortens the training time in these challenging environments or solves tasks that vanilla Ape-X DDPG cannot solve.
- Score: 6.062589413216726
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a novel technique called Dynamic Experience Replay (DER) that
allows Reinforcement Learning (RL) algorithms to use experience replay samples
not only from human demonstrations but also successful transitions generated by
RL agents during training and therefore improve training efficiency. It can be
combined with an arbitrary off-policy RL algorithm, such as DDPG or DQN, and
their distributed versions. We build upon Ape-X DDPG and demonstrate our
approach on robotic tight-fitting joint assembly tasks, based on force/torque
and Cartesian pose observations. In particular, we run experiments on two
different tasks: peg-in-hole and lap-joint. In each case, we compare different
replay buffer structures and how DER affects them. Our ablation studies show
that Dynamic Experience Replay is a crucial ingredient that either largely
shortens the training time in these challenging environments or solves the
tasks that the vanilla Ape-X DDPG cannot solve. We also show that our policies
learned purely in simulation can be deployed successfully on the real robot.
The video presenting our experiments is available at
https://sites.google.com/site/dynamicexperiencereplay
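As a rough illustration of the mechanism the abstract describes, the sketch below shows a replay buffer seeded with human-demonstration episodes whose demonstration slots are dynamically overwritten by successful agent episodes collected during training. All names (`DynamicExperienceReplay`, `demo_slots`, `demo_fraction`) and the mixing ratio are assumptions for illustration, not the paper's implementation.

```python
import random
from collections import deque

class DynamicExperienceReplay:
    """A minimal sketch, not the paper's code: a replay buffer that mixes
    human demonstrations with successful agent episodes collected during
    training. Slot count, capacity, and mixing ratio are assumptions."""

    def __init__(self, capacity=100_000, demo_slots=10):
        self.agent_buffer = deque(maxlen=capacity)            # ordinary RL transitions
        self.demo_buffers = [[] for _ in range(demo_slots)]   # demo / success episodes
        self.next_slot = 0

    def seed_with_demonstration(self, episode):
        """Store a (human or agent-generated) episode in the next demo slot."""
        self.demo_buffers[self.next_slot] = list(episode)
        self.next_slot = (self.next_slot + 1) % len(self.demo_buffers)

    def add(self, transition):
        self.agent_buffer.append(transition)

    def on_episode_end(self, episode, succeeded):
        # The "dynamic" part: a successful agent episode overwrites a demo
        # slot, so the demonstration data evolves as the agent improves.
        if succeeded:
            self.seed_with_demonstration(episode)

    def sample(self, batch_size, demo_fraction=0.25):
        """Draw a minibatch mixing agent transitions with demo transitions."""
        n_demo = int(batch_size * demo_fraction)
        demo_pool = [t for ep in self.demo_buffers for t in ep]
        n_agent = min(batch_size - n_demo, len(self.agent_buffer))
        batch = random.sample(self.agent_buffer, n_agent)
        if demo_pool:
            batch += random.choices(demo_pool, k=n_demo)
        return batch
```

Because any off-policy learner such as DDPG or DQN simply draws its minibatches from `sample()`, this style of buffer composes with arbitrary off-policy algorithms and their distributed variants such as Ape-X, as the abstract states.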
Related papers
- On-Robot Reinforcement Learning with Goal-Contrastive Rewards [24.415607337006968]
Reinforcement Learning (RL) has the potential to enable robots to learn from their own actions in the real world.
We propose GCR (Goal-Contrastive Rewards), a dense reward-function learning method that can be trained on passive video demonstrations.
GCR combines two loss functions: an implicit value loss that models how the reward increases when traversing a successful trajectory, and a goal-contrastive loss that discriminates between successful and failed trajectories.
arXiv Detail & Related papers (2024-10-25T22:11:54Z)
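A hedged sketch of the two GCR loss terms summarized above, assuming a `reward_net` that maps a stacked trajectory of observations to per-step scalar rewards; the margin value and the exact formulation are illustrative, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def gcr_style_losses(reward_net, success_traj, failed_traj, margin=1.0):
    """Hedged sketch of the two GCR loss terms; the paper's exact
    formulation may differ. `reward_net` maps a (T, obs_dim) trajectory
    tensor to (T,) per-step rewards."""
    r_succ = reward_net(success_traj)   # rewards along a successful trajectory
    r_fail = reward_net(failed_traj)    # rewards along a failed trajectory

    # Implicit value loss: penalize any decrease in reward while moving
    # forward along a successful trajectory (reward should rise to the goal).
    value_loss = F.relu(r_succ[:-1] - r_succ[1:]).mean()

    # Goal-contrastive loss: rewards on the successful trajectory should
    # exceed rewards on the failed one by a margin (margin is an assumption).
    contrastive_loss = F.relu(margin - (r_succ.mean() - r_fail.mean()))

    return value_loss + contrastive_loss
```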
- Hindsight States: Blending Sim and Real Task Elements for Efficient Reinforcement Learning [61.3506230781327]
In robotics, one approach to generate training data builds on simulations based on dynamics models derived from first principles.
Here, we leverage the imbalance in complexity of the dynamics to learn more sample-efficiently.
We validate our method on several challenging simulated tasks and demonstrate that it improves learning both alone and when combined with an existing hindsight algorithm.
arXiv Detail & Related papers (2023-03-03T21:55:04Z)
- Reward Relabelling for combined Reinforcement and Imitation Learning on sparse-reward tasks [2.0305676256390934]
We present a new method to leverage demonstrations and episodes collected online in any sparse-reward environment with any off-policy algorithm.
Our method is based on a reward bonus given to demonstrations and successful episodes, encouraging expert imitation and self-imitation.
Our experiments focus on manipulation robotics, specifically on three tasks for a 6 degrees-of-freedom robotic arm in simulation.
arXiv Detail & Related papers (2022-01-11T08:35:18Z)
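A minimal sketch of the reward-bonus idea shared by this paper and the SACR2 entry below, assuming transitions are stored as (obs, action, reward, next_obs, done) tuples; the bonus size and the decision to apply it at every step are assumptions, not the papers' exact scheme.

```python
def relabel_with_bonus(episode, succeeded, is_demo, bonus=1.0):
    """Hedged sketch: add a reward bonus to every transition of a
    demonstration or successful episode before it enters the replay
    buffer, encouraging expert imitation and self-imitation."""
    if not (succeeded or is_demo):
        return episode  # ordinary episodes keep their sparse rewards
    return [
        (obs, action, reward + bonus, next_obs, done)
        for (obs, action, reward, next_obs, done) in episode
    ]
```

Because the relabelled transitions are plain replay data, any off-policy algorithm can consume them unchanged, which is what makes this kind of method algorithm-agnostic.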
- Learning from demonstrations with SACR2: Soft Actor-Critic with Reward Relabeling [2.1485350418225244]
Off-policy algorithms tend to be more sample-efficient, and can additionally benefit from any off-policy data stored in the replay buffer.
Expert demonstrations are a popular source for such data.
We present a new method, based on a reward bonus given to demonstrations and successful episodes.
arXiv Detail & Related papers (2021-10-27T14:30:29Z)
- Learning to Run with Potential-Based Reward Shaping and Demonstrations from Video Data [70.540936204654]
"Learning to run" competition was to train a two-legged model of a humanoid body to run in a simulated race course with maximum speed.
All submissions took a tabula rasa approach to reinforcement learning (RL) and were able to produce relatively fast, but not optimal running behaviour.
We demonstrate how data from videos of human running can be used to shape the reward of the humanoid learning agent.
arXiv Detail & Related papers (2020-12-16T09:46:58Z)
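Potential-based reward shaping has a standard form, F(s, s') = γΦ(s') − Φ(s), which is known to preserve optimal policies. Below is a minimal sketch in which the potential Φ would be derived from human running videos, as the summary describes; the function and parameter names are illustrative.

```python
def shaped_reward(env_reward, phi_s, phi_s_next, gamma=0.99):
    """Potential-based shaping: r' = r + gamma * Phi(s') - Phi(s).
    In the paper summarized above, Phi comes from video data of human
    running; here phi_s / phi_s_next are just precomputed potentials."""
    return env_reward + gamma * phi_s_next - phi_s
```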
- Reinforcement Learning with Videos: Combining Offline Observations with Interaction [151.73346150068866]
Reinforcement learning is a powerful framework for robots to acquire skills from experience.
Videos of humans are a readily available source of broad and interesting experiences.
We propose a framework for reinforcement learning with videos.
arXiv Detail & Related papers (2020-11-12T17:15:48Z)
- Decoupling Representation Learning from Reinforcement Learning [89.82834016009461]
We introduce an unsupervised learning task called Augmented Temporal Contrast (ATC).
ATC trains a convolutional encoder to associate pairs of observations separated by a short time difference.
In online RL experiments, we show that training the encoder exclusively using ATC matches or outperforms end-to-end RL.
arXiv Detail & Related papers (2020-09-14T19:11:13Z)
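A hedged sketch of the temporal-contrastive idea summarized above: observations a short time apart are treated as positive pairs in an InfoNCE-style loss. The paper's image augmentations, momentum encoder, and projection heads are omitted here, and the temperature value is an assumption.

```python
import torch
import torch.nn.functional as F

def atc_style_loss(encoder, obs_t, obs_tk, temperature=0.1):
    """Hedged sketch: each observation at time t should match its own
    near-future observation at t+k against the rest of the batch.
    `encoder` maps an observation batch to (B, D) embeddings."""
    z_a = F.normalize(encoder(obs_t), dim=1)    # anchors,   shape (B, D)
    z_p = F.normalize(encoder(obs_tk), dim=1)   # positives, shape (B, D)
    logits = z_a @ z_p.t() / temperature        # (B, B) similarity matrix
    labels = torch.arange(z_a.size(0), device=logits.device)
    return F.cross_entropy(logits, labels)      # diagonal entries are positives
```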
- Forgetful Experience Replay in Hierarchical Reinforcement Learning from Demonstrations [55.41644538483948]
In this paper, we propose a combination of approaches that allow the agent to use low-quality demonstrations in complex vision-based environments.
Our proposed goal-oriented structuring of the replay buffer allows the agent to automatically highlight sub-goals for solving complex hierarchical tasks in demonstrations.
The solution based on our algorithm beats all other solutions in the famous MineRL competition and allows the agent to mine a diamond in the Minecraft environment.
arXiv Detail & Related papers (2020-06-17T15:38:40Z)
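As a loose illustration (not the paper's actual buffer) of what goal-oriented structuring of a replay buffer can look like, the sketch below groups transitions by the sub-goal being pursued and samples across sub-goals; all names and the sampling scheme are assumptions.

```python
import random
from collections import defaultdict, deque

class GoalPartitionedReplay:
    """Hedged sketch of a goal-oriented replay structure: transitions are
    grouped by sub-goal so the learner can sample experience per sub-goal.
    This is an illustrative toy, not the paper's algorithm."""

    def __init__(self, capacity_per_goal=10_000):
        self.buffers = defaultdict(lambda: deque(maxlen=capacity_per_goal))

    def add(self, subgoal_id, transition):
        self.buffers[subgoal_id].append(transition)

    def sample(self, batch_size):
        # Spread the batch evenly across currently known sub-goals.
        goals = list(self.buffers)
        per_goal = max(1, batch_size // max(1, len(goals)))
        batch = []
        for g in goals:
            k = min(per_goal, len(self.buffers[g]))
            batch += random.sample(self.buffers[g], k)
        return batch
```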
- Meta-Reinforcement Learning for Robotic Industrial Insertion Tasks [70.56451186797436]
We study how to use meta-reinforcement learning to solve the bulk of the problem in simulation.
We demonstrate our approach by training an agent to successfully perform challenging real-world insertion tasks.
arXiv Detail & Related papers (2020-04-29T18:00:22Z)
- Towards Learning to Imitate from a Single Video Demonstration [11.15358253586118]
We develop a reinforcement learning agent that can learn to imitate behaviours given a single video demonstration.
We use a Siamese recurrent neural network architecture to learn rewards in space and time between motion clips.
We demonstrate our approach on simulated humanoid, dog, and raptor agents in 2D and a quadruped and a humanoid in 3D.
arXiv Detail & Related papers (2019-01-22T06:46:19Z)
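A hedged sketch of the Siamese reward idea summarized above: the same encoder embeds both the agent's motion clip and the reference clip, and the negative embedding distance serves as the imitation reward. The recurrent architecture and the separate space and time rewards from the paper are omitted.

```python
import torch.nn.functional as F

def siamese_imitation_reward(embed, agent_clip, demo_clip):
    """Hedged sketch: `embed` is one shared network (the Siamese part)
    applied to both clips; closer embeddings yield a higher reward."""
    z_agent = embed(agent_clip)
    z_demo = embed(demo_clip)
    return -F.mse_loss(z_agent, z_demo)
```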
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.