Dynamic Experience Replay
- URL: http://arxiv.org/abs/2003.02372v1
- Date: Wed, 4 Mar 2020 23:46:45 GMT
- Title: Dynamic Experience Replay
- Authors: Jieliang Luo and Hui Li
- Abstract summary: We build upon Ape-X DDPG and demonstrate our approach on robotic tight-fitting joint assembly tasks.
In particular, we run experiments on two different tasks: peg-in-hole and lap-joint.
- Abstract summary: Our ablation studies show that Dynamic Experience Replay is a crucial ingredient that either largely shortens the training time in these challenging environments or solves tasks that vanilla Ape-X DDPG cannot solve.
- Score: 6.062589413216726
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a novel technique called Dynamic Experience Replay (DER) that
allows Reinforcement Learning (RL) algorithms to use experience replay samples
not only from human demonstrations but also successful transitions generated by
RL agents during training and therefore improve training efficiency. It can be
combined with an arbitrary off-policy RL algorithm, such as DDPG or DQN, and
their distributed versions. We build upon Ape-X DDPG and demonstrate our
approach on robotic tight-fitting joint assembly tasks, based on force/torque
and Cartesian pose observations. In particular, we run experiments on two
different tasks: peg-in-hole and lap-joint. In each case, we compare different
replay buffer structures and how DER affects them. Our ablation studies show
that Dynamic Experience Replay is a crucial ingredient that either largely
shortens the training time in these challenging environments or solves the
tasks that the vanilla Ape-X DDPG cannot solve. We also show that our policies
learned purely in simulation can be deployed successfully on the real robot.
The video presenting our experiments is available at
https://sites.google.com/site/dynamicexperiencereplay
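As a rough illustration of the mechanism the abstract describes, the sketch below shows a replay buffer seeded with human-demonstration episodes whose demonstration slots are dynamically overwritten by successful agent episodes collected during training. All names (`DynamicExperienceReplay`, `demo_slots`, `demo_fraction`) and the mixing ratio are assumptions for illustration, not the paper's implementation.

```python
import random
from collections import deque

class DynamicExperienceReplay:
    """A minimal sketch, not the paper's code: a replay buffer that mixes
    human demonstrations with successful agent episodes collected during
    training. Slot count, capacity, and mixing ratio are assumptions."""

    def __init__(self, capacity=100_000, demo_slots=10):
        self.agent_buffer = deque(maxlen=capacity)            # ordinary RL transitions
        self.demo_buffers = [[] for _ in range(demo_slots)]   # demo / success episodes
        self.next_slot = 0

    def seed_with_demonstration(self, episode):
        """Store a (human or agent-generated) episode in the next demo slot."""
        self.demo_buffers[self.next_slot] = list(episode)
        self.next_slot = (self.next_slot + 1) % len(self.demo_buffers)

    def add(self, transition):
        self.agent_buffer.append(transition)

    def on_episode_end(self, episode, succeeded):
        # The "dynamic" part: a successful agent episode overwrites a demo
        # slot, so the demonstration data evolves as the agent improves.
        if succeeded:
            self.seed_with_demonstration(episode)

    def sample(self, batch_size, demo_fraction=0.25):
        """Draw a minibatch mixing agent transitions with demo transitions."""
        n_demo = int(batch_size * demo_fraction)
        demo_pool = [t for ep in self.demo_buffers for t in ep]
        n_agent = min(batch_size - n_demo, len(self.agent_buffer))
        batch = random.sample(self.agent_buffer, n_agent)
        if demo_pool:
            batch += random.choices(demo_pool, k=n_demo)
        return batch
```

Because any off-policy learner such as DDPG or DQN simply draws its minibatches from `sample()`, this style of buffer composes with arbitrary off-policy algorithms and their distributed variants such as Ape-X, as the abstract states.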
Related papers
- On-Robot Reinforcement Learning with Goal-Contrastive Rewards [24.415607337006968]
Reinforcement Learning (RL) has the potential to enable robots to learn from their own actions in the real world.
We propose GCR (Goal-Contrastive Rewards), a dense reward-function learning method that can be trained on passive video demonstrations.
GCR combines two loss functions: an implicit value loss that models how the reward increases when traversing a successful trajectory, and a goal-contrastive loss that discriminates between successful and failed trajectories.
arXiv Detail & Related papers (2024-10-25T22:11:54Z)
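A hedged sketch of the two GCR loss terms summarized above, assuming a `reward_net` that maps a stacked trajectory of observations to per-step scalar rewards; the margin value and the exact formulation are illustrative, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def gcr_style_losses(reward_net, success_traj, failed_traj, margin=1.0):
    """Hedged sketch of the two GCR loss terms; the paper's exact
    formulation may differ. `reward_net` maps a (T, obs_dim) trajectory
    tensor to (T,) per-step rewards."""
    r_succ = reward_net(success_traj)   # rewards along a successful trajectory
    r_fail = reward_net(failed_traj)    # rewards along a failed trajectory

    # Implicit value loss: penalize any decrease in reward while moving
    # forward along a successful trajectory (reward should rise to the goal).
    value_loss = F.relu(r_succ[:-1] - r_succ[1:]).mean()

    # Goal-contrastive loss: rewards on the successful trajectory should
    # exceed rewards on the failed one by a margin (margin is an assumption).
    contrastive_loss = F.relu(margin - (r_succ.mean() - r_fail.mean()))

    return value_loss + contrastive_loss
```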
- Hindsight States: Blending Sim and Real Task Elements for Efficient Reinforcement Learning [61.3506230781327]
In robotics, one approach to generate training data builds on simulations based on dynamics models derived from first principles.
Here, we leverage the imbalance in complexity of the dynamics to learn more sample-efficiently.
We validate our method on several challenging simulated tasks and demonstrate that it improves learning both alone and when combined with an existing hindsight algorithm.
arXiv Detail & Related papers (2023-03-03T21:55:04Z)
- Reward Relabelling for combined Reinforcement and Imitation Learning on sparse-reward tasks [2.0305676256390934]
We present a new method to leverage demonstrations and episodes collected online in any sparse-reward environment with any off-policy algorithm.
Our method is based on a reward bonus given to demonstrations and successful episodes, encouraging expert imitation and self-imitation.
Our experiments focus on manipulation robotics, specifically on three tasks for a 6 degrees-of-freedom robotic arm in simulation.
arXiv Detail & Related papers (2022-01-11T08:35:18Z)
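A minimal sketch of the reward-bonus idea shared by this paper and the SACR2 entry below, assuming transitions are stored as (obs, action, reward, next_obs, done) tuples; the bonus size and the decision to apply it at every step are assumptions, not the papers' exact scheme.

```python
def relabel_with_bonus(episode, succeeded, is_demo, bonus=1.0):
    """Hedged sketch: add a reward bonus to every transition of a
    demonstration or successful episode before it enters the replay
    buffer, encouraging expert imitation and self-imitation."""
    if not (succeeded or is_demo):
        return episode  # ordinary episodes keep their sparse rewards
    return [
        (obs, action, reward + bonus, next_obs, done)
        for (obs, action, reward, next_obs, done) in episode
    ]
```

Because the relabelled transitions are plain replay data, any off-policy algorithm can consume them unchanged, which is what makes this kind of method algorithm-agnostic.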
- Learning from demonstrations with SACR2: Soft Actor-Critic with Reward Relabeling [2.1485350418225244]
Off-policy algorithms tend to be more sample-efficient, and can additionally benefit from any off-policy data stored in the replay buffer.
Expert demonstrations are a popular source for such data.
We present a new method, based on a reward bonus given to demonstrations and successful episodes.
arXiv Detail & Related papers (2021-10-27T14:30:29Z)
- Learning to Run with Potential-Based Reward Shaping and Demonstrations from Video Data [70.540936204654]
"Learning to run" competition was to train a two-legged model of a humanoid body to run in a simulated race course with maximum speed.
All submissions took a tabula rasa approach to reinforcement learning (RL) and were able to produce relatively fast, but not optimal running behaviour.
We demonstrate how data from videos of human running can be used to shape the reward of the humanoid learning agent.
arXiv Detail & Related papers (2020-12-16T09:46:58Z)
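Potential-based reward shaping has a standard form, F(s, s') = γΦ(s') − Φ(s), which is known to preserve optimal policies. Below is a minimal sketch in which the potential Φ would be derived from human running videos, as the summary describes; the function and parameter names are illustrative.

```python
def shaped_reward(env_reward, phi_s, phi_s_next, gamma=0.99):
    """Potential-based shaping: r' = r + gamma * Phi(s') - Phi(s).
    In the paper summarized above, Phi comes from video data of human
    running; here phi_s / phi_s_next are just precomputed potentials."""
    return env_reward + gamma * phi_s_next - phi_s
```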
- Reinforcement Learning with Videos: Combining Offline Observations with Interaction [151.73346150068866]
Reinforcement learning is a powerful framework for robots to acquire skills from experience.
Videos of humans are a readily available source of broad and interesting experiences.
We propose a framework for reinforcement learning with videos.
arXiv Detail & Related papers (2020-11-12T17:15:48Z)
- Decoupling Representation Learning from Reinforcement Learning [89.82834016009461]
We introduce an unsupervised learning task called Augmented Temporal Contrast (ATC).
ATC trains a convolutional encoder to associate pairs of observations separated by a short time difference.
In online RL experiments, we show that training the encoder exclusively using ATC matches or outperforms end-to-end RL.
arXiv Detail & Related papers (2020-09-14T19:11:13Z)
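A hedged sketch of the temporal-contrastive idea summarized above: observations a short time apart are treated as positive pairs in an InfoNCE-style loss. The paper's image augmentations, momentum encoder, and projection heads are omitted here, and the temperature value is an assumption.

```python
import torch
import torch.nn.functional as F

def atc_style_loss(encoder, obs_t, obs_tk, temperature=0.1):
    """Hedged sketch: each observation at time t should match its own
    near-future observation at t+k against the rest of the batch.
    `encoder` maps an observation batch to (B, D) embeddings."""
    z_a = F.normalize(encoder(obs_t), dim=1)    # anchors,   shape (B, D)
    z_p = F.normalize(encoder(obs_tk), dim=1)   # positives, shape (B, D)
    logits = z_a @ z_p.t() / temperature        # (B, B) similarity matrix
    labels = torch.arange(z_a.size(0), device=logits.device)
    return F.cross_entropy(logits, labels)      # diagonal entries are positives
```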
- Forgetful Experience Replay in Hierarchical Reinforcement Learning from Demonstrations [55.41644538483948]
In this paper, we propose a combination of approaches that allow the agent to use low-quality demonstrations in complex vision-based environments.
Our proposed goal-oriented structuring of the replay buffer allows the agent to automatically highlight sub-goals for solving complex hierarchical tasks in demonstrations.
The solution based on our algorithm beats all other solutions in the famous MineRL competition and allows the agent to mine a diamond in the Minecraft environment.
arXiv Detail & Related papers (2020-06-17T15:38:40Z)
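As a loose illustration (not the paper's actual buffer) of what goal-oriented structuring of a replay buffer can look like, the sketch below groups transitions by the sub-goal being pursued and samples across sub-goals; all names and the sampling scheme are assumptions.

```python
import random
from collections import defaultdict, deque

class GoalPartitionedReplay:
    """Hedged sketch of a goal-oriented replay structure: transitions are
    grouped by sub-goal so the learner can sample experience per sub-goal.
    This is an illustrative toy, not the paper's algorithm."""

    def __init__(self, capacity_per_goal=10_000):
        self.buffers = defaultdict(lambda: deque(maxlen=capacity_per_goal))

    def add(self, subgoal_id, transition):
        self.buffers[subgoal_id].append(transition)

    def sample(self, batch_size):
        # Spread the batch evenly across currently known sub-goals.
        goals = list(self.buffers)
        per_goal = max(1, batch_size // max(1, len(goals)))
        batch = []
        for g in goals:
            k = min(per_goal, len(self.buffers[g]))
            batch += random.sample(self.buffers[g], k)
        return batch
```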
- Meta-Reinforcement Learning for Robotic Industrial Insertion Tasks [70.56451186797436]
We study how to use meta-reinforcement learning to solve the bulk of the problem in simulation.
We demonstrate our approach by training an agent to successfully perform challenging real-world insertion tasks.
arXiv Detail & Related papers (2020-04-29T18:00:22Z)
- Towards Learning to Imitate from a Single Video Demonstration [11.15358253586118]
We develop a reinforcement learning agent that can learn to imitate behaviours given a single video demonstration.
We use a Siamese recurrent neural network architecture to learn rewards in space and time between motion clips.
We demonstrate our approach on simulated humanoid, dog, and raptor agents in 2D and a quadruped and a humanoid in 3D.
arXiv Detail & Related papers (2019-01-22T06:46:19Z)
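A hedged sketch of the Siamese reward idea summarized above: the same encoder embeds both the agent's motion clip and the reference clip, and the negative embedding distance serves as the imitation reward. The recurrent architecture and the separate space and time rewards from the paper are omitted.

```python
import torch.nn.functional as F

def siamese_imitation_reward(embed, agent_clip, demo_clip):
    """Hedged sketch: `embed` is one shared network (the Siamese part)
    applied to both clips; closer embeddings yield a higher reward."""
    z_agent = embed(agent_clip)
    z_demo = embed(demo_clip)
    return -F.mse_loss(z_agent, z_demo)
```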
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.