Reward Relabelling for combined Reinforcement and Imitation Learning on
sparse-reward tasks
- URL: http://arxiv.org/abs/2201.03834v1
- Date: Tue, 11 Jan 2022 08:35:18 GMT
- Title: Reward Relabelling for combined Reinforcement and Imitation Learning on
sparse-reward tasks
- Authors: Jesus Bujalance Martin, Fabien Moutarde
- Abstract summary: We present a new method to leverage demonstrations and episodes collected online in any sparse-reward environment with any off-policy algorithm.
Our method is based on a reward bonus given to demonstrations and successful episodes, encouraging expert imitation and self-imitation.
Our experiments focus on manipulation robotics, specifically on three tasks for a 6 degrees-of-freedom robotic arm in simulation.
- Score: 2.0305676256390934
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: During recent years, deep reinforcement learning (DRL) has made successful
incursions into complex decision-making applications such as robotics,
autonomous driving or video games. In the search for more sample-efficient
algorithms, a promising direction is to leverage as much external off-policy
data as possible. One staple of this data-driven approach is to learn from
expert demonstrations. In the past, multiple ideas have been proposed to make
good use of the demonstrations added to the replay buffer, such as pretraining
on demonstrations only or minimizing additional cost functions. We present a
new method, able to leverage demonstrations and episodes collected online in
any sparse-reward environment with any off-policy algorithm. Our method is
based on a reward bonus given to demonstrations and successful episodes,
encouraging expert imitation and self-imitation. First, we give a reward bonus
to the transitions coming from demonstrations to encourage the agent to match
the demonstrated behaviour. Then, upon collecting a successful episode, we
relabel its transitions with the same bonus before adding them to the replay
buffer, encouraging the agent to also match its previous successes. Our
experiments focus on manipulation robotics, specifically on three tasks for a 6
degrees-of-freedom robotic arm in simulation. We show that our method based on
reward relabeling improves the performance of the base algorithm (SAC and DDPG)
on these tasks, even in the absence of demonstrations. Furthermore, integrating
into our method two improvements from previous works allows our approach to
outperform all baselines.
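The mechanism is simple enough to sketch in code. Below is a minimal, hypothetical illustration of the two relabelling steps described in the abstract, assuming a plain FIFO replay buffer and a fixed bonus magnitude; the names (ReplayBuffer, add_demonstrations, add_online_episode, BONUS) are illustrative and not taken from the authors' implementation.

```python
import random
from collections import deque

BONUS = 1.0  # assumed bonus magnitude; treated here as a tunable constant


class ReplayBuffer:
    """FIFO buffer of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity=1_000_000):
        self.storage = deque(maxlen=capacity)

    def add(self, transition):
        self.storage.append(transition)

    def sample(self, batch_size):
        return random.sample(self.storage, batch_size)


def add_demonstrations(buffer, demos, bonus=BONUS):
    """Expert transitions enter the buffer with the bonus added to their reward,
    encouraging the agent to match the demonstrated behaviour (expert imitation)."""
    for state, action, reward, next_state, done in demos:
        buffer.add((state, action, reward + bonus, next_state, done))


def add_online_episode(buffer, episode, succeeded, bonus=BONUS):
    """Successful online episodes are relabelled with the same bonus before being
    stored (self-imitation); failed episodes keep their original sparse rewards."""
    extra = bonus if succeeded else 0.0
    for state, action, reward, next_state, done in episode:
        buffer.add((state, action, reward + extra, next_state, done))
```

Any off-policy learner (e.g. SAC or DDPG) can then sample batches from this buffer unchanged: the relabelling only reshapes the rewards stored with each transition.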
Related papers
- Latent Action Priors From a Single Gait Cycle Demonstration for Online Imitation Learning [42.642008092347986]
We propose an additional inductive bias for robot learning: latent actions learned from expert demonstration as priors in the action space.
We show that these action priors can be learned from only a single open-loop gait cycle using a simple autoencoder.
arXiv Detail & Related papers (2024-10-04T09:10:56Z)
- Refining Pre-Trained Motion Models [56.18044168821188]
We take on the challenge of improving state-of-the-art supervised models with self-supervised training.
We focus on obtaining a "clean" training signal from real-world unlabelled video.
We show that our method yields reliable gains over fully-supervised methods in real videos.
arXiv Detail & Related papers (2024-01-01T18:59:33Z)
- Reinforcement learning with Demonstrations from Mismatched Task under Sparse Reward [7.51772160511614]
Reinforcement learning often suffers from the sparse-reward issue in real-world robotics problems.
Prior works often assume that the learning agent and the expert aim to accomplish the same task, which requires collecting new data for every new task.
In this paper, we consider the case where the target task is mismatched from but similar with that of the expert.
Existing learning-from-demonstration (LfD) methods cannot effectively guide learning in mismatched new tasks with sparse rewards.
arXiv Detail & Related papers (2022-12-03T02:24:59Z)
- Basis for Intentions: Efficient Inverse Reinforcement Learning using Past Experience [89.30876995059168]
This paper addresses the problem of inverse reinforcement learning (IRL): inferring the reward function of an agent from observing its behavior.
arXiv Detail & Related papers (2022-08-09T17:29:49Z)
- Learning from demonstrations with SACR2: Soft Actor-Critic with Reward Relabeling [2.1485350418225244]
Off-policy algorithms tend to be more sample-efficient, and can additionally benefit from any off-policy data stored in the replay buffer.
Expert demonstrations are a popular source for such data.
We present a new method, based on a reward bonus given to demonstrations and successful episodes.
arXiv Detail & Related papers (2021-10-27T14:30:29Z)
- PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning [102.36450942613091]
We propose an inverse reinforcement learning algorithm called inverse temporal difference learning (ITD).
We show how to seamlessly integrate ITD with learning from online environment interactions, arriving at a novel algorithm for reinforcement learning with demonstrations, called ΨΦ-learning.
arXiv Detail & Related papers (2021-02-24T21:12:09Z)
- Semi-supervised reward learning for offline reinforcement learning [71.6909757718301]
Training agents usually requires reward functions, but rewards are seldom available in practice and their engineering is challenging and laborious.
We propose semi-supervised learning algorithms that learn from limited annotations and incorporate unlabelled data.
In our experiments with a simulated robotic arm, we greatly improve upon behavioural cloning and closely approach the performance achieved with ground truth rewards.
arXiv Detail & Related papers (2020-12-12T20:06:15Z)
- Visual Imitation Made Easy [102.36509665008732]
We present an alternate interface for imitation that simplifies the data collection process while allowing for easy transfer to robots.
We use commercially available reacher-grabber assistive tools both as a data collection device and as the robot's end-effector.
We experimentally evaluate on two challenging tasks: non-prehensile pushing and prehensile stacking, with 1000 diverse demonstrations for each task.
arXiv Detail & Related papers (2020-08-11T17:58:50Z)
- Forgetful Experience Replay in Hierarchical Reinforcement Learning from Demonstrations [55.41644538483948]
In this paper, we propose a combination of approaches that allow the agent to use low-quality demonstrations in complex vision-based environments.
Our proposed goal-oriented structuring of the replay buffer allows the agent to automatically highlight sub-goals for solving complex hierarchical tasks in demonstrations.
The solution based on our algorithm beats all the solutions for the famous MineRL competition and allows the agent to mine a diamond in the Minecraft environment.
arXiv Detail & Related papers (2020-06-17T15:38:40Z)
- Dynamic Experience Replay [6.062589413216726]
We build upon Ape-X DDPG and demonstrate our approach on robotic tight-fitting joint assembly tasks.
In particular, we run experiments on two different tasks: peg-in-hole and lap-joint.
Our ablation studies show that Dynamic Experience Replay is a crucial ingredient that largely shortens the training time in these challenging environments.
arXiv Detail & Related papers (2020-03-04T23:46:45Z)