Offline Learning from Demonstrations and Unlabeled Experience
- URL: http://arxiv.org/abs/2011.13885v1
- Date: Fri, 27 Nov 2020 18:20:04 GMT
- Title: Offline Learning from Demonstrations and Unlabeled Experience
- Authors: Konrad Zolna, Alexander Novikov, Ksenia Konyushkova, Caglar Gulcehre,
Ziyu Wang, Yusuf Aytar, Misha Denil, Nando de Freitas, Scott Reed
- Abstract summary: Behavior cloning (BC) is often practical for robot learning because it allows a policy to be trained offline without rewards, by supervised learning on expert demonstrations. However, BC does not effectively leverage unlabeled experience: data of mixed and unknown quality without reward annotations. This unlabeled data can be generated by a variety of sources such as human teleoperation, scripted policies and other agents on the same robot.
We show that Offline Reinforced Imitation Learning (ORIL) consistently outperforms comparable BC agents by effectively leveraging unlabeled experience.
- Score: 62.928404936397335
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Behavior cloning (BC) is often practical for robot learning because it allows
a policy to be trained offline without rewards, by supervised learning on
expert demonstrations. However, BC does not effectively leverage what we will
refer to as unlabeled experience: data of mixed and unknown quality without
reward annotations. This unlabeled data can be generated by a variety of
sources such as human teleoperation, scripted policies and other agents on the
same robot. Towards data-driven offline robot learning that can use this
unlabeled experience, we introduce Offline Reinforced Imitation Learning
(ORIL). ORIL first learns a reward function by contrasting observations from
demonstrator and unlabeled trajectories, then annotates all data with the
learned reward, and finally trains an agent via offline reinforcement learning.
Across a diverse set of continuous control and simulated robotic manipulation
tasks, we show that ORIL consistently outperforms comparable BC agents by
effectively leveraging unlabeled experience.
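To make the three-stage pipeline concrete, below is a minimal sketch of the reward-learning and relabeling stages, assuming PyTorch. The abstract describes reward learning as contrasting demonstrator against unlabeled observations; the sketch instantiates this as a non-negative positive-unlabeled (PU) logistic objective, one plausible reading, not the authors' released code. The network shape, class prior, and synthetic tensors are illustrative assumptions, and the final offline RL stage is left as a placeholder.

```python
# Sketch of ORIL's first two stages (reward learning + relabeling).
# Assumptions: PyTorch; a non-negative PU logistic loss with an assumed
# class prior; random tensors stand in for real trajectory observations.
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Maps an observation to a scalar in (0, 1), used as a learned reward."""
    def __init__(self, obs_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.net(obs)).squeeze(-1)

def pu_loss(r_demo, r_unlab, prior=0.5, eps=1e-6):
    # Demonstrator observations are positives; unlabeled observations are a
    # mixture, so the negative risk is corrected by the assumed prior and
    # clamped at zero (the non-negative PU estimator).
    pos_risk = prior * (-(r_demo + eps).log()).mean()
    neg_risk = (-(1.0 - r_unlab + eps).log()).mean() \
             - prior * (-(1.0 - r_demo + eps).log()).mean()
    return pos_risk + neg_risk.clamp(min=0.0)

obs_dim = 32
model = RewardModel(obs_dim)
opt = torch.optim.Adam(model.parameters(), lr=3e-4)
demo_obs = torch.randn(512, obs_dim)           # stand-in: expert states
unlab_obs = torch.randn(4096, obs_dim) + 0.5   # stand-in: mixed-quality states

for step in range(200):
    d = demo_obs[torch.randint(len(demo_obs), (128,))]
    u = unlab_obs[torch.randint(len(unlab_obs), (128,))]
    loss = pu_loss(model(d), model(u))
    opt.zero_grad()
    loss.backward()
    opt.step()

# Stage 2: annotate every transition with the learned reward. Stage 3 (not
# shown) trains any offline RL agent on the relabeled dataset.
with torch.no_grad():
    learned_rewards = model(unlab_obs)
```

Per the abstract, any off-the-shelf offline RL algorithm can then be trained on the relabeled data at the placeholder in the final step.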
Related papers
- Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration [54.8229698058649]
We study how unlabeled prior trajectory data can be leveraged to learn efficient exploration strategies.
Our method, SUPE (Skills from Unlabeled Prior data for Exploration), demonstrates that a careful combination of offline skill pretraining and online exploration compounds their benefits.
We empirically show that SUPE reliably outperforms prior strategies, successfully solving a suite of long-horizon, sparse-reward tasks.
arXiv Detail & Related papers (2024-10-23T17:58:45Z)
- Improving Behavioural Cloning with Positive Unlabeled Learning [15.484227081812852]
We propose a novel iterative learning algorithm for identifying expert trajectories in mixed-quality robotics datasets.
Applying behavioral cloning to the resulting filtered dataset outperforms several competitive offline reinforcement learning and imitation learning baselines.
arXiv Detail & Related papers (2023-01-27T14:17:45Z)
- Implicit Offline Reinforcement Learning via Supervised Learning [83.8241505499762]
Offline Reinforcement Learning (RL) via Supervised Learning is a simple and effective way to learn robotic skills from a dataset collected by policies of different expertise levels.
We show how implicit models can leverage return information and match or outperform explicit algorithms to acquire robotic skills from fixed datasets.
arXiv Detail & Related papers (2022-10-21T21:59:42Z)
- Opinion Spam Detection: A New Approach Using Machine Learning and Network-Based Algorithms [2.062593640149623]
Online reviews play a crucial role in helping consumers evaluate and compare products and services.
Fake reviews (opinion spam) are becoming more prevalent and negatively impacting customers and service providers.
We propose a new method for classifying reviewers as spammers or benign, combining machine learning with a message-passing algorithm.
arXiv Detail & Related papers (2022-05-26T15:27:46Z)
- Continual Learning from Demonstration of Robotics Skills [5.573543601558405]
Methods for teaching motion skills to robots typically focus on training a single skill at a time.
We propose an approach for continual learning from demonstration using hypernetworks and neural ordinary differential equation solvers.
arXiv Detail & Related papers (2022-02-14T16:26:52Z)
- PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning [102.36450942613091]
We propose an inverse reinforcement learning algorithm, called inverse temporal difference learning (ITD).
We show how to seamlessly integrate ITD with learning from online environment interactions, arriving at a novel algorithm for reinforcement learning with demonstrations, called ΨΦ-learning.
arXiv Detail & Related papers (2021-02-24T21:12:09Z)
- ORDisCo: Effective and Efficient Usage of Incremental Unlabeled Data for Semi-supervised Continual Learning [52.831894583501395]
Continual learning usually assumes that incoming data are fully labeled, which might not hold in real applications.
We propose deep Online Replay with Discriminator Consistency (ORDisCo) to interdependently learn a classifier with a conditional generative adversarial network (GAN).
We show that ORDisCo achieves significant performance improvements on various semi-supervised continual learning (SSCL) benchmark datasets.
arXiv Detail & Related papers (2021-01-02T09:04:14Z)
- Visual Imitation Made Easy [102.36509665008732]
We present an alternate interface for imitation that simplifies the data collection process while allowing for easy transfer to robots.
We use commercially available reacher-grabber assistive tools both as a data collection device and as the robot's end-effector.
We experimentally evaluate our approach on two challenging tasks: non-prehensile pushing and prehensile stacking, with 1000 diverse demonstrations for each task.
arXiv Detail & Related papers (2020-08-11T17:58:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.