DCUR: Data Curriculum for Teaching via Samples with Reinforcement Learning
- URL: http://arxiv.org/abs/2109.07380v1
- Date: Wed, 15 Sep 2021 15:39:46 GMT
- Authors: Daniel Seita, Abhinav Gopal, Zhao Mandi, John Canny
- Abstract summary: We propose a framework, Data CUrriculum for Reinforcement learning (DCUR), which first trains teachers using online deep RL.
Then, students learn by running either offline RL or by using teacher data in combination with a small amount of self-generated data.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep reinforcement learning (RL) has shown great empirical successes, but
suffers from brittleness and sample inefficiency. A potential remedy is to use
a previously-trained policy as a source of supervision. In this work, we refer
to these policies as teachers and study how to transfer their expertise to new
student policies by focusing on data usage. We propose a framework, Data
CUrriculum for Reinforcement learning (DCUR), which first trains teachers using
online deep RL, and stores the logged environment interaction history. Then,
students learn by running either offline RL or by using teacher data in
combination with a small amount of self-generated data. DCUR's central idea
involves defining a class of data curricula which, as a function of training
time, limits the student to sampling from a fixed subset of the full teacher
data. We test teachers and students using state-of-the-art deep RL algorithms
across a variety of data curricula. Results suggest that the choice of data
curricula significantly impacts student learning, and that it is beneficial to
limit the data during early training stages while gradually letting the data
availability grow over time. We identify when the student can learn offline and
match teacher performance without relying on specialized offline RL algorithms.
Furthermore, we show that collecting a small fraction of online data provides
complementary benefits with the data curriculum. Supplementary material is
available at https://tinyurl.com/teach-dcur.
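The abstract does not include pseudocode, but the central mechanism is easy to sketch. Below is a minimal, hypothetical Python illustration of one curriculum in the family the abstract describes: the student may only sample minibatches from a growing prefix of the teacher's logged replay data. The class name, the linear schedule, and the buffer layout are assumptions made for illustration, not the paper's exact implementation.

```python
import numpy as np

class DataCurriculum:
    """Sketch of a DCUR-style data curriculum: at student training step t,
    minibatches are drawn only from the first f(t) fraction of the
    teacher's logged transitions, stored in the order the teacher
    generated them. The linear schedule below is an assumption."""

    def __init__(self, teacher_data, total_steps, init_frac=0.1):
        # teacher_data: dict of equal-length numpy arrays, e.g.
        # {"obs": ..., "act": ..., "rew": ..., "next_obs": ..., "done": ...}
        self.data = teacher_data
        self.size = len(next(iter(teacher_data.values())))
        self.total_steps = total_steps
        self.init_frac = init_frac

    def _limit(self, step):
        # The available fraction grows linearly from init_frac at step 0
        # to 1.0 at total_steps, matching the abstract's finding that it
        # helps to limit data early and let availability grow over time.
        frac = self.init_frac + (1.0 - self.init_frac) * min(step / self.total_steps, 1.0)
        return max(1, int(frac * self.size))

    def sample(self, step, batch_size=256):
        # Restrict sampling to the earliest transitions, i.e. the data
        # the teacher itself generated early in its own training.
        idx = np.random.randint(0, self._limit(step), size=batch_size)
        return {k: v[idx] for k, v in self.data.items()}
```

A student running offline RL would call sample(step) inside its gradient-update loop; the online variant the abstract mentions would additionally mix in a small amount of the student's own self-generated experience.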
Related papers
- Launchpad: Learning to Schedule Using Offline and Online RL Methods
Existing RL schedulers overlook the importance of learning from historical data and improving upon custom policies.
Offline reinforcement learning presents the prospect of policy optimization from pre-recorded datasets without online environment interaction.
These methods address the challenges concerning the cost of data collection and safety, particularly pertinent to real-world applications of RL.
arXiv Detail & Related papers (2022-12-01T16:40:11Z)
- Responsible Active Learning via Human-in-the-loop Peer Study
We propose a responsible active learning method, namely Peer Study Learning (PSL), to simultaneously preserve data privacy and improve model stability.
We first introduce a human-in-the-loop teacher-student architecture to isolate unlabelled data from the task learner (teacher) on the cloud-side.
During training, the task learner instructs the lightweight active learner, which then provides feedback on the active sampling criterion.
arXiv Detail & Related papers (2022-11-24T13:18:27Z)
- How to Spend Your Robot Time: Bridging Kickstarting and Offline Reinforcement Learning for Vision-based Robotic Manipulation
Reinforcement learning (RL) has been shown to be effective at learning control from experience.
RL typically requires a large amount of online interaction with the environment.
We investigate ways to minimize online interactions in a target task by reusing a suboptimal policy.
arXiv Detail & Related papers (2022-05-06T16:38:59Z)
- When Should We Prefer Offline Reinforcement Learning Over Behavioral Cloning?
Offline reinforcement learning (RL) algorithms can acquire effective policies by utilizing previously collected experience, without any online interaction.
Behavioral cloning (BC) algorithms mimic a subset of the dataset via supervised learning.
We show that policies trained on sufficiently noisy suboptimal data can attain better performance than BC algorithms trained on expert data.
arXiv Detail & Related papers (2022-04-12T08:25:34Z)
- RvS: What is Essential for Offline RL via Supervised Learning?
Recent work has shown that supervised learning alone, without temporal difference (TD) learning, can be remarkably effective for offline RL.
In every environment suite we consider, simply maximizing likelihood with a two-layer feedforward MLP is competitive.
We also probe the limits of existing RvS methods, which are comparatively weak on random data.
arXiv Detail & Related papers (2021-12-20T18:55:16Z)
- Online Continual Learning with Natural Distribution Shifts: An Empirical Study with Visual Data
"Online" continual learning enables evaluating both information retention and online learning efficacy.
In online continual learning, each incoming small batch of data is first used for testing and then added to the training set, making the problem truly online.
We introduce a new benchmark for online continual visual learning that exhibits large scale and natural distribution shifts.
arXiv Detail & Related papers (2021-08-20T06:17:20Z)
- RL Unplugged: A Suite of Benchmarks for Offline Reinforcement Learning
We propose a benchmark called RL Unplugged to evaluate and compare offline RL methods.
RL Unplugged includes data from a diverse range of domains including games and simulated motor control problems.
We will release data for all our tasks and open-source all algorithms presented in this paper.
arXiv Detail & Related papers (2020-06-24T17:14:51Z)
- AWAC: Accelerating Online Reinforcement Learning with Offline Datasets
We show that our method, advantage weighted actor critic (AWAC), enables rapid learning of skills with a combination of prior demonstration data and online experience.
Our results show that incorporating prior data can reduce the time required to learn a range of robotic skills to practical time-scales.
arXiv Detail & Related papers (2020-06-16T17:54:41Z)
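AWAC, like DCUR's online setting above, rests on a simple mechanic: train mostly from previously collected data while folding in a thin stream of fresh online experience. Below is a hypothetical Python sketch of that batch mixing; the function name, the 10% ratio, and the list-of-tuples buffer format are all illustrative assumptions, not details from either paper.

```python
import random

def mixed_minibatch(offline_data, online_buffer, batch_size=256, online_frac=0.1):
    """Mix logged offline transitions with a small share of fresh online
    transitions. offline_data and online_buffer are lists of transition
    tuples; online_frac is an illustrative choice, not a published value."""
    n_online = min(int(batch_size * online_frac), len(online_buffer))
    # Draw the bulk of the batch from the logged data, the rest online.
    batch = random.sample(offline_data, batch_size - n_online)
    batch += random.sample(online_buffer, n_online)
    random.shuffle(batch)
    return batch
```

Intuitively, the bulk of gradient updates come from the logged data, while the small online fraction lets the learner correct for mismatch between the logged behavior and its own current policy.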