CoDE: Collocation for Demonstration Encoding
- URL: http://arxiv.org/abs/2105.03019v1
- Date: Fri, 7 May 2021 00:34:43 GMT
- Title: CoDE: Collocation for Demonstration Encoding
- Authors: Mandy Xie, Anqi Li, Karl Van Wyk, Frank Dellaert, Byron Boots, Nathan
Ratliff
- Abstract summary: We present a data-efficient imitation learning technique called Collocation for Demonstration Encoding (CoDE).
We circumvent problematic back-propagation through time by introducing an auxiliary trajectory network, taking inspiration from collocation techniques in optimal control.
We present experiments on a 7-degree-of-freedom robotic manipulator learning behavior shaping policies for efficient tabletop operation.
- Score: 31.220899638271856
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Roboticists frequently turn to imitation learning (IL) for
data-efficient policy learning. Many IL methods, canonicalized by the seminal
work on Dataset Aggregation (DAgger), combat the distributional shift issues of
older Behavior Cloning (BC) methods by introducing oracle experts. Unfortunately, access to
oracle experts is often unrealistic in practice; data frequently comes from
manual offline methods such as lead-through or teleoperation. We present a
data-efficient imitation learning technique called Collocation for
Demonstration Encoding (CoDE) that operates on only a fixed set of trajectory
demonstrations by modeling learning as empirical risk minimization. We
circumvent problematic back-propagation through time by introducing an
auxiliary trajectory network, taking inspiration from collocation techniques in
optimal control. Our method generalizes well and is much more data efficient
than standard BC methods. We present experiments on a 7-degree-of-freedom (DoF)
robotic manipulator learning behavior shaping policies for efficient tabletop
operation.
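The key idea above (avoiding back-propagation through time by optimizing an auxiliary trajectory alongside the policy, with a collocation-style dynamics-consistency penalty) can be illustrated with a minimal sketch. The code below is an assumption-laden illustration rather than the paper's implementation: it assumes PyTorch, uses free auxiliary state variables instead of the auxiliary trajectory network described above, a placeholder single-integrator dynamics model, a synthetic demonstration, and an arbitrary penalty weight.

```python
# Minimal sketch of collocation-style imitation learning (hypothetical, not the
# authors' code). Instead of rolling the policy out and back-propagating through
# the whole rollout, we optimize auxiliary states x_hat jointly with the policy:
# (i) x_hat should match the demonstrated states (empirical risk), and
# (ii) x_hat should be consistent with the dynamics under the policy's actions.
import torch
import torch.nn as nn

state_dim, action_dim, T = 7, 7, 50           # assumed sizes (e.g. a 7-DoF arm)
demo_states = torch.randn(T, state_dim)       # placeholder for one recorded demonstration

def dynamics(x, u):
    # Placeholder single-integrator model; the real system model would differ.
    return x + 0.01 * u

policy = nn.Sequential(nn.Linear(state_dim, 64), nn.Tanh(), nn.Linear(64, action_dim))
x_hat = demo_states.clone().requires_grad_(True)   # auxiliary ("collocation") trajectory

opt = torch.optim.Adam(list(policy.parameters()) + [x_hat], lr=1e-3)

for step in range(1000):
    opt.zero_grad()
    imitation = ((x_hat - demo_states) ** 2).mean()        # fit the demonstration
    actions = policy(x_hat[:-1])                           # actions at the auxiliary states
    consistency = ((x_hat[1:] - dynamics(x_hat[:-1], actions)) ** 2).mean()
    loss = imitation + 10.0 * consistency                  # penalty weight chosen arbitrarily
    loss.backward()                                        # gradients never flow through a rollout
    opt.step()
```

Because gradients only couple neighboring time steps through the consistency term, the long chained Jacobian products of back-propagation through time never appear, which is the benefit the abstract attributes to the collocation viewpoint.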
Related papers
- RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar but potentially even more practical than those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal and enables the algorithm to learn behaviors that improve over the potentially suboptimal human expert.
arXiv Detail & Related papers (2023-11-21T21:05:21Z)
- Implicit Offline Reinforcement Learning via Supervised Learning [83.8241505499762]
Offline Reinforcement Learning (RL) via Supervised Learning is a simple and effective way to learn robotic skills from a dataset collected by policies of different expertise levels.
We show how implicit models can leverage return information and match or outperform explicit algorithms to acquire robotic skills from fixed datasets.
arXiv Detail & Related papers (2022-10-21T21:59:42Z)
- Model-based Offline Imitation Learning with Non-expert Data [7.615595533111191]
We propose a scalable model-based offline imitation learning algorithmic framework that leverages datasets collected by both suboptimal and optimal policies.
We show that the proposed method always outperforms Behavioral Cloning in the low data regime on simulated continuous control domains.
arXiv Detail & Related papers (2022-06-11T13:08:08Z)
- Mitigating Covariate Shift in Imitation Learning via Offline Data Without Great Coverage [27.122391441921664]
This paper studies offline Imitation Learning (IL) where an agent learns to imitate an expert demonstrator without additional online environment interactions.
Instead, the learner is presented with a static offline dataset of state-action-next state transition triples from a potentially less proficient behavior policy.
We introduce Model-based IL from Offline data (MILO) to solve the offline IL problem efficiently both in theory and in practice.
arXiv Detail & Related papers (2021-06-06T18:31:08Z)
- DEALIO: Data-Efficient Adversarial Learning for Imitation from Observation [57.358212277226315]
In imitation learning from observation (IfO), a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior, without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm.
arXiv Detail & Related papers (2021-03-31T23:46:32Z)
- Human-in-the-Loop Imitation Learning using Remote Teleoperation [72.2847988686463]
We build a data collection system tailored to 6-DoF manipulation settings.
We develop an algorithm to train the policy iteratively on new data collected by the system.
We demonstrate that agents trained on data collected by our intervention-based system and algorithm outperform agents trained on an equivalent number of samples collected by non-interventional demonstrators.
arXiv Detail & Related papers (2020-12-12T05:30:35Z)
- Learning Dexterous Manipulation from Suboptimal Experts [69.8017067648129]
Relative Entropy Q-Learning (REQ) is a simple policy algorithm that combines ideas from successful offline and conventional RL algorithms.
We show how REQ is also effective for general off-policy RL, offline RL, and RL from demonstrations.
arXiv Detail & Related papers (2020-10-16T18:48:49Z)
- AWAC: Accelerating Online Reinforcement Learning with Offline Datasets [84.94748183816547]
We show that our method, advantage weighted actor critic (AWAC), enables rapid learning of skills with a combination of prior demonstration data and online experience.
Our results show that incorporating prior data can reduce the time required to learn a range of robotic skills to practical time-scales.
arXiv Detail & Related papers (2020-06-16T17:54:41Z)