Hierarchical Variational Imitation Learning of Control Programs
- URL: http://arxiv.org/abs/1912.12612v1
- Date: Sun, 29 Dec 2019 08:57:02 GMT
- Title: Hierarchical Variational Imitation Learning of Control Programs
- Authors: Roy Fox, Richard Shin, William Paul, Yitian Zou, Dawn Song, Ken
Goldberg, Pieter Abbeel, Ion Stoica
- Abstract summary: We propose a variational inference method for imitation learning of a control policy represented by parametrized hierarchical procedures (PHP).
Our method discovers the hierarchical structure in a dataset of observation-action traces of teacher demonstrations, by learning an approximate posterior distribution over the latent sequence of procedure calls and terminations.
We demonstrate a novel benefit of variational inference in the context of hierarchical imitation learning: in decomposing the policy into simpler procedures, inference can leverage acausal information that is unused by other methods.
- Score: 131.7671843857375
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Autonomous agents can learn by imitating teacher demonstrations of the
intended behavior. Hierarchical control policies are ubiquitously useful for
such learning, having the potential to break down structured tasks into simpler
sub-tasks, thereby improving data efficiency and generalization. In this paper,
we propose a variational inference method for imitation learning of a control
policy represented by parametrized hierarchical procedures (PHP), a
program-like structure in which procedures can invoke sub-procedures to perform
sub-tasks. Our method discovers the hierarchical structure in a dataset of
observation-action traces of teacher demonstrations, by learning an approximate
posterior distribution over the latent sequence of procedure calls and
terminations. Samples from this learned distribution then guide the training of
the hierarchical control policy. We identify and demonstrate a novel benefit of
variational inference in the context of hierarchical imitation learning: in
decomposing the policy into simpler procedures, inference can leverage acausal
information that is unused by other methods. Training PHP with variational
inference outperforms LSTM baselines in terms of data efficiency and
generalization, requiring less than half as much data to achieve a 24% error
rate in executing the bubble sort algorithm, and to achieve no error in
executing Karel programs.
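As a rough, non-authoritative sketch of the training scheme described above (not the authors' released code), the toy example below performs amortized variational inference for a simplified two-level imitation policy: a bidirectional encoder, which can exploit acausal information from the whole demonstration trace, proposes a per-step latent procedure, and procedure-specific action heads reconstruct the teacher's actions under an ELBO-style objective. All names, dimensions, the Gumbel-softmax relaxation, and the uniform prior are illustrative assumptions; the full PHP model also includes termination variables and a call stack, omitted here for brevity.

```python
# Hedged sketch (not the authors' implementation): amortized variational inference
# for a simplified two-level hierarchical imitation policy.
import torch
import torch.nn as nn
import torch.nn.functional as F

OBS_DIM, ACT_DIM, NUM_PROCS, HID = 8, 4, 3, 32  # toy sizes (assumptions)

class PosteriorEncoder(nn.Module):
    """q(z_t | full trace): bidirectional, so it can use acausal information."""
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(OBS_DIM + ACT_DIM, HID, bidirectional=True, batch_first=True)
        self.head = nn.Linear(2 * HID, NUM_PROCS)

    def forward(self, obs, act):
        h, _ = self.rnn(torch.cat([obs, act], dim=-1))
        return self.head(h)                                  # [B, T, NUM_PROCS] logits

class HierarchicalPolicy(nn.Module):
    """pi(a_t | obs_t, z_t): each latent procedure has its own action head."""
    def __init__(self):
        super().__init__()
        self.heads = nn.ModuleList(
            [nn.Sequential(nn.Linear(OBS_DIM, HID), nn.Tanh(), nn.Linear(HID, ACT_DIM))
             for _ in range(NUM_PROCS)])

    def forward(self, obs, z_soft):
        # Mix the procedure heads by the relaxed procedure assignment z_soft.
        logits = torch.stack([head(obs) for head in self.heads], dim=-2)  # [B, T, P, A]
        return (z_soft.unsqueeze(-1) * logits).sum(-2)                    # [B, T, A]

def elbo_loss(enc, policy, obs, act_idx, temp=1.0, beta=0.1):
    act_onehot = F.one_hot(act_idx, ACT_DIM).float()
    q_logits = enc(obs, act_onehot)
    z_soft = F.gumbel_softmax(q_logits, tau=temp, hard=False)  # relaxed sample of z_t
    recon = F.cross_entropy(policy(obs, z_soft).flatten(0, 1), act_idx.flatten())
    # KL(q || uniform prior) over the discrete procedure choice at each step.
    q = F.softmax(q_logits, dim=-1)
    kl = (q * (q.clamp_min(1e-8).log() - torch.log(torch.tensor(1.0 / NUM_PROCS)))).sum(-1).mean()
    return recon + beta * kl

if __name__ == "__main__":
    enc, policy = PosteriorEncoder(), HierarchicalPolicy()
    opt = torch.optim.Adam(list(enc.parameters()) + list(policy.parameters()), lr=1e-3)
    obs = torch.randn(16, 20, OBS_DIM)                 # toy demonstration traces
    act = torch.randint(0, ACT_DIM, (16, 20))
    for _ in range(5):
        opt.zero_grad()
        loss = elbo_loss(enc, policy, obs, act)
        loss.backward()
        opt.step()
```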
Related papers
- Data-CUBE: Data Curriculum for Instruction-based Sentence Representation Learning [85.66907881270785]
We propose a data curriculum method, namely Data-CUBE, that arranges the order of all the multi-task data for training.
At the task level, we aim to find the optimal task order that minimizes the total cross-task interference risk.
At the instance level, we measure the difficulty of all instances per task, then divide them into easy-to-difficult mini-batches for training.
arXiv Detail & Related papers (2024-01-07T18:12:20Z)
- Statistically Efficient Variance Reduction with Double Policy Estimation for Off-Policy Evaluation in Sequence-Modeled Reinforcement Learning [53.97273491846883]
We propose DPE: an RL algorithm that blends offline sequence modeling and offline reinforcement learning with Double Policy Estimation.
We validate our method on multiple OpenAI Gym tasks with D4RL benchmarks.
arXiv Detail & Related papers (2023-08-28T20:46:07Z)
- On minimal variations for unsupervised representation learning [19.055611167696238]
Unsupervised representation learning aims at describing raw data efficiently to solve various downstream tasks.
Revealing minimal variations as a guiding principle behind unsupervised representation learning paves the way to better practical guidelines for self-supervised learning algorithms.
arXiv Detail & Related papers (2022-11-07T18:57:20Z)
- Robust Task Representations for Offline Meta-Reinforcement Learning via Contrastive Learning [21.59254848913971]
Offline meta-reinforcement learning is a reinforcement learning paradigm that learns from offline data to adapt to new tasks.
We propose a contrastive learning framework for task representations that are robust to the distribution of behavior policies at training and test time.
Experiments on a variety of offline meta-reinforcement learning benchmarks demonstrate the advantages of our method over prior methods.
arXiv Detail & Related papers (2022-06-21T14:46:47Z)
- Stabilizing Q-learning with Linear Architectures for Provably Efficient Learning [53.17258888552998]
This work proposes an exploration variant of the basic $Q$-learning protocol with linear function approximation.
We show that the performance of the algorithm degrades very gracefully under a novel and more permissive notion of approximation error.
arXiv Detail & Related papers (2022-06-01T23:26:51Z)
- Learning to Synthesize Programs as Interpretable and Generalizable Policies [25.258598215642067]
We present a framework that learns to synthesize a program, which details the procedure to solve a task in a flexible and expressive manner.
Experimental results demonstrate that the proposed framework not only learns to reliably synthesize task-solving programs but also outperforms DRL and program synthesis baselines.
arXiv Detail & Related papers (2021-08-31T07:03:06Z)
- How Fine-Tuning Allows for Effective Meta-Learning [50.17896588738377]
We present a theoretical framework for analyzing representations derived from a MAML-like algorithm.
We provide risk bounds on the best predictor found by fine-tuning via gradient descent, demonstrating that the algorithm can provably leverage the shared structure.
The resulting separation underscores the benefit of fine-tuning-based methods, such as MAML, over methods with "frozen representation" objectives in few-shot learning.
arXiv Detail & Related papers (2021-05-05T17:56:00Z)
- Conditional Meta-Learning of Linear Representations [57.90025697492041]
Standard meta-learning for representation learning aims to find a common representation to be shared across multiple tasks; a single shared representation can be limiting when the tasks are heterogeneous.
In this work we overcome this issue by inferring a conditioning function, mapping the tasks' side information into a representation tailored to the task at hand.
We propose a meta-algorithm capable of leveraging this advantage in practice.
arXiv Detail & Related papers (2021-03-30T12:02:14Z)
- META-Learning Eligibility Traces for More Sample Efficient Temporal Difference Learning [2.0559497209595823]
We propose a meta-learning method for adjusting the eligibility trace parameter in a state-dependent manner.
The adaptation is achieved with the help of auxiliary learners that learn distributional information about the update targets online.
We prove that, under some assumptions, the proposed method improves the overall quality of the update targets by minimizing the overall target error (a minimal sketch of state-dependent traces appears after this list).
arXiv Detail & Related papers (2020-06-16T03:41:07Z)
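The last entry above proposes meta-learning a state-dependent eligibility trace parameter. As a minimal, hedged illustration of state-dependent traces in general (not that paper's meta-learning procedure), the sketch below runs tabular TD(lambda) on a toy chain where lambda varies by state; the environment, the fixed lambda schedule, and all names are assumptions made for illustration.

```python
# Hedged sketch: tabular TD(lambda) with a state-dependent trace parameter lambda(s).
# The paper meta-learns lambda online from distributional information about the
# update targets; here lambda is simply a fixed, hand-picked per-state schedule.
import numpy as np

N_STATES, GAMMA, ALPHA = 5, 0.9, 0.1
lam = np.linspace(0.2, 0.9, N_STATES)       # illustrative state-dependent lambda(s)

def step(s):
    """Toy chain dynamics: always move right, reward 1 on reaching the end."""
    s_next = min(s + 1, N_STATES - 1)
    r = 1.0 if s_next == N_STATES - 1 else 0.0
    done = s_next == N_STATES - 1
    return s_next, r, done

v = np.zeros(N_STATES)
for episode in range(200):
    s, e, done = 0, np.zeros(N_STATES), False   # reset state and eligibility traces
    while not done:
        s_next, r, done = step(s)
        delta = r + GAMMA * v[s_next] * (not done) - v[s]  # TD error
        e *= GAMMA * lam[s]                 # decay traces with the current state's lambda
        e[s] += 1.0                         # accumulating trace for the visited state
        v += ALPHA * delta * e              # update all traced states
        s = s_next
print(np.round(v, 3))
```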
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.