TRAIL: Near-Optimal Imitation Learning with Suboptimal Data
- URL: http://arxiv.org/abs/2110.14770v1
- Date: Wed, 27 Oct 2021 21:05:00 GMT
- Title: TRAIL: Near-Optimal Imitation Learning with Suboptimal Data
- Authors: Mengjiao Yang, Sergey Levine, Ofir Nachum
- Abstract summary: We present training objectives that use offline datasets to learn a factored transition model.
Our theoretical analysis shows that the learned latent action space can boost the sample-efficiency of downstream imitation learning.
To learn the latent action space in practice, we propose TRAIL (Transition-Reparametrized Actions for Imitation Learning), an algorithm that learns an energy-based transition model.
- Score: 100.83688818427915
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The aim in imitation learning is to learn effective policies by utilizing
near-optimal expert demonstrations. However, high-quality demonstrations from
human experts can be expensive to obtain in large numbers. On the other hand,
it is often much easier to obtain large quantities of suboptimal or
task-agnostic trajectories, which are not useful for direct imitation, but can
nevertheless provide insight into the dynamical structure of the environment,
showing what could be done in the environment even if not what should be done.
We ask: is it possible to utilize such suboptimal offline datasets
to facilitate provably improved downstream imitation learning? In this work, we
answer this question affirmatively and present training objectives that use
offline datasets to learn a factored transition model whose structure enables
the extraction of a latent action space. Our theoretical analysis shows that
the learned latent action space can boost the sample-efficiency of downstream
imitation learning, effectively reducing the need for large near-optimal expert
datasets through the use of auxiliary non-expert data. To learn the latent
action space in practice, we propose TRAIL (Transition-Reparametrized Actions
for Imitation Learning), an algorithm that learns an energy-based transition
model contrastively, and uses the transition model to reparametrize the action
space for sample-efficient imitation learning. We evaluate the practicality of
our objective through experiments on a set of navigation and locomotion tasks.
Our results verify the benefits suggested by our theory and show that TRAIL is
able to improve baseline imitation learning by up to 4x in performance.
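
To make the two stages above concrete, here is a minimal PyTorch sketch, not the authors' implementation: a transition model trained contrastively so that the true next state outscores the other next states in the batch, followed by behavioral cloning in the resulting latent action space. The network sizes, latent dimension, and in-batch negative scheme are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_DIM, ACTION_DIM, LATENT_DIM = 17, 6, 8   # illustrative sizes

# Maps a (state, action) pair to a latent action; this is the reparametrization.
action_encoder = nn.Sequential(
    nn.Linear(STATE_DIM + ACTION_DIM, 256), nn.ReLU(),
    nn.Linear(256, LATENT_DIM))

# Embeds candidate next states for the energy-based transition model.
next_state_embed = nn.Sequential(
    nn.Linear(STATE_DIM, 256), nn.ReLU(),
    nn.Linear(256, LATENT_DIM))

def contrastive_transition_loss(s, a, s_next):
    # Score each (latent action, candidate next state) pair; train so the true
    # next state outscores the other next states in the batch (InfoNCE-style).
    z = action_encoder(torch.cat([s, a], dim=-1))
    logits = z @ next_state_embed(s_next).t()
    labels = torch.arange(s.shape[0])            # positives on the diagonal
    return F.cross_entropy(logits, labels)

# Downstream stage: behavioral cloning of latent actions on the small expert set.
latent_policy = nn.Sequential(
    nn.Linear(STATE_DIM, 256), nn.ReLU(),
    nn.Linear(256, LATENT_DIM))

def latent_bc_loss(s_expert, a_expert):
    with torch.no_grad():                        # encoder is frozen here
        z_target = action_encoder(torch.cat([s_expert, a_expert], dim=-1))
    return F.mse_loss(latent_policy(s_expert), z_target)

Executing the learned policy additionally requires mapping latent actions back to raw environment actions, for example with a decoder trained on the abundant suboptimal data.
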
Related papers
- KBAlign: Efficient Self Adaptation on Specific Knowledge Bases [75.78948575957081]
Large language models (LLMs) usually rely on retrieval-augmented generation to exploit knowledge materials on the fly.
We propose KBAlign, an approach designed for efficient adaptation to downstream tasks involving knowledge bases.
Our method utilizes iterative training with self-annotated data such as Q&A pairs and revision suggestions, enabling the model to grasp the knowledge content efficiently.
arXiv Detail & Related papers (2024-11-22T08:21:03Z)
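
The iterative self-annotation loop this summary describes can be sketched as follows; annotate, revise, and train are hypothetical callables standing in for the paper's self-annotation, revision-suggestion, and fine-tuning components.

from typing import Callable, List, Tuple

QAPair = Tuple[str, str]

def kb_self_adapt(annotate: Callable[[str], List[QAPair]],
                  revise: Callable[[List[QAPair]], List[QAPair]],
                  train: Callable[[List[QAPair]], None],
                  kb_chunks: List[str],
                  rounds: int = 3) -> None:
    # Each round: the model annotates the KB with Q&A pairs, applies its own
    # revision suggestions, and is trained on the result, so later rounds
    # annotate with a model that already knows the KB better.
    for _ in range(rounds):
        dataset: List[QAPair] = []
        for chunk in kb_chunks:
            qa = annotate(chunk)          # self-annotated Q&A pairs
            dataset.extend(revise(qa))    # self-revision pass
        train(dataset)                    # fine-tune on self-annotated data
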
- Unlearning with Control: Assessing Real-world Utility for Large Language Model Unlearning [97.2995389188179]
Recent research has begun to approach large language model (LLM) unlearning via gradient ascent (GA).
Despite their simplicity and efficiency, we suggest that GA-based methods are prone to excessive unlearning.
We propose several controlling methods that can regulate the extent of excessive unlearning.
arXiv Detail & Related papers (2024-06-13T14:41:00Z)
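
A minimal sketch of one plausible way to regulate gradient-ascent unlearning; the specific controls shown here (an ascent ceiling and gradient clipping) are illustrative assumptions, not necessarily the paper's mechanisms.

import torch

def controlled_ga_step(model, loss_fn, forget_batch, optimizer,
                       loss_ceiling=5.0, max_grad_norm=1.0):
    # One gradient-ascent unlearning step with two simple controls:
    # stop ascending once the forget loss is already high (the batch is
    # effectively unlearned), and clip gradients to bound each update.
    loss = loss_fn(model, forget_batch)
    if loss.item() >= loss_ceiling:       # control 1: early stopping
        return loss.item()
    optimizer.zero_grad()
    (-loss).backward()                    # ascent = descent on the negated loss
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
    optimizer.step()                      # control 2: clipped update
    return loss.item()
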
- Perception Without Vision for Trajectory Prediction: Ego Vehicle Dynamics as Scene Representation for Efficient Active Learning in Autonomous Driving [0.0]
We propose methods for clustering trajectory-states and sampling strategies in an active learning framework.
By integrating trajectory-state-informed active learning, we demonstrate that more efficient and robust autonomous driving systems are possible.
arXiv Detail & Related papers (2024-05-15T02:54:11Z)
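
A minimal sketch of the idea, under assumptions: cluster ego-dynamics trajectory-state features with k-means and spread the labeling budget across clusters so rare driving regimes are represented. The feature choice, cluster count, and budget split are illustrative, not the paper's exact strategy.

import numpy as np
from sklearn.cluster import KMeans

def select_for_labeling(traj_states, n_clusters=8, budget=100, seed=0):
    # traj_states: array of ego-dynamics features per sample
    # (e.g., speed, yaw rate, acceleration), with no camera/lidar input needed.
    km = KMeans(n_clusters=n_clusters, random_state=seed, n_init=10)
    labels = km.fit_predict(traj_states)
    rng = np.random.default_rng(seed)
    chosen = []
    per_cluster = budget // n_clusters
    for c in range(n_clusters):
        idx = np.flatnonzero(labels == c)
        take = min(per_cluster, idx.size)
        chosen.extend(rng.choice(idx, size=take, replace=False))
    return np.array(chosen)   # indices to send for manual annotation
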
- RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar to, but potentially even more practical than, those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal, and enables the algorithm to learn behaviors that improve over a potentially suboptimal human expert.
arXiv Detail & Related papers (2023-11-21T21:05:21Z)
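
A minimal sketch of the reward relabeling this implies; the convention that each expert intervention contributes a -1 reward and all other steps contribute 0 is an assumption made for illustration.

import numpy as np

def relabel_with_interventions(observations, actions, intervened):
    # Turn an interactive rollout into an RL dataset: the only reward signal
    # is a penalty at steps where the expert chose to intervene. Any
    # off-policy RL algorithm can then be trained on these transitions.
    rewards = np.where(intervened, -1.0, 0.0)
    transitions = []
    for t in range(len(actions) - 1):
        transitions.append((observations[t], actions[t],
                            rewards[t], observations[t + 1]))
    return transitions
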
- Reinforcement Learning from Passive Data via Latent Intentions [86.4969514480008]
We show that passive data can still be used to learn features that accelerate downstream RL.
Our approach learns from passive data by modeling intentions.
Our experiments demonstrate the ability to learn from many forms of passive data, including cross-embodiment video data and YouTube videos.
arXiv Detail & Related papers (2023-04-10T17:59:05Z)
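
One way to read "modeling intentions" in observation-only data, sketched under assumptions: pair each observation with a future observation from the same trajectory as the intention being pursued, and learn intention-aligned features contrastively. The pairing and the objective are illustrative choices, not the paper's exact method.

import torch
import torch.nn as nn
import torch.nn.functional as F

OBS_DIM, FEAT_DIM = 64, 32   # illustrative sizes

phi = nn.Sequential(nn.Linear(OBS_DIM, 256), nn.ReLU(), nn.Linear(256, FEAT_DIM))
psi = nn.Sequential(nn.Linear(OBS_DIM, 256), nn.ReLU(), nn.Linear(256, FEAT_DIM))

def intention_loss(obs, future_obs):
    # An observation's features should align with the intention (a future
    # observation from its own passive trajectory) it was heading toward,
    # and not with intentions drawn from other trajectories in the batch.
    logits = phi(obs) @ psi(future_obs).t()
    labels = torch.arange(obs.shape[0])
    return F.cross_entropy(logits, labels)
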
- How To Guide Your Learner: Imitation Learning with Active Adaptive Expert Involvement [20.91491585498749]
We propose a novel active imitation learning framework based on a teacher-student interaction model.
We show that AdapMen can improve the error bound and avoid compounding error under mild conditions.
arXiv Detail & Related papers (2023-03-03T16:44:33Z)
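
A sketch of one simple intervention criterion for such teacher-student interaction, assuming a gym-style environment and continuous actions; the discrepancy threshold is an illustrative stand-in for the paper's actual criterion.

import numpy as np

def rollout_with_active_teacher(env, student, teacher, threshold=0.5,
                                horizon=200):
    # The teacher intervenes only on states where the student's action
    # deviates too much from the teacher's, collecting expert labels
    # exactly where the student needs correction.
    obs = env.reset()
    dataset = []
    for _ in range(horizon):
        a_student = student(obs)
        a_teacher = teacher(obs)
        if np.linalg.norm(a_student - a_teacher) > threshold:
            dataset.append((obs, a_teacher))   # query the expert label
            a_exec = a_teacher                  # expert takes over this step
        else:
            a_exec = a_student
        obs, _, done, _ = env.step(a_exec)
        if done:
            break
    return dataset   # used to retrain the student, DAgger-style
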
- An Exploration of Data Efficiency in Intra-Dataset Task Transfer for Dialog Understanding [65.75873687351553]
This study explores the effects of varying quantities of target task training data on sequential transfer learning in the dialog domain.
Counterintuitively, our data show that the size of the target-task training set often has minimal effect on how sequential transfer learning performs compared to the same model without transfer learning.
arXiv Detail & Related papers (2022-10-21T04:36:46Z)
- Bridging the Last Mile in Sim-to-Real Robot Perception via Bayesian Active Learning [34.910660020436424]
We propose a pipeline that relies on deep Bayesian active learning and aims to minimize manual annotation effort.
In our experiments on two object detection data sets, we show that the labeling effort required to bridge the reality gap can be reduced to a small amount.
arXiv Detail & Related papers (2021-09-23T14:45:40Z)
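
A minimal sketch of a standard deep Bayesian acquisition step such a pipeline could use, assuming the detector head returns a tensor of class scores; Monte Carlo dropout disagreement is an illustrative choice, not necessarily the paper's exact uncertainty estimator.

import torch

def mc_dropout_acquire(model, unlabeled_images, budget=32, n_samples=10):
    # Pick the images whose predictions vary most under Monte Carlo dropout:
    # these are the sim-trained model's most uncertain real-world inputs,
    # and therefore the most valuable ones to annotate manually.
    model.train()                       # keep dropout layers stochastic
    scores = []
    with torch.no_grad():
        for img in unlabeled_images:
            preds = torch.stack([model(img.unsqueeze(0))
                                 for _ in range(n_samples)])
            scores.append(preds.var(dim=0).mean().item())  # disagreement
    ranked = sorted(range(len(scores)), key=lambda i: -scores[i])
    return ranked[:budget]              # indices to send for manual labeling
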