Reinforcement Learning with Action-Free Pre-Training from Videos
- URL: http://arxiv.org/abs/2203.13880v1
- Date: Fri, 25 Mar 2022 19:44:09 GMT
- Title: Reinforcement Learning with Action-Free Pre-Training from Videos
- Authors: Younggyo Seo, Kimin Lee, Stephen James, Pieter Abbeel
- Abstract summary: We introduce a framework that learns representations useful for understanding the dynamics via generative pre-training on videos.
Our framework significantly improves both final performances and sample-efficiency of vision-based reinforcement learning.
- Score: 95.25074614579646
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent unsupervised pre-training methods have been shown to be effective on
language and vision domains by learning useful representations for multiple
downstream tasks. In this paper, we investigate if such unsupervised
pre-training methods can also be effective for vision-based reinforcement
learning (RL). To this end, we introduce a framework that learns
representations useful for understanding the dynamics via generative
pre-training on videos. Our framework consists of two phases: we pre-train an
action-free latent video prediction model, and then utilize the pre-trained
representations for efficiently learning action-conditional world models on
unseen environments. To incorporate additional action inputs during
fine-tuning, we introduce a new architecture that stacks an action-conditional
latent prediction model on top of the pre-trained action-free prediction model.
Moreover, for better exploration, we propose a video-based intrinsic bonus that
leverages pre-trained representations. We demonstrate that our framework
significantly improves both final performances and sample-efficiency of
vision-based RL in a variety of manipulation and locomotion tasks. Code is
available at https://github.com/younggyoseo/apv.
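The two-phase design described in the abstract can be illustrated with a small sketch. This is not the authors' implementation (see the linked repository for that); it is a deterministic toy in plain Python, assuming a simplified recurrence where an action-free model updates a latent `z` from observations only, and a second layer stacked on top consumes `z` together with the action. All class and function names (`ActionFreeModel`, `StackedActionConditionalModel`, `knn_bonus`) are hypothetical, the weights are random rather than pre-trained, and the k-nearest-neighbour form of the exploration bonus is an assumption about how a bonus over pre-trained representations could look.

```python
import math
import random


def make_linear(n_in, n_out, rng):
    """Random weight matrix (list of rows); a stand-in for a trained layer."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]


def linear_tanh(weights, x):
    """Apply a bias-free linear layer followed by tanh."""
    return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in weights]


class ActionFreeModel:
    """Hypothetical action-free latent predictor: z_t = f(z_{t-1}, o_t).

    In phase one this would be pre-trained on videos without action labels.
    """

    def __init__(self, obs_dim, latent_dim, rng):
        self.latent_dim = latent_dim
        self.w = make_linear(latent_dim + obs_dim, latent_dim, rng)

    def step(self, z_prev, obs):
        return linear_tanh(self.w, z_prev + obs)


class StackedActionConditionalModel:
    """Hypothetical layer stacked on the base model: s_t = g(s_{t-1}, a_t, z_t).

    Only this layer sees actions, so the base model can be reused as-is.
    """

    def __init__(self, base, action_dim, state_dim, rng):
        self.base = base
        self.state_dim = state_dim
        n_in = state_dim + action_dim + base.latent_dim
        self.w = make_linear(n_in, state_dim, rng)

    def rollout(self, observations, actions):
        z = [0.0] * self.base.latent_dim
        s = [0.0] * self.state_dim
        states = []
        for obs, act in zip(observations, actions):
            z = self.base.step(z, obs)            # action-free update
            s = linear_tanh(self.w, s + act + z)  # action-conditional update
            states.append(s)
        return states


def knn_bonus(query, bank, k):
    """Assumed bonus: distance from `query` to its k-th nearest neighbour in `bank`."""
    distances = sorted(math.dist(query, item) for item in bank)
    return distances[min(k, len(distances)) - 1]


rng = random.Random(0)
base = ActionFreeModel(obs_dim=4, latent_dim=3, rng=rng)
model = StackedActionConditionalModel(base, action_dim=2, state_dim=5, rng=rng)
observations = [[rng.random() for _ in range(4)] for _ in range(3)]
actions = [[rng.random() for _ in range(2)] for _ in range(3)]
states = model.rollout(observations, actions)
```

Keeping the action input confined to the upper layer is what lets the lower layer be trained on action-free video in this sketch; a representation that lies far from its neighbours in `bank` receives a larger `knn_bonus`.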
Related papers
- Pre-trained Visual Dynamics Representations for Efficient Policy Learning [33.62440075940917]
We propose Pre-trained Visual Dynamics Representations (PVDR) to bridge the domain gap between videos and downstream tasks for efficient policy learning.
The pre-trained visual dynamics representations capture the visual dynamics prior knowledge in the videos.
This abstract prior knowledge can be readily adapted to downstream tasks and aligned with executable actions through online adaptation.
arXiv Detail & Related papers (2024-11-05T15:18:02Z)
- Value Explicit Pretraining for Learning Transferable Representations [11.069853883599102]
We propose a method that learns generalizable representations for transfer reinforcement learning.
We learn new tasks that share similar objectives as previously learned tasks, by learning an encoder for objective-conditioned representations.
Experiments using a realistic navigation simulator and Atari benchmark show that the pretrained encoder produced by our method outperforms current SoTA pretraining methods.
arXiv Detail & Related papers (2023-12-19T17:12:35Z)
- PILOT: A Pre-Trained Model-Based Continual Learning Toolbox [71.63186089279218]
This paper introduces a pre-trained model-based continual learning toolbox known as PILOT.
On the one hand, PILOT implements some state-of-the-art class-incremental learning algorithms based on pre-trained models, such as L2P, DualPrompt, and CODA-Prompt.
On the other hand, PILOT fits typical class-incremental learning algorithms within the context of pre-trained models to evaluate their effectiveness.
arXiv Detail & Related papers (2023-09-13T17:55:11Z)
- GPT4Image: Can Large Pre-trained Models Help Vision Models on Perception Tasks? [51.22096780511165]
We present a new learning paradigm in which the knowledge extracted from large pre-trained models is utilized to help models like CNN and ViT learn enhanced representations.
We feed detailed descriptions into a pre-trained encoder to extract text embeddings with rich semantic information that encodes the content of images.
arXiv Detail & Related papers (2023-06-01T14:02:45Z)
- Predictive Experience Replay for Continual Visual Control and Forecasting [62.06183102362871]
We present a new continual learning approach for visual dynamics modeling and explore its efficacy in visual control and forecasting.
We first propose the mixture world model that learns task-specific dynamics priors with a mixture of Gaussians, and then introduce a new training strategy to overcome catastrophic forgetting.
Our model remarkably outperforms the naive combinations of existing continual learning and visual RL algorithms on DeepMind Control and Meta-World benchmarks with continual visual control tasks.
arXiv Detail & Related papers (2023-03-12T05:08:03Z)
- REST: REtrieve & Self-Train for generative action recognition [54.90704746573636]
We propose to adapt a pre-trained generative Vision & Language (V&L) Foundation Model for video/action recognition.
We show that direct fine-tuning of a generative model to produce action classes suffers from severe overfitting.
We introduce REST, a training framework consisting of two key components.
arXiv Detail & Related papers (2022-09-29T17:57:01Z)
- Action-Conditioned Contrastive Policy Pretraining [39.13710045468429]
Deep visuomotor policy learning achieves promising results in control tasks such as robotic manipulation and autonomous driving.
It requires a huge number of online interactions with the training environment, which limits its real-world application.
In this work, we aim to pretrain policy representations for driving tasks using hours-long uncurated YouTube videos.
arXiv Detail & Related papers (2022-04-05T17:58:22Z)
- Learning Actor-centered Representations for Action Localization in Streaming Videos using Predictive Learning [18.757368441841123]
Event perception tasks such as recognizing and localizing actions in streaming videos are essential for tackling visual understanding tasks.
We tackle the problem of learning actor-centered representations through the notion of continual hierarchical predictive learning.
Inspired by cognitive theories of event perception, we propose a novel, self-supervised framework.
arXiv Detail & Related papers (2021-04-29T06:06:58Z)
- Pre-training Text Representations as Meta Learning [113.3361289756749]
We introduce a learning algorithm that directly optimizes the model's ability to learn text representations for effective learning of downstream tasks.
We show that there is an intrinsic connection between multi-task pre-training and model-agnostic meta-learning with a sequence of meta-train steps.
arXiv Detail & Related papers (2020-04-12T09:05:47Z)