Video Prediction Models as Rewards for Reinforcement Learning
- URL: http://arxiv.org/abs/2305.14343v2
- Date: Tue, 30 May 2023 17:38:44 GMT
- Title: Video Prediction Models as Rewards for Reinforcement Learning
- Authors: Alejandro Escontrela and Ademi Adeniji and Wilson Yan and Ajay Jain
and Xue Bin Peng and Ken Goldberg and Youngwoon Lee and Danijar Hafner and
Pieter Abbeel
- Abstract summary: VIPER is an algorithm that leverages pretrained video prediction models as action-free reward signals for reinforcement learning.
We see our work as a starting point for scalable reward specification from unlabeled videos.
- Score: 127.53893027811027
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Specifying reward signals that allow agents to learn complex behaviors is a
long-standing challenge in reinforcement learning. A promising approach is to
extract preferences for behaviors from unlabeled videos, which are widely
available on the internet. We present Video Prediction Rewards (VIPER), an
algorithm that leverages pretrained video prediction models as action-free
reward signals for reinforcement learning. Specifically, we first train an
autoregressive transformer on expert videos and then use the video prediction
likelihoods as reward signals for a reinforcement learning agent. VIPER enables
expert-level control without programmatic task rewards across a wide range of
DMC, Atari, and RLBench tasks. Moreover, generalization of the video prediction
model allows us to derive rewards for an out-of-distribution environment where
no expert data is available, enabling cross-embodiment generalization for
tabletop manipulation. We see our work as a starting point for scalable reward
specification from unlabeled videos that will benefit from the rapid advances
in generative modeling. Source code and datasets are available on the project
website: https://escontrela.me/viper
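The abstract describes a two-stage recipe: first pretrain an autoregressive video model on expert videos, then reuse its conditional likelihoods as per-step rewards for the RL agent. The sketch below illustrates that reward computation under stated assumptions; the video model's `log_prob` interface, the frame tokenizer, and the context length are hypothetical stand-ins, not the released VIPER code.

```python
# Minimal sketch of a VIPER-style reward: a video model pretrained on expert
# videos scores each new observation by its conditional log-likelihood, and
# that score is used in place of a programmatic task reward.
# `video_model`, `tokenizer`, and their interfaces are assumptions for
# illustration only.

from collections import deque

import torch


class VideoLikelihoodReward:
    """Turns an action-free, pretrained video model into a per-step reward."""

    def __init__(self, video_model, tokenizer, context_len: int = 16):
        self.model = video_model      # autoregressive transformer over frame tokens
        self.tokenizer = tokenizer    # maps an image to discrete visual tokens
        self.context = deque(maxlen=context_len)

    def reset(self, first_frame) -> None:
        self.context.clear()
        self.context.append(self.tokenizer(first_frame))

    @torch.no_grad()
    def __call__(self, next_frame) -> float:
        tokens = self.tokenizer(next_frame)
        past = torch.stack(list(self.context))  # (T, tokens_per_frame)
        # Reward is the model's log-likelihood of the observed frame given the
        # recent context; expert-like trajectories score higher than random ones.
        reward = self.model.log_prob(tokens, context=past).mean().item()
        self.context.append(tokens)
        return reward
```

In an RL training loop, this score would replace the environment's task reward at every step, so the policy is optimized toward trajectories that the expert-trained video model considers likely.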
Related papers
- An Empirical Study of Autoregressive Pre-training from Videos [67.15356613065542]
We treat videos as sequences of visual tokens and train transformer models to autoregressively predict future tokens.
Our models are pre-trained on a diverse dataset of videos and images comprising over 1 trillion visual tokens.
Our results demonstrate that, despite minimal inductive biases, autoregressive pre-training leads to competitive performance.
arXiv Detail & Related papers (2025-01-09T18:59:58Z)
- Video2Reward: Generating Reward Function from Videos for Legged Robot Behavior Learning [27.233232260388682]
We introduce a new video2reward method, which directly generates reward functions from videos depicting the behaviors to be mimicked and learned.
Our method surpasses the performance of state-of-the-art LLM-based reward generation methods by over 37.6% in terms of human normalized score.
arXiv Detail & Related papers (2024-12-07T03:10:27Z)
- Predicting Long-horizon Futures by Conditioning on Geometry and Time [49.86180975196375]
We explore the task of generating future sensor observations conditioned on the past.
We leverage the large-scale pretraining of image diffusion models which can handle multi-modality.
We create a benchmark for video prediction on a diverse set of videos spanning indoor and outdoor scenes.
arXiv Detail & Related papers (2024-04-17T16:56:31Z)
- Video Pre-trained Transformer: A Multimodal Mixture of Pre-trained Experts [2.457872341625575]
We present Video Pre-trained Transformer.
It uses four SOTA encoder models to convert a video into a sequence of compact embeddings.
It learns using an autoregressive causal language modeling loss by predicting the words spoken in YouTube videos.
arXiv Detail & Related papers (2023-03-24T17:18:40Z)
- Reinforcement Learning with Action-Free Pre-Training from Videos [95.25074614579646]
We introduce a framework that learns representations useful for understanding the dynamics via generative pre-training on videos.
Our framework significantly improves both the final performance and sample efficiency of vision-based reinforcement learning.
arXiv Detail & Related papers (2022-03-25T19:44:09Z)
- Reward prediction for representation learning and reward shaping [0.8883733362171032]
We propose learning a state representation in a self-supervised manner for reward prediction.
We augment the training of out-of-the-box RL agents by shaping the reward using our reward predictor during policy learning.
arXiv Detail & Related papers (2021-05-07T11:29:32Z)
- Learning from Weakly-labeled Web Videos via Exploring Sub-Concepts [89.06560404218028]
We introduce a new method for pre-training video action recognition models using queried web videos.
Instead of trying to filter them out, we propose to convert the potential noise in these queried videos into useful supervision signals.
We show that SPL outperforms several existing pre-training strategies using pseudo-labels.
arXiv Detail & Related papers (2021-01-11T05:50:16Z)
- Semi-supervised reward learning for offline reinforcement learning [71.6909757718301]
Training agents usually requires reward functions, but rewards are seldom available in practice and their engineering is challenging and laborious.
We propose semi-supervised learning algorithms that learn from limited annotations and incorporate unlabelled data.
In our experiments with a simulated robotic arm, we greatly improve upon behavioural cloning and closely approach the performance achieved with ground truth rewards.
arXiv Detail & Related papers (2020-12-12T20:06:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.