Disentangling Controllable Object through Video Prediction Improves
Visual Reinforcement Learning
- URL: http://arxiv.org/abs/2002.09136v1
- Date: Fri, 21 Feb 2020 05:43:34 GMT
- Authors: Yuanyi Zhong, Alexander Schwing, Jian Peng
- Abstract summary: In many vision-based reinforcement learning problems, the agent controls a movable object in its visual field.
We propose an end-to-end learning framework to disentangle the controllable object from the observation signal.
The disentangled representation is shown to be useful for RL as additional observation channels to the agent.
- Score: 82.25034245150582
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In many vision-based reinforcement learning (RL) problems, the agent controls
a movable object in its visual field, e.g., the player's avatar in video games
and the robotic arm in visual grasping and manipulation. Leveraging
action-conditioned video prediction, we propose an end-to-end learning
framework to disentangle the controllable object from the observation signal.
The disentangled representation is shown to be useful for RL as additional
observation channels to the agent. Experiments on a set of Atari games with the
popular Double DQN algorithm demonstrate improved sample efficiency and game
performance (from 222.8% to 261.4% measured in normalized game scores, with
prediction bonus reward).
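The core intuition of the paper can be illustrated with a toy sketch: pixels whose action-conditioned prediction *changes with the chosen action* are attributed to the controllable object, and the resulting mask is stacked onto the frame as an extra observation channel for the agent. The predictor below is a hypothetical hand-coded stand-in (a shifting avatar patch), not the paper's learned video-prediction model; names and shapes are illustrative assumptions.

```python
import numpy as np

H, W, n_actions = 8, 8, 4

# Hypothetical stand-in for a learned action-conditioned predictor:
# each action shifts a 2x2 "avatar" patch; the background stays fixed.
def predict_next_frame(frame, avatar_pos, action):
    dy, dx = [(0, 1), (0, -1), (1, 0), (-1, 0)][action]
    y = np.clip(avatar_pos[0] + dy, 0, H - 2)
    x = np.clip(avatar_pos[1] + dx, 0, W - 2)
    nxt = frame.copy()
    # erase avatar at its old position, draw it at the new one
    nxt[avatar_pos[0]:avatar_pos[0] + 2, avatar_pos[1]:avatar_pos[1] + 2] = 0.0
    nxt[y:y + 2, x:x + 2] = 1.0
    return nxt

frame = np.zeros((H, W))
pos = (3, 3)
frame[3:5, 3:5] = 1.0   # controllable avatar
frame[0, :] = 0.5       # static background stripe

# Pixels whose predicted next value depends on the action are
# attributed to the controllable object (the disentanglement signal).
preds = np.stack([predict_next_frame(frame, pos, a) for a in range(n_actions)])
mask = (preds.std(axis=0) > 0).astype(np.float32)

# The disentangled mask is fed to the RL agent as an additional
# observation channel alongside the raw frame.
obs = np.stack([frame, mask])    # shape (2, H, W)
print(obs.shape)                 # (2, 8, 8)
```

The background stripe is action-invariant, so the mask is zero there; only the avatar region responds to the action, which is exactly the separation the learned model is trained to recover end-to-end.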
Related papers
- What Makes Pre-Trained Visual Representations Successful for Robust Manipulation? [57.92924256181857]
We find that visual representations designed for manipulation and control tasks do not necessarily generalize under subtle changes in lighting and scene texture.
We find that emergent segmentation ability is a strong predictor of out-of-distribution generalization among ViT models.
arXiv Detail & Related papers (2023-11-03T18:09:08Z)
- Video Prediction Models as Rewards for Reinforcement Learning [127.53893027811027]
VIPER is an algorithm that leverages pretrained video prediction models as action-free reward signals for reinforcement learning.
We see our work as starting point for scalable reward specification from unlabeled videos.
arXiv Detail & Related papers (2023-05-23T17:59:33Z)
- Reinforcement Learning with Action-Free Pre-Training from Videos [95.25074614579646]
We introduce a framework that learns representations useful for understanding the dynamics via generative pre-training on videos.
Our framework significantly improves both final performances and sample-efficiency of vision-based reinforcement learning.
arXiv Detail & Related papers (2022-03-25T19:44:09Z) - Unsupervised Visual Representation Learning by Tracking Patches in Video [88.56860674483752]
We propose to use tracking as a proxy task for a computer vision system to learn the visual representations.
Modelled on the Catch game played by children, we design a Catch-the-Patch (CtP) game for a 3D-CNN model to learn visual representations.
- An Empirical Study on the Generalization Power of Neural Representations Learned via Visual Guessing Games [79.23847247132345]
This work investigates how well an artificial agent can benefit from playing guessing games when later asked to perform novel NLP downstream tasks such as Visual Question Answering (VQA).
We propose two ways to exploit playing guessing games: 1) a supervised learning scenario in which the agent learns to mimic successful guessing games, and 2) a novel way for an agent to play by itself, called Self-play via Iterated Experience Learning (SPIEL).
arXiv Detail & Related papers (2021-01-31T10:30:48Z)
- ROLL: Visual Self-Supervised Reinforcement Learning with Object Reasoning [16.18256739680704]
Current reinforcement learning algorithms operate on the whole image without performing object-level reasoning.
In this paper, we improve upon previous visual self-supervised RL by incorporating object-level reasoning and occlusion reasoning.
Our proposed algorithm, ROLL, learns dramatically faster and achieves better final performance than previous methods in several simulated visual control tasks.
arXiv Detail & Related papers (2020-11-13T06:21:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.