Disentangling Controllable Object through Video Prediction Improves
Visual Reinforcement Learning
- URL: http://arxiv.org/abs/2002.09136v1
- Date: Fri, 21 Feb 2020 05:43:34 GMT
- Title: Disentangling Controllable Object through Video Prediction Improves
Visual Reinforcement Learning
- Authors: Yuanyi Zhong, Alexander Schwing, Jian Peng
- Abstract summary: In many vision-based reinforcement learning problems, the agent controls a movable object in its visual field.
We propose an end-to-end learning framework to disentangle the controllable object from the observation signal.
The disentangled representation is shown to be useful for RL as additional observation channels to the agent.
- Score: 82.25034245150582
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In many vision-based reinforcement learning (RL) problems, the agent controls
a movable object in its visual field, e.g., the player's avatar in video games
and the robotic arm in visual grasping and manipulation. Leveraging
action-conditioned video prediction, we propose an end-to-end learning
framework to disentangle the controllable object from the observation signal.
The disentangled representation is shown to be useful for RL as additional
observation channels to the agent. Experiments on a set of Atari games with the
popular Double DQN algorithm demonstrate improved sample efficiency and game
performance (normalized game scores rise from 222.8% to 261.4% with the
prediction bonus reward).
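The abstract describes feeding the disentangled controllable-object representation to the agent as additional observation channels. A minimal sketch of that interface, assuming the mask comes from a separately trained action-conditioned video-prediction model (the function names and shapes here are illustrative, not the authors' implementation):

```python
import numpy as np

def augment_observation(frame, controllable_mask):
    """Stack a disentangled controllable-object mask onto a raw frame.

    Hypothetical sketch: in the paper's framework the mask would be
    produced by an action-conditioned video-prediction model trained to
    separate the agent-controlled object from the rest of the scene;
    here it is simply appended as an extra observation channel.
    """
    frame = np.asarray(frame, dtype=np.float32)            # H x W x C
    mask = np.asarray(controllable_mask, dtype=np.float32)
    if mask.ndim == 2:                                     # H x W -> H x W x 1
        mask = mask[..., None]
    return np.concatenate([frame, mask], axis=-1)          # H x W x (C+1)

# Toy usage: an 84x84 grayscale Atari frame plus a predicted object mask.
obs = augment_observation(np.zeros((84, 84, 1)), np.ones((84, 84)))
print(obs.shape)  # (84, 84, 2)
```

The augmented observation can then be passed to a standard value network (e.g., Double DQN) unchanged, since only the channel count grows.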
Related papers
- Learning Representations in Video Game Agents with Supervised Contrastive Imitation Learning [0.6299766708197881]
This paper introduces a novel application of Supervised Contrastive Learning (SupCon) to Imitation Learning (IL). The goal is to obtain latent representations of the observations that better capture the action-relevant factors. Experiments on the 3D games Astro Bot and Returnal, and on multiple 2D Atari games, show improved representation quality, faster learning convergence, and better generalization.
arXiv Detail & Related papers (2025-09-15T13:00:29Z)
- Open-World Drone Active Tracking with Goal-Centered Rewards [62.21394499788672]
Drone Visual Active Tracking aims to autonomously follow a target object by controlling the motion system based on visual observations. We propose DAT, the first open-world drone active air-to-ground tracking benchmark. We also propose GC-VAT, which aims to improve drone target-tracking performance in complex scenarios.
arXiv Detail & Related papers (2024-12-01T09:37:46Z)
- What Makes Pre-Trained Visual Representations Successful for Robust Manipulation? [57.92924256181857]
We find that visual representations designed for manipulation and control tasks do not necessarily generalize under subtle changes in lighting and scene texture.
We find that emergent segmentation ability is a strong predictor of out-of-distribution generalization among ViT models.
arXiv Detail & Related papers (2023-11-03T18:09:08Z)
- Video Prediction Models as Rewards for Reinforcement Learning [127.53893027811027]
VIPER is an algorithm that leverages pretrained video prediction models as action-free reward signals for reinforcement learning.
We see our work as a starting point for scalable reward specification from unlabeled videos.
arXiv Detail & Related papers (2023-05-23T17:59:33Z)
- Reinforcement Learning with Action-Free Pre-Training from Videos [95.25074614579646]
We introduce a framework that learns representations useful for understanding the dynamics via generative pre-training on videos.
Our framework significantly improves both the final performance and the sample efficiency of vision-based reinforcement learning.
arXiv Detail & Related papers (2022-03-25T19:44:09Z)
- Unsupervised Visual Representation Learning by Tracking Patches in Video [88.56860674483752]
We propose to use tracking as a proxy task for a computer vision system to learn the visual representations.
Modelled on the Catch game played by children, we design a Catch-the-Patch (CtP) game for a 3D-CNN model to learn visual representations.
arXiv Detail & Related papers (2021-05-06T09:46:42Z)
- An Empirical Study on the Generalization Power of Neural Representations Learned via Visual Guessing Games [79.23847247132345]
This work investigates how well an artificial agent can benefit from playing guessing games when later asked to perform novel NLP downstream tasks such as Visual Question Answering (VQA).
We propose two ways to exploit playing guessing games: 1) a supervised learning scenario in which the agent learns to mimic successful guessing games, and 2) a novel way for an agent to play by itself, called Self-play via Iterated Experience Learning (SPIEL).
arXiv Detail & Related papers (2021-01-31T10:30:48Z)
- ROLL: Visual Self-Supervised Reinforcement Learning with Object Reasoning [16.18256739680704]
Current reinforcement learning algorithms operate on the whole image without performing object-level reasoning.
In this paper, we improve upon previous visual self-supervised RL by incorporating object-level reasoning and occlusion reasoning.
Our proposed algorithm, ROLL, learns dramatically faster and achieves better final performance than previous methods in several simulated visual control tasks.
arXiv Detail & Related papers (2020-11-13T06:21:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented here and is not responsible for any consequences arising from its use.