The Distracting Control Suite -- A Challenging Benchmark for
Reinforcement Learning from Pixels
- URL: http://arxiv.org/abs/2101.02722v1
- Date: Thu, 7 Jan 2021 19:03:34 GMT
- Title: The Distracting Control Suite -- A Challenging Benchmark for
Reinforcement Learning from Pixels
- Authors: Austin Stone, Oscar Ramirez, Kurt Konolige, Rico Jonschkowski
- Abstract summary: We extend DM Control with three kinds of visual distractions to produce a new challenging benchmark for vision-based control.
Our experiments show that current RL methods for vision-based control perform poorly under distractions.
We also find that combinations of multiple distraction types are more difficult than a mere combination of their individual effects.
- Score: 10.727930028878516
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Robots have to face challenging perceptual settings, including changes in
viewpoint, lighting, and background. Current simulated reinforcement learning
(RL) benchmarks such as DM Control provide visual input without such
complexity, which limits the transfer of well-performing methods to the real
world. In this paper, we extend DM Control with three kinds of visual
distractions (variations in background, color, and camera pose) to produce a
new challenging benchmark for vision-based control, and we analyze
state-of-the-art RL algorithms in these settings. Our experiments show that current RL
methods for vision-based control perform poorly under distractions, and that
their performance decreases with increasing distraction complexity, showing
that new methods are needed to cope with the visual complexities of the real
world. We also find that combinations of multiple distraction types are more
difficult than a mere combination of their individual effects.
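The three distraction types can be illustrated with a minimal numpy sketch. Note this `DistractionWrapper` is a hypothetical illustration, not the benchmark's API: the released suite perturbs the MuJoCo scene itself (background scenery, body colors, camera extrinsics) before rendering, rather than post-processing frames as done here.

```python
import numpy as np

class DistractionWrapper:
    """Illustrative sketch of the three distraction types (background,
    color, camera pose) applied to rendered RGB frames. The actual
    benchmark modifies the simulated scene, so this is only an
    approximation of its effect on observations."""

    def __init__(self, backgrounds, intensity=0.5, seed=0):
        self.backgrounds = backgrounds  # pool of HxWx3 uint8 images
        self.intensity = intensity      # distraction strength in [0, 1]
        self.rng = np.random.default_rng(seed)
        self.reset_episode()

    def reset_episode(self):
        # Resample the distraction parameters once per episode.
        self.bg = self.backgrounds[self.rng.integers(len(self.backgrounds))]
        self.color_scale = 1.0 + self.intensity * self.rng.uniform(-0.3, 0.3, size=3)
        self.shift = tuple(int(self.intensity * s)
                           for s in self.rng.integers(-8, 9, size=2))

    def observation(self, frame, agent_mask):
        # agent_mask: boolean HxW array, True where the agent is visible.
        out = frame.astype(np.float32)
        bg = ~agent_mask
        # 1) background distraction: blend a random image into the scenery
        out[bg] = (1 - self.intensity) * out[bg] \
                  + self.intensity * self.bg[bg].astype(np.float32)
        # 2) color distraction: per-channel brightness change
        out = np.clip(out * self.color_scale, 0, 255)
        # 3) camera distraction: image shift as a crude stand-in for pose change
        out = np.roll(out, self.shift, axis=(0, 1))
        return out.astype(np.uint8)
```

Scaling `intensity` mimics the paper's finding that performance degrades as distraction complexity increases, and combining all three transforms in one wrapper mirrors the observation that joint distractions compound beyond their individual effects.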
Related papers
- Sliding Puzzles Gym: A Scalable Benchmark for State Representation in Visual Reinforcement Learning [3.8309622155866583]
We introduce the Sliding Puzzles Gym (SPGym), a novel benchmark that reimagines the classic 8-tile puzzle with a visual observation space of images sourced from arbitrarily large datasets.
SPGym provides precise control over representation complexity through visual diversity, allowing researchers to systematically scale the representation learning challenge.
As we increase visual diversity by expanding the pool of possible images, all tested algorithms show significant performance degradation.
arXiv Detail & Related papers (2024-10-17T21:23:03Z)
- An Examination of Offline-Trained Encoders in Vision-Based Deep Reinforcement Learning for Autonomous Driving [0.0]
This research investigates the challenges Deep Reinforcement Learning (DRL) faces in Partially Observable Markov Decision Processes (POMDPs).
Our research adopts an offline-trained encoder that leverages large video datasets through self-supervised learning to learn generalizable representations.
We show that the features learned by watching BDD100K driving videos can be directly transferred to achieve lane following and collision avoidance in the CARLA simulator.
arXiv Detail & Related papers (2024-09-02T14:16:23Z) - PerLDiff: Controllable Street View Synthesis Using Perspective-Layout Diffusion Models [55.080748327139176]
We introduce PerLDiff, a method for effective street view image generation that fully leverages perspective 3D geometric information.
Our results justify that our PerLDiff markedly enhances the precision of generation on the NuScenes and KITTI datasets.
arXiv Detail & Related papers (2024-07-08T16:46:47Z) - DEAR: Disentangled Environment and Agent Representations for Reinforcement Learning without Reconstruction [4.813546138483559]
Reinforcement Learning (RL) algorithms can learn robotic control tasks from visual observations, but they often require a large amount of data.
In this paper, we explore how the agent's knowledge of its shape can improve the sample efficiency of visual RL methods.
We propose a novel method, Disentangled Environment and Agent Representations, that uses the segmentation mask of the agent as supervision.
arXiv Detail & Related papers (2024-06-30T09:15:21Z) - M2CURL: Sample-Efficient Multimodal Reinforcement Learning via Self-Supervised Representation Learning for Robotic Manipulation [0.7564784873669823]
We propose Multimodal Contrastive Unsupervised Reinforcement Learning (M2CURL)
Our approach employs a novel multimodal self-supervised learning technique that learns efficient representations and contributes to faster convergence of RL algorithms.
We evaluate M2CURL on the Tactile Gym 2 simulator and we show that it significantly enhances the learning efficiency in different manipulation tasks.
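The summary does not give M2CURL's exact objective; methods in this family typically build on an InfoNCE-style contrastive loss, which can be sketched generically in numpy (an illustration of the standard technique, not the paper's specific loss):

```python
import numpy as np

def info_nce(z1, z2, temperature=0.1):
    """Generic InfoNCE contrastive loss: matching rows of z1 and z2
    (e.g. embeddings of two modalities of the same observation) are
    positives; all other rows in the batch serve as negatives."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature             # (N, N) cosine similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))          # positives on the diagonal
```

Minimizing this loss pulls paired embeddings together and pushes non-matching pairs within the batch apart, which is the mechanism by which such self-supervised representations speed up downstream RL.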
arXiv Detail & Related papers (2024-01-30T14:09:35Z) - VIBR: Learning View-Invariant Value Functions for Robust Visual Control [3.2307366446033945]
VIBR (View-Invariant Bellman Residuals) is a method that combines multi-view training and invariant prediction to reduce out-of-distribution gap for RL based visuomotor control.
We show that VIBR outperforms existing methods on complex visuo-motor control environment with high visual perturbation.
arXiv Detail & Related papers (2023-06-14T14:37:34Z) - Accelerating exploration and representation learning with offline
pre-training [52.6912479800592]
We show that exploration and representation learning can be improved by separately learning two different models from a single offline dataset.
We show that learning a state representation using noise-contrastive estimation and a model of auxiliary reward can significantly improve the sample efficiency on the challenging NetHack benchmark.
arXiv Detail & Related papers (2023-03-31T18:03:30Z) - Challenges and Opportunities in Offline Reinforcement Learning from
Visual Observations [58.758928936316785]
offline reinforcement learning from visual observations with continuous action spaces remains under-explored.
We show that modifications to two popular vision-based online reinforcement learning algorithms suffice to outperform existing offline RL methods.
arXiv Detail & Related papers (2022-06-09T22:08:47Z) - Unlocking Pixels for Reinforcement Learning via Implicit Attention [61.666538764049854]
We make use of new efficient attention algorithms, recently shown to be highly effective for Transformers.
This allows our attention-based controllers to scale to larger visual inputs and facilitates the use of smaller patches.
In addition, we propose a new efficient algorithm approximating softmax attention with what we call hybrid random features.
arXiv Detail & Related papers (2021-02-08T17:00:26Z) - Forgetful Experience Replay in Hierarchical Reinforcement Learning from
Demonstrations [55.41644538483948]
In this paper, we propose a combination of approaches that allow the agent to use low-quality demonstrations in complex vision-based environments.
Our proposed goal-oriented structuring of the replay buffer allows the agent to automatically highlight sub-goals for solving complex hierarchical tasks in demonstrations.
The solution based on our algorithm beats all the solutions for the famous MineRL competition and allows the agent to mine a diamond in the Minecraft environment.
arXiv Detail & Related papers (2020-06-17T15:38:40Z) - Semi-Supervised StyleGAN for Disentanglement Learning [79.01988132442064]
Current disentanglement methods face several inherent limitations.
We design new architectures and loss functions based on StyleGAN for semi-supervised high-resolution disentanglement learning.
arXiv Detail & Related papers (2020-03-06T22:54:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.