Measuring Visual Generalization in Continuous Control from Pixels
- URL: http://arxiv.org/abs/2010.06740v2
- Date: Fri, 27 Nov 2020 20:33:03 GMT
- Title: Measuring Visual Generalization in Continuous Control from Pixels
- Authors: Jake Grigsby, Yanjun Qi
- Abstract summary: Self-supervised learning and data augmentation have significantly reduced the performance gap between state and image-based reinforcement learning agents.
We propose a benchmark that tests agents' visual generalization by adding graphical variety to existing continuous control domains.
We find that data augmentation techniques outperform self-supervised learning approaches and that more significant image transformations provide better visual generalization.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-supervised learning and data augmentation have significantly reduced the
performance gap between state and image-based reinforcement learning agents in
continuous control tasks. However, it remains unclear whether current
techniques can cope with the variety of visual conditions required by real-world
environments. We propose a challenging benchmark that tests agents' visual
generalization by adding graphical variety to existing continuous control
domains. Our empirical analysis shows that current methods struggle to
generalize across a diverse set of visual changes, and we examine the specific
factors of variation that make these tasks difficult. We find that data
augmentation techniques outperform self-supervised learning approaches and that
more significant image transformations provide better visual generalization. (The benchmark and our augmented actor-critic implementation are open-sourced at https://github.com/QData/dmc_remastered.)
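As a concrete reference point for the data augmentation techniques the abstract compares, here is a minimal sketch of random-shift augmentation in the style of DrQ/RAD, a common family of image transformations in pixel-based continuous control. The function name, pad size, and observation shapes are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def random_shift(obs: np.ndarray, pad: int = 4) -> np.ndarray:
    """Randomly shift an image observation by up to `pad` pixels.

    obs: (H, W, C) uint8 frame. The frame is edge-padded, then a crop
    of the original size is taken at a random offset, so the content
    shifts by at most `pad` pixels in each direction.
    """
    h, w, _ = obs.shape
    padded = np.pad(obs, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    top = np.random.randint(0, 2 * pad + 1)
    left = np.random.randint(0, 2 * pad + 1)
    return padded[top:top + h, left:left + w]

# Illustrative usage: augment a batch of 84x84 RGB frames before an update.
batch = np.random.randint(0, 256, size=(32, 84, 84, 3), dtype=np.uint8)
augmented = np.stack([random_shift(frame) for frame in batch])
assert augmented.shape == batch.shape
```

Larger pads correspond to the "more significant image transformations" that the abstract credits with better visual generalization.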
Related papers
- A Simple Background Augmentation Method for Object Detection with Diffusion Model [53.32935683257045]
In computer vision, it is well-known that a lack of data diversity will impair model performance.
We propose a simple yet effective data augmentation approach by leveraging advancements in generative models.
Background augmentation, in particular, significantly improves the models' robustness and generalization capabilities.
arXiv Detail & Related papers (2024-08-01T07:40:00Z)
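The entry above describes swapping image backgrounds for generated ones. Below is a minimal sketch of the compositing step, assuming a foreground mask is already available (e.g., from detection annotations); the random arrays merely stand in for a real image and a diffusion-model sample.

```python
import numpy as np

def replace_background(image: np.ndarray, mask: np.ndarray,
                       new_background: np.ndarray) -> np.ndarray:
    """Composite the masked foreground onto a new background.

    image, new_background: (H, W, 3) uint8; mask: (H, W) bool, True on
    foreground pixels. In the paper's setting the new background would
    come from a text-to-image diffusion model.
    """
    out = new_background.copy()
    out[mask] = image[mask]
    return out

# Illustrative usage with placeholder data.
img = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
fg_mask = np.zeros((256, 256), dtype=bool)
fg_mask[64:192, 64:192] = True  # hypothetical object region
bg = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
augmented = replace_background(img, fg_mask, bg)
```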
- Pre-trained Text-to-Image Diffusion Models Are Versatile Representation Learners for Control [73.6361029556484]
Embodied AI agents require a fine-grained understanding of the physical world mediated through visual and language inputs.
We consider pre-trained text-to-image diffusion models, which are explicitly optimized to generate images from text prompts.
We show that Stable Control Representations enable learning policies that exhibit state-of-the-art performance on OVMM, a difficult open-vocabulary navigation benchmark.
arXiv Detail & Related papers (2024-05-09T15:39:54Z)
- Single-temporal Supervised Remote Change Detection for Domain Generalization [42.55492600157288]
Change detection is widely applied in remote sensing image analysis.
Existing methods require training models separately for each dataset.
We propose ChangeCLIP, a multimodal contrastive learning method based on visual-labelled pre-training, for change detection domain generalization.
arXiv Detail & Related papers (2024-04-17T12:38:58Z)
- Generalization Gap in Data Augmentation: Insights from Illumination [3.470401787749558]
We investigate the differences in generalization between models trained with augmented data and those trained under real-world illumination conditions.
Results indicate that various data augmentation methods significantly improve model performance; however, a noticeable generalization gap to real illumination remains (a sketch of a typical illumination augmentation follows this entry).
arXiv Detail & Related papers (2024-04-11T07:11:43Z)
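Here is a minimal sketch of the kind of photometric augmentation the entry above pits against real illumination changes; the gain and gamma ranges are illustrative assumptions, not values from the paper.

```python
import numpy as np

def jitter_illumination(image: np.ndarray,
                        gain_range=(0.6, 1.4),
                        gamma_range=(0.7, 1.5)) -> np.ndarray:
    """Apply a random brightness gain and gamma curve to a uint8 image.

    This is a crude, spatially uniform proxy for lighting variation,
    which is one plausible reason a generalization gap to real
    illumination can remain after augmentation.
    """
    x = image.astype(np.float32) / 255.0
    gain = np.random.uniform(*gain_range)
    gamma = np.random.uniform(*gamma_range)
    x = np.clip(gain * np.power(x, gamma), 0.0, 1.0)
    return (x * 255.0).astype(np.uint8)

img = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)
aug = jitter_illumination(img)
```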
- What Makes Pre-Trained Visual Representations Successful for Robust Manipulation? [57.92924256181857]
We find that visual representations designed for manipulation and control tasks do not necessarily generalize under subtle changes in lighting and scene texture.
We find that emergent segmentation ability is a strong predictor of out-of-distribution generalization among ViT models.
arXiv Detail & Related papers (2023-11-03T18:09:08Z)
- Invariance is Key to Generalization: Examining the Role of Representation in Sim-to-Real Transfer for Visual Navigation [35.01394611106655]
The key to generalization is a representation rich enough to capture all task-relevant information.
We experimentally study such a representation for visual navigation.
We show that our representation reduces the A-distance between the training and test domains (a sketch of the standard proxy estimate follows this entry).
arXiv Detail & Related papers (2023-10-23T15:15:19Z)
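The A-distance mentioned above is commonly estimated with the proxy A-distance of Ben-David et al.: train a classifier to separate source-domain from target-domain features and set d_A = 2(1 - 2e), where e is its held-out error. A minimal sketch of that standard construction (the linear probe and feature dimensions are assumptions):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def proxy_a_distance(source_feats: np.ndarray, target_feats: np.ndarray) -> float:
    """Estimate the proxy A-distance between two feature sets.

    A domain classifier is trained to tell source from target; the
    harder the domains are to distinguish, the closer to zero the
    result: d_A = 2 * (1 - 2 * error).
    """
    X = np.vstack([source_feats, target_feats])
    y = np.concatenate([np.zeros(len(source_feats)), np.ones(len(target_feats))])
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, stratify=y)
    error = 1.0 - LogisticRegression(max_iter=1000).fit(X_tr, y_tr).score(X_te, y_te)
    return 2.0 * (1.0 - 2.0 * error)

# A representation that transfers well should score near zero here.
src = np.random.randn(500, 64)  # e.g., training-domain embeddings
tgt = np.random.randn(500, 64)  # e.g., test-domain embeddings
print(proxy_a_distance(src, tgt))
```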
- Towards Generic Image Manipulation Detection with Weakly-Supervised Self-Consistency Learning [49.43362803584032]
We propose weakly-supervised image manipulation detection.
Such a setting can leverage more training images and has the potential to adapt quickly to new manipulation techniques.
Two consistency properties are learned: multi-source consistency (MSC) and inter-patch consistency (IPC).
arXiv Detail & Related papers (2023-09-03T19:19:56Z)
- Free-ATM: Exploring Unsupervised Learning on Diffusion-Generated Images with Free Attention Masks [64.67735676127208]
Text-to-image diffusion models have shown great potential for benefiting image recognition.
Although promising, there has been inadequate exploration dedicated to unsupervised learning on diffusion-generated images.
We introduce customized solutions that fully exploit these free attention masks.
arXiv Detail & Related papers (2023-08-13T10:07:46Z)
- VIBR: Learning View-Invariant Value Functions for Robust Visual Control [3.2307366446033945]
VIBR (View-Invariant Bellman Residuals) is a method that combines multi-view training and invariant prediction to reduce the out-of-distribution gap for RL-based visuomotor control.
We show that VIBR outperforms existing methods on complex visuomotor control environments with strong visual perturbations (a speculative sketch of the general recipe follows this entry).
arXiv Detail & Related papers (2023-06-14T14:37:34Z)
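A speculative sketch of the general recipe the entry above suggests: compute Bellman residuals on two rendered views of the same underlying state and penalize their disagreement. This is a guess at the spirit of view-invariant value learning, not VIBR's actual objective; the critic, shapes, and weighting are all assumptions.

```python
import torch
import torch.nn as nn

class Critic(nn.Module):
    """Tiny stand-in critic over flattened observation features."""
    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim + act_dim, 64),
                                 nn.ReLU(), nn.Linear(64, 1))

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)

def view_invariant_bellman_loss(critic, obs_v1, obs_v2, act, bootstrap, beta=1.0):
    # bootstrap: precomputed r + gamma * (1 - done) * V(next_obs)
    r1 = critic(obs_v1, act) - bootstrap  # Bellman residual, view 1
    r2 = critic(obs_v2, act) - bootstrap  # Bellman residual, view 2
    td_loss = r1.pow(2).mean() + r2.pow(2).mean()
    invariance = (r1 - r2).pow(2).mean()  # penalize view disagreement
    return td_loss + beta * invariance

critic = Critic(obs_dim=32, act_dim=4)
v1, v2 = torch.randn(8, 32), torch.randn(8, 32)  # two views, same states
act, boot = torch.randn(8, 4), torch.randn(8)
view_invariant_bellman_loss(critic, v1, v2, act, boot).backward()
```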
- Task Formulation Matters When Learning Continually: A Case Study in Visual Question Answering [58.82325933356066]
Continual learning aims to train a model incrementally on a sequence of tasks without forgetting previous knowledge.
We present a detailed study of how different settings affect performance for Visual Question Answering.
arXiv Detail & Related papers (2022-09-30T19:12:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.