Pre-Trained Image Encoder for Generalizable Visual Reinforcement
Learning
- URL: http://arxiv.org/abs/2212.08860v1
- Date: Sat, 17 Dec 2022 12:45:08 GMT
- Title: Pre-Trained Image Encoder for Generalizable Visual Reinforcement
Learning
- Authors: Zhecheng Yuan, Zhengrong Xue, Bo Yuan, Xueqian Wang, Yi Wu, Yang Gao,
Huazhe Xu
- Abstract summary: We propose Pre-trained Image Encoder for Generalizable visual reinforcement learning (PIE-G).
PIE-G is a simple yet effective framework that can generalize to unseen visual scenarios in a zero-shot manner.
Empirical evidence suggests PIE-G improves sample efficiency and significantly outperforms previous state-of-the-art methods in terms of generalization performance.
- Score: 27.304282924423095
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning generalizable policies that can adapt to unseen environments remains
challenging in visual Reinforcement Learning (RL). Existing approaches try to
acquire a robust representation via diversifying the appearances of in-domain
observations for better generalization. Limited by the specific observations of
the environment, these methods ignore the possibility of exploring diverse
real-world image datasets. In this paper, we investigate how a visual RL agent
would benefit from off-the-shelf visual representations. Surprisingly, we
find that the early layers in an ImageNet pre-trained ResNet model could
provide rather generalizable representations for visual RL. Hence, we propose
Pre-trained Image Encoder for Generalizable visual reinforcement learning
(PIE-G), a simple yet effective framework that can generalize to unseen
visual scenarios in a zero-shot manner. Extensive experiments are conducted on
DMControl Generalization Benchmark, DMControl Manipulation Tasks, Drawer World,
and CARLA to verify the effectiveness of PIE-G. Empirical evidence suggests
PIE-G improves sample efficiency and significantly outperforms previous
state-of-the-art methods in terms of generalization performance. In particular,
PIE-G boasts a 55% generalization performance gain on average in the
challenging video background setting. Project Page:
https://sites.google.com/view/pie-g/home.
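Below is a minimal PyTorch sketch of the idea described in the abstract: keep only the early layers of an ImageNet pre-trained ResNet, freeze them, and train a small policy head on top of the resulting features. The choice of ResNet-18, the cut after the second residual stage, the 84x84 observation size, and the policy head are illustrative assumptions rather than the authors' exact configuration.

    # Minimal sketch (not the authors' code): early layers of an ImageNet
    # pre-trained ResNet used as a frozen observation encoder for visual RL.
    import torch
    import torch.nn as nn
    from torchvision import models

    class FrozenEarlyResNetEncoder(nn.Module):
        """Stem plus the first few residual stages of ResNet-18, kept frozen."""
        def __init__(self, num_stages: int = 2):
            super().__init__()
            resnet = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
            stages = [resnet.conv1, resnet.bn1, resnet.relu, resnet.maxpool,
                      resnet.layer1, resnet.layer2, resnet.layer3, resnet.layer4]
            # "Early layers": the stem plus the first `num_stages` residual stages.
            self.features = nn.Sequential(*stages[:4 + num_stages])
            for p in self.features.parameters():
                p.requires_grad = False      # encoder stays frozen during RL training
            self.features.eval()

        @torch.no_grad()
        def forward(self, obs: torch.Tensor) -> torch.Tensor:
            # obs: (B, 3, H, W) image observations, ImageNet-normalized upstream.
            return self.features(obs).flatten(start_dim=1)

    # Illustrative policy head trained with any RL algorithm on the frozen features.
    encoder = FrozenEarlyResNetEncoder(num_stages=2)
    feat_dim = encoder(torch.zeros(1, 3, 84, 84)).shape[1]
    policy_head = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU(), nn.Linear(256, 6))

Only policy_head would receive gradients here; keeping the encoder in eval mode also leaves its BatchNorm statistics untouched.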
Related papers
- Adaptive Masking Enhances Visual Grounding [12.793586888511978]
We propose IMAGE, Interpretative MAsking with Gaussian radiation modEling, to enhance vocabulary grounding in low-shot learning scenarios.
We evaluate the efficacy of our approach on benchmark datasets, including COCO and ODinW, demonstrating its superior performance in zero-shot and few-shot tasks.
arXiv Detail & Related papers (2024-10-04T05:48:02Z)
- Pre-trained Text-to-Image Diffusion Models Are Versatile Representation Learners for Control [73.6361029556484]
Embodied AI agents require a fine-grained understanding of the physical world mediated through visual and language inputs.
We consider pre-trained text-to-image diffusion models, which are explicitly optimized to generate images from text prompts.
We show that Stable Control Representations enable learning policies that exhibit state-of-the-art performance on OVMM, a difficult open-vocabulary navigation benchmark.
arXiv Detail & Related papers (2024-05-09T15:39:54Z)
- Improving Generalization via Meta-Learning on Hard Samples [8.96835934244022]
We show that using hard-to-classify instances in the validation set has both a theoretical connection to, and strong empirical evidence of, generalization.
We provide an efficient algorithm for training this meta-optimized model, as well as a simple train-twice heuristic for careful comparative study.
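A rough sketch of one reading of this setup appears below: rank validation instances by per-sample loss after a first training pass and keep the hardest fraction as the meta/validation set for the second, meta-optimized run. The selection criterion, fraction, and meta-optimization loop here are assumptions and differ from the paper's algorithm.

    # Rough sketch (assumptions, not the paper's algorithm): pick the
    # hardest-to-classify validation examples by per-sample loss.
    import torch
    import torch.nn.functional as F

    def select_hard_validation_samples(model, val_loader, fraction=0.2, device="cpu"):
        """Return global indices of the highest-loss validation examples
        (assumes a non-shuffled DataLoader)."""
        model.eval()
        losses, indices = [], []
        with torch.no_grad():
            for batch_idx, (x, y) in enumerate(val_loader):
                x, y = x.to(device), y.to(device)
                per_sample = F.cross_entropy(model(x), y, reduction="none")
                start = batch_idx * val_loader.batch_size
                losses.append(per_sample.cpu())
                indices.append(torch.arange(start, start + len(y)))
        losses, indices = torch.cat(losses), torch.cat(indices)
        k = max(1, int(fraction * len(losses)))
        return indices[losses.topk(k).indices]   # feed these into the meta-objective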
arXiv Detail & Related papers (2024-03-18T20:33:44Z)
- Appearance Debiased Gaze Estimation via Stochastic Subject-Wise Adversarial Learning [33.55397868171977]
Appearance-based gaze estimation has been attracting attention in computer vision, and remarkable improvements have been achieved using various deep learning techniques.
We propose a novel framework: subject-wise gaZE learning (SAZE), which trains a network to generalize the appearance of subjects.
Our experimental results verify the robustness of the method in that it yields state-of-the-art performance, achieving 3.89 and 4.42 on the MPIIGaze and EyeDiap datasets, respectively.
arXiv Detail & Related papers (2024-01-25T00:23:21Z)
- What Makes Pre-Trained Visual Representations Successful for Robust Manipulation? [57.92924256181857]
We find that visual representations designed for manipulation and control tasks do not necessarily generalize under subtle changes in lighting and scene texture.
We find that emergent segmentation ability is a strong predictor of out-of-distribution generalization among ViT models.
arXiv Detail & Related papers (2023-11-03T18:09:08Z)
- ALP: Action-Aware Embodied Learning for Perception [60.64801970249279]
We introduce Action-Aware Embodied Learning for Perception (ALP).
ALP incorporates action information into representation learning through a combination of optimizing a reinforcement learning policy and an inverse dynamics prediction objective.
We show that ALP outperforms existing baselines in several downstream perception tasks.
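As a rough illustration of the inverse dynamics objective mentioned above (an assumption-laden sketch, not the ALP implementation), the head below predicts the action taken between two consecutive observation embeddings; its loss can be added to the RL objective so that action information shapes the visual representation. Continuous actions and an MSE loss are assumed.

    # Illustrative inverse dynamics auxiliary objective (not the ALP code).
    import torch
    import torch.nn as nn

    class InverseDynamicsHead(nn.Module):
        def __init__(self, feat_dim: int, action_dim: int):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(2 * feat_dim, 256), nn.ReLU(),
                nn.Linear(256, action_dim),
            )

        def forward(self, z_t: torch.Tensor, z_next: torch.Tensor) -> torch.Tensor:
            # Predict the action from consecutive observation embeddings.
            return self.net(torch.cat([z_t, z_next], dim=-1))

    def inverse_dynamics_loss(head, encoder, obs_t, obs_next, action):
        z_t, z_next = encoder(obs_t), encoder(obs_next)
        return nn.functional.mse_loss(head(z_t, z_next), action)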
arXiv Detail & Related papers (2023-06-16T21:51:04Z)
- Learning Task-relevant Representations for Generalization via Characteristic Functions of Reward Sequence Distributions [63.773813221460614]
Generalization across different environments with the same tasks is critical for successful applications of visual reinforcement learning.
We propose a novel approach, namely Characteristic Reward Sequence Prediction (CRESP), to extract the task-relevant information.
Experiments demonstrate that CRESP significantly improves the performance of generalization on unseen environments.
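One way to read the characteristic-function idea is sketched below: for randomly sampled frequency vectors u, a small head regresses cos(u·r) and sin(u·r) of the observed future reward sequence r, whose expectations over trajectories form the characteristic function of the reward-sequence distribution. The architecture, frequency sampling, and loss are assumptions, not the paper's exact formulation.

    # Loose sketch of a characteristic-function prediction loss for reward
    # sequences (assumptions throughout; not the CRESP implementation).
    import torch
    import torch.nn as nn

    class RewardCharacteristicHead(nn.Module):
        def __init__(self, feat_dim: int, horizon: int, hidden: int = 256):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(feat_dim + horizon, hidden), nn.ReLU(),
                nn.Linear(hidden, 2),        # predicts [cos(u·r), sin(u·r)]
            )

        def forward(self, z, u):
            return self.net(torch.cat([z, u], dim=-1))

    def characteristic_loss(head, z, reward_seq, num_freqs: int = 8):
        # z: (B, feat_dim) state features; reward_seq: (B, horizon) future rewards.
        B, horizon = reward_seq.shape
        loss = 0.0
        for _ in range(num_freqs):
            u = torch.randn(B, horizon, device=reward_seq.device)  # sampled frequencies
            dot = (u * reward_seq).sum(dim=-1, keepdim=True)
            target = torch.cat([torch.cos(dot), torch.sin(dot)], dim=-1)
            loss = loss + nn.functional.mse_loss(head(z, u), target)
        return loss / num_freqs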
arXiv Detail & Related papers (2022-05-20T14:52:03Z)
- Reinforcement Learning with Action-Free Pre-Training from Videos [95.25074614579646]
We introduce a framework that learns representations useful for understanding the dynamics via generative pre-training on videos.
Our framework significantly improves both the final performance and sample efficiency of vision-based reinforcement learning.
arXiv Detail & Related papers (2022-03-25T19:44:09Z)
- More is Better: A Novel Multi-view Framework for Domain Generalization [28.12350681444117]
A key issue in domain generalization (DG) is how to prevent overfitting to the observed source domains.
By treating tasks and images as different views, we propose a novel multi-view DG framework.
At test time, to alleviate unstable predictions, we utilize multiple augmented images to yield a multi-view prediction.
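The test-stage procedure summarized above amounts to test-time augmentation; a simple sketch follows (the augmentations and the averaging rule are assumptions, not the paper's exact recipe).

    # Simple test-time multi-view prediction: average class probabilities
    # over several augmented views of the same image (illustrative only).
    import torch
    from torchvision import transforms

    augment = transforms.Compose([
        transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),
        transforms.RandomHorizontalFlip(),
    ])

    @torch.no_grad()
    def multi_view_predict(model, image, num_views: int = 8):
        """image: (3, H, W) tensor; returns class probabilities averaged over views."""
        model.eval()
        views = torch.stack([augment(image) for _ in range(num_views)])
        return torch.softmax(model(views), dim=-1).mean(dim=0)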
arXiv Detail & Related papers (2021-12-23T02:51:35Z)
- On Efficient Transformer and Image Pre-training for Low-level Vision [74.22436001426517]
Pre-training has produced numerous state-of-the-art results in high-level computer vision.
We present an in-depth study of image pre-training.
We find pre-training plays strikingly different roles in low-level tasks.
arXiv Detail & Related papers (2021-12-19T15:50:48Z)
- Robust Deep Reinforcement Learning via Multi-View Information Bottleneck [7.188571996124112]
We introduce an auxiliary objective based on the multi-view information bottleneck (MIB) principle.
This encourages learning representations that are both predictive of the future and less sensitive to task-irrelevant distractions.
We demonstrate that our approach can achieve SOTA performance on challenging visual control tasks, even when the background is replaced with natural videos.
arXiv Detail & Related papers (2021-02-26T02:24:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.