Focus On What Matters: Separated Models For Visual-Based RL Generalization
- URL: http://arxiv.org/abs/2410.10834v1
- Date: Sun, 29 Sep 2024 04:37:56 GMT
- Title: Focus On What Matters: Separated Models For Visual-Based RL Generalization
- Authors: Di Zhang, Bowen Lv, Hai Zhang, Feifan Yang, Junqiao Zhao, Hang Yu, Chang Huang, Hongtu Zhou, Chen Ye, Changjun Jiang
- Abstract summary: Separated Models for Generalization (SMG) is a novel approach that exploits image reconstruction for generalization.
SMG incorporates two additional consistency losses to guide the agent's focus toward task-relevant areas across different scenarios.
Experiments in DMC demonstrate the SOTA performance of SMG in generalization, particularly excelling in video-background settings.
- Score: 16.87505461758058
- License:
- Abstract: A primary challenge for visual-based Reinforcement Learning (RL) is to generalize effectively across unseen environments. Although previous studies have explored different auxiliary tasks to enhance generalization, few adopt image reconstruction, due to concerns about exacerbating overfitting to task-irrelevant features during training. Recognizing the pre-eminence of image reconstruction in representation learning, we propose SMG (Separated Models for Generalization), a novel approach that exploits image reconstruction for generalization. SMG introduces two model branches to extract task-relevant and task-irrelevant representations separately from visual observations via cooperative reconstruction. Building upon this architecture, we further emphasize the importance of task-relevant features for generalization. Specifically, SMG incorporates two additional consistency losses to guide the agent's focus toward task-relevant areas across different scenarios, thereby keeping it free from overfitting. Extensive experiments in the DeepMind Control suite (DMC) demonstrate the SOTA performance of SMG in generalization, particularly excelling in video-background settings. Evaluations on robotic manipulation tasks further confirm the robustness of SMG in real-world applications.
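To make the two-branch design concrete, here is a minimal PyTorch sketch, assuming a mask-based cooperative reconstruction and showing only a single feature-level consistency loss; all module shapes, names, and loss terms are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Small conv encoder mapping an 84x84 RGB observation to a feature vector."""
    def __init__(self, out_dim=50):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, 3, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        self.fc = nn.LazyLinear(out_dim)

    def forward(self, x):
        return self.fc(self.conv(x))

class Decoder(nn.Module):
    """Maps a feature vector back to an RGB image plus a per-pixel mask logit."""
    def __init__(self, in_dim=50, img_size=84):
        super().__init__()
        self.img_size = img_size
        self.fc = nn.Linear(in_dim, 64 * 7 * 7)
        self.conv = nn.Sequential(
            nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 4, 3, padding=1),  # 3 RGB channels + 1 mask channel
        )

    def forward(self, z):
        h = self.fc(z).view(-1, 64, 7, 7)
        h = F.interpolate(h, size=self.img_size)
        out = self.conv(h)
        return out[:, :3], out[:, 3:]

def smg_style_losses(obs, aug_obs, enc_task, enc_bg, dec_task, dec_bg):
    """Cooperative reconstruction plus one consistency term, loosely following the abstract."""
    z_t, z_b = enc_task(obs), enc_bg(obs)
    img_t, m_t = dec_task(z_t)
    img_b, _ = dec_bg(z_b)
    mask = torch.sigmoid(m_t)                    # soft task-relevance mask
    recon = mask * img_t + (1 - mask) * img_b    # both branches reconstruct cooperatively
    loss_recon = F.mse_loss(recon, obs)
    # Consistency: task-relevant features should survive background/appearance changes.
    loss_consist = F.mse_loss(enc_task(aug_obs), z_t.detach())
    return loss_recon + loss_consist
```

The soft mask is what lets the task-relevant branch claim only the pixels it needs, pushing everything else into the other branch.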
Related papers
- AEMIM: Adversarial Examples Meet Masked Image Modeling [12.072673694665934]
We propose to incorporate adversarial examples into masked image modeling, as the new reconstruction targets.
In particular, we introduce a novel auxiliary pretext task that reconstructs the adversarial examples corresponding to the original images.
We also devise an innovative adversarial attack to craft more suitable adversarial examples for MIM pre-training.
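A rough sketch of the pretext task described above: craft an adversarial counterpart of each image and use it as an additional masked-reconstruction target. The FGSM attack, `mae` autoencoder interface, and masking scheme below are stand-ins, not the paper's components.

```python
import torch
import torch.nn.functional as F

def fgsm(x, model, eps=4 / 255):
    """One-step FGSM against the model's own reconstruction loss (a surrogate objective)."""
    x = x.clone().requires_grad_(True)
    loss = F.mse_loss(model(x), x)
    grad, = torch.autograd.grad(loss, x)
    return (x + eps * grad.sign()).clamp(0, 1).detach()

def random_patch_mask(x, patch=14, ratio=0.75):
    """Zero out a random fraction of non-overlapping patches."""
    b, _, h, w = x.shape
    keep = (torch.rand(b, 1, h // patch, w // patch, device=x.device) > ratio).float()
    return x * F.interpolate(keep, size=(h, w), mode="nearest")

def aemim_style_loss(x, mae):
    """Clean MIM loss plus reconstruction of the adversarial example."""
    x_adv = fgsm(x, mae)
    loss_clean = F.mse_loss(mae(random_patch_mask(x)), x)
    loss_adv = F.mse_loss(mae(random_patch_mask(x_adv)), x_adv)  # adversarial target
    return loss_clean + loss_adv
```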
arXiv Detail & Related papers (2024-07-16T09:39:13Z)
- RSBuilding: Towards General Remote Sensing Image Building Extraction and Change Detection with Foundation Model [22.56227565913003]
We propose a comprehensive remote sensing image building model, termed RSBuilding, developed from the perspective of the foundation model.
RSBuilding is designed to enhance cross-scene generalization and task understanding.
Our model was trained on a dataset comprising up to 245,000 images and validated on multiple building extraction and change detection datasets.
arXiv Detail & Related papers (2024-03-12T11:51:59Z)
- Boosting Image Restoration via Priors from Pre-trained Models [54.83907596825985]
We learn an additional lightweight module called Pre-Train-Guided Refinement Module (PTG-RM) to refine the restoration results of a target restoration network with off-the-shelf features (OSF) from pre-trained models.
PTG-RM effectively enhances restoration performance of various models across different tasks, including low-light enhancement, deraining, deblurring, and denoising.
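Reading OSF as off-the-shelf features from a frozen pre-trained model, a pre-train-guided refiner might look like the sketch below; the backbone choice, fusion layers, and residual formulation are assumptions for illustration, not the paper's PTG-RM architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision

class PreTrainGuidedRefiner(nn.Module):
    """Lightweight module refining a restoration result with frozen pre-trained features."""
    def __init__(self):
        super().__init__()
        backbone = torchvision.models.resnet18(weights="IMAGENET1K_V1")
        self.prior = nn.Sequential(*list(backbone.children())[:5])  # stem + layer1, frozen
        for p in self.prior.parameters():
            p.requires_grad_(False)
        self.fuse = nn.Sequential(
            nn.Conv2d(3 + 64, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1),
        )

    def forward(self, restored):
        feats = self.prior(restored)                                  # off-the-shelf features
        feats = F.interpolate(feats, size=restored.shape[-2:])
        return restored + self.fuse(torch.cat([restored, feats], 1))  # residual refinement
```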
arXiv Detail & Related papers (2024-03-11T15:11:57Z)
- Attention-Guided Masked Autoencoders For Learning Image Representations [16.257915216763692]
Masked autoencoders (MAEs) have established themselves as a powerful method for unsupervised pre-training for computer vision tasks.
We propose to inform the reconstruction process through an attention-guided loss function.
Our evaluations show that our pre-trained models learn better latent representations than the vanilla MAE.
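One plausible form of such an attention-guided loss is per-patch reconstruction error weighted by an attention map (e.g. CLS-token attention from a teacher); this is a guess at the idea, not the paper's exact objective.

```python
import torch

def attention_guided_loss(pred, target, attn):
    """MSE over patch tokens, re-weighted so attended patches dominate.

    pred, target: (B, N, D) reconstructed and ground-truth patch tokens.
    attn:         (B, N) non-negative weights summing to 1 per image.
    """
    per_patch = ((pred - target) ** 2).mean(dim=-1)  # (B, N) per-patch error
    return (attn * per_patch).sum(dim=1).mean()
```

With uniform `attn` this reduces to the vanilla MAE loss, which makes the weighting easy to ablate.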
arXiv Detail & Related papers (2024-02-23T08:11:25Z)
- Multimodal Prompt Perceiver: Empower Adaptiveness, Generalizability and Fidelity for All-in-One Image Restoration [58.11518043688793]
MPerceiver is a novel approach to enhance adaptiveness, generalizability and fidelity for all-in-one image restoration.
MPerceiver is trained on 9 tasks for all-in-one IR and outperforms state-of-the-art task-specific methods across most tasks.
arXiv Detail & Related papers (2023-12-05T17:47:11Z)
- Unifying Image Processing as Visual Prompting Question Answering [62.84955983910612]
Image processing is a fundamental task in computer vision, which aims at enhancing image quality and extracting essential features for subsequent vision applications.
Traditionally, task-specific models are developed for individual tasks and designing such models requires distinct expertise.
We propose a universal model for general image processing that covers image restoration, image enhancement, and image feature extraction tasks.
arXiv Detail & Related papers (2023-10-16T15:32:57Z)
- Top-Down Visual Attention from Analysis by Synthesis [87.47527557366593]
We consider top-down attention from a classic Analysis-by-Synthesis (AbS) perspective of vision.
We propose Analysis-by-Synthesis Vision Transformer (AbSViT), a top-down modulated ViT model that variationally approximates AbS and achieves controllable top-down attention.
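The top-down modulation can be pictured as a two-pass scheme: a bottom-up pass, then re-encoding with tokens gated by a task prior. The gating rule and shapes below are illustrative assumptions; the actual AbSViT derivation is variational.

```python
import torch
import torch.nn as nn

class TopDownViTSketch(nn.Module):
    """Bottom-up pass, then a prior-conditioned gate modulates the second pass."""
    def __init__(self, dim=192):
        super().__init__()
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)  # shared across both passes
        self.feedback = nn.Linear(dim, dim)

    def forward(self, tokens, prior):
        feats = self.encoder(tokens)                  # pass 1: bottom-up features (B, N, dim)
        # Top-down signal: similarity of each token to the (learned) task prior (B, dim).
        gate = torch.sigmoid(feats @ self.feedback(prior).unsqueeze(-1))  # (B, N, 1)
        return self.encoder(tokens * gate)            # pass 2: modulated re-encoding
```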
arXiv Detail & Related papers (2023-03-23T05:17:05Z)
- Pre-Trained Image Encoder for Generalizable Visual Reinforcement Learning [27.304282924423095]
We propose Pre-trained Image Encoder for Generalizable visual reinforcement learning (PIE-G).
PIE-G is a simple yet effective framework that can generalize to the unseen visual scenarios in a zero-shot manner.
Empirical evidence suggests PIE-G improves sample efficiency and significantly outperforms previous state-of-the-art methods in terms of generalization performance.
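The core recipe suggested by the summary, sketched under the assumption of a frozen ImageNet ResNet and a trainable policy head; PIE-G's actual encoder choice and RL algorithm may differ.

```python
import torch
import torch.nn as nn
import torchvision

class FrozenEncoderPolicy(nn.Module):
    """Frozen pre-trained encoder supplies features; only the head is trained with RL."""
    def __init__(self, num_actions):
        super().__init__()
        backbone = torchvision.models.resnet18(weights="IMAGENET1K_V1")
        self.encoder = nn.Sequential(*list(backbone.children())[:-1])  # drop the classifier
        for p in self.encoder.parameters():
            p.requires_grad_(False)           # generalization comes from the frozen prior
        self.policy = nn.Sequential(
            nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, num_actions)
        )

    def forward(self, obs):
        with torch.no_grad():
            feats = self.encoder(obs).flatten(1)  # (B, 512)
        return self.policy(feats)                 # action logits
```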
arXiv Detail & Related papers (2022-12-17T12:45:08Z)
- Learning Task-relevant Representations for Generalization via Characteristic Functions of Reward Sequence Distributions [63.773813221460614]
Generalization across different environments with the same tasks is critical for successful applications of visual reinforcement learning.
We propose a novel approach, namely Characteristic Reward Sequence Prediction (CRESP), to extract the task-relevant information.
Experiments demonstrate that CRESP significantly improves the performance of generalization on unseen environments.
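The characteristic function of the reward sequence R = (r_1, ..., r_H) is psi(u) = E[exp(i * u . R)]; a head can regress its real and imaginary parts at randomly sampled frequencies u. The sketch below uses single-sample Monte Carlo targets; network sizes and the horizon are assumptions.

```python
import torch
import torch.nn as nn

class CharFnHead(nn.Module):
    """Predicts (Re, Im) of the reward-sequence characteristic function at frequency u."""
    def __init__(self, feat_dim=50, horizon=5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim + horizon, 128), nn.ReLU(), nn.Linear(128, 2)
        )

    def forward(self, z, u):
        return self.net(torch.cat([z, u], dim=-1))   # (B, 2): predicted (cos, sin)

def cresp_style_loss(head, z, rewards):
    """Regress onto empirical characteristic-function values at random frequencies u."""
    u = torch.randn_like(rewards)                    # one random frequency vector per sample
    phase = (u * rewards).sum(dim=-1)                # u . R
    target = torch.stack([phase.cos(), phase.sin()], dim=-1)  # one-sample estimate of psi(u)
    return ((head(z, u) - target) ** 2).mean()
```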
arXiv Detail & Related papers (2022-05-20T14:52:03Z)
- HiFaceGAN: Face Renovation via Collaborative Suppression and Replenishment [63.333407973913374]
"Face Renovation"(FR) is a semantic-guided generation problem.
"HiFaceGAN" is a multi-stage framework containing several nested CSR units.
experiments on both synthetic and real face images have verified the superior performance of HiFaceGAN.
arXiv Detail & Related papers (2020-05-11T11:33:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.