Efficient Reinforcement Learning Through Adaptively Pretrained Visual Encoder
- URL: http://arxiv.org/abs/2502.05555v1
- Date: Sat, 08 Feb 2025 12:57:02 GMT
- Title: Efficient Reinforcement Learning Through Adaptively Pretrained Visual Encoder
- Authors: Yuhan Zhang, Guoqing Ma, Guangfu Hao, Liangxuan Guo, Yang Chen, Shan Yu,
- Abstract summary: We propose APE: efficient reinforcement learning through Adaptively Pretrained visual.
APE uses adaptive augmentation strategy during the pretraining phase and extracts generalizable features with only a few interactions within the task environments in the policy learning period.
Results show that mainstream RL methods, such as DreamerV3 and DrQ-v2, achieve state-of-the-art performance when equipped with APE.
- Score: 12.310140622800372
- License:
- Abstract: While Reinforcement Learning (RL) agents can successfully learn to handle complex tasks, effectively generalizing acquired skills to unfamiliar settings remains a challenge. One of the reasons behind this is the visual encoders used are task-dependent, preventing effective feature extraction in different settings. To address this issue, recent studies have tried to pretrain encoders with diverse visual inputs in order to improve their performance. However, they rely on existing pretrained encoders without further exploring the impact of pretraining period. In this work, we propose APE: efficient reinforcement learning through Adaptively Pretrained visual Encoder -- a framework that utilizes adaptive augmentation strategy during the pretraining phase and extracts generalizable features with only a few interactions within the task environments in the policy learning period. Experiments are conducted across various domains, including DeepMind Control Suite, Atari Games and Memory Maze benchmarks, to verify the effectiveness of our method. Results show that mainstream RL methods, such as DreamerV3 and DrQ-v2, achieve state-of-the-art performance when equipped with APE. In addition, APE significantly improves the sampling efficiency using only visual inputs during learning, approaching the efficiency of state-based method in several control tasks. These findings demonstrate the potential of adaptive pretraining of encoder in enhancing the generalization ability and efficiency of visual RL algorithms.
Related papers
- Continual Learning in the Frequency Domain [22.415936450558334]
We propose a novel framework called Continual Learning in the Frequency Domain (CLFD)
For the input features of the feature extractor, CLFD employs wavelet transform to map the original input image into the frequency domain.
Experiments conducted in both cloud and edge environments demonstrate that CLFD consistently improves the performance of state-of-the-art (SOTA) methods in both precision and training efficiency.
arXiv Detail & Related papers (2024-10-09T07:57:47Z) - Denoising Pre-Training and Customized Prompt Learning for Efficient Multi-Behavior Sequential Recommendation [69.60321475454843]
We propose DPCPL, the first pre-training and prompt-tuning paradigm tailored for Multi-Behavior Sequential Recommendation.
In the pre-training stage, we propose a novel Efficient Behavior Miner (EBM) to filter out the noise at multiple time scales.
Subsequently, we propose to tune the pre-trained model in a highly efficient manner with the proposed Customized Prompt Learning (CPL) module.
arXiv Detail & Related papers (2024-08-21T06:48:38Z) - Dyn-Adapter: Towards Disentangled Representation for Efficient Visual Recognition [22.615830919860777]
This paper presents an efficient visual recognition paradigm, called Dynamic Adapter (Dyn-Adapter)
We devise a dynamic architecture with balanced early heads for multi-level feature extraction, along with adaptive training strategy.
We reduce FLOPs during inference by 50%, while maintaining or even yielding higher recognition accuracy.
arXiv Detail & Related papers (2024-07-19T13:33:38Z) - Contrastive-Adversarial and Diffusion: Exploring pre-training and fine-tuning strategies for sulcal identification [3.0398616939692777]
Techniques like adversarial learning, contrastive learning, diffusion denoising learning, and ordinary reconstruction learning have become standard.
The study aims to elucidate the advantages of pre-training techniques and fine-tuning strategies to enhance the learning process of neural networks.
arXiv Detail & Related papers (2024-05-29T15:44:51Z) - Premier-TACO is a Few-Shot Policy Learner: Pretraining Multitask Representation via Temporal Action-Driven Contrastive Loss [61.355272240758]
Premier-TACO is a multitask feature representation learning approach.
It is designed to improve few-shot policy learning efficiency in sequential decision-making tasks.
arXiv Detail & Related papers (2024-02-09T05:04:40Z) - Value Explicit Pretraining for Learning Transferable Representations [11.069853883599102]
We propose a method that learns generalizable representations for transfer reinforcement learning.
We learn new tasks that share similar objectives as previously learned tasks, by learning an encoder for objective-conditioned representations.
Experiments using a realistic navigation simulator and Atari benchmark show that the pretrained encoder produced by our method outperforms current SoTA pretraining methods.
arXiv Detail & Related papers (2023-12-19T17:12:35Z) - PerceptionGPT: Effectively Fusing Visual Perception into LLM [31.34127196055722]
The integration of visual inputs with large language models (LLMs) has led to remarkable advancements in multi-modal capabilities, giving rise to visual large language models (VLLMs)
We present a novel end-to-end framework named PerceptionGPT, which efficiently equips the VLLMs with visual perception abilities.
Our approach significantly alleviates the training difficulty suffered by previous approaches that formulate the visual outputs as discrete tokens.
arXiv Detail & Related papers (2023-11-11T16:59:20Z) - ALP: Action-Aware Embodied Learning for Perception [60.64801970249279]
We introduce Action-Aware Embodied Learning for Perception (ALP)
ALP incorporates action information into representation learning through a combination of optimizing a reinforcement learning policy and an inverse dynamics prediction objective.
We show that ALP outperforms existing baselines in several downstream perception tasks.
arXiv Detail & Related papers (2023-06-16T21:51:04Z) - Basis for Intentions: Efficient Inverse Reinforcement Learning using
Past Experience [89.30876995059168]
inverse reinforcement learning (IRL) -- inferring the reward function of an agent from observing its behavior.
This paper addresses the problem of IRL -- inferring the reward function of an agent from observing its behavior.
arXiv Detail & Related papers (2022-08-09T17:29:49Z) - Reinforcement Learning with Action-Free Pre-Training from Videos [95.25074614579646]
We introduce a framework that learns representations useful for understanding the dynamics via generative pre-training on videos.
Our framework significantly improves both final performances and sample-efficiency of vision-based reinforcement learning.
arXiv Detail & Related papers (2022-03-25T19:44:09Z) - On Efficient Transformer and Image Pre-training for Low-level Vision [74.22436001426517]
Pre-training has marked numerous state of the arts in high-level computer vision.
We present an in-depth study of image pre-training.
We find pre-training plays strikingly different roles in low-level tasks.
arXiv Detail & Related papers (2021-12-19T15:50:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.