Making Curiosity Explicit in Vision-based RL
- URL: http://arxiv.org/abs/2109.13588v1
- Date: Tue, 28 Sep 2021 09:50:37 GMT
- Title: Making Curiosity Explicit in Vision-based RL
- Authors: Elie Aljalbout and Maximilian Ulmer and Rudolph Triebel
- Abstract summary: Vision-based reinforcement learning (RL) is a promising technique to solve control tasks involving images as the main observation.
State-of-the-art RL algorithms still struggle in terms of sample efficiency.
We present an approach to improve sample diversity.
- Score: 12.829056201510994
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Vision-based reinforcement learning (RL) is a promising technique to solve
control tasks involving images as the main observation. State-of-the-art RL
algorithms still struggle in terms of sample efficiency, especially when using
image observations. This has led to increased attention to integrating state
representation learning (SRL) techniques into the RL pipeline. Work in this
field demonstrates a substantial improvement in sample efficiency, among other
benefits. However, to take full advantage of this paradigm, the quality of the
samples used for training plays a crucial role. More importantly, the diversity
of these samples can affect not only the sample efficiency of vision-based RL
but also its generalization capability. In this work, we present an approach to
improve sample diversity. Our method enhances the exploration capability of
the RL algorithms by taking advantage of the SRL setup. Our experiments show
that the presented approach outperforms the baseline for all tested
environments. These results are most apparent for environments where the
baseline method struggles. Even in simple environments, our method stabilizes
the training, reduces the reward variance and boosts sample efficiency.
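The core idea, an exploration bonus derived from the SRL model's prediction error, can be sketched as follows. This is a minimal NumPy sketch with illustrative names (`encoder`, `dynamics`, the linear stand-ins); the paper's exact formulation may differ:

```python
import numpy as np

def intrinsic_reward(encoder, dynamics, obs, action, next_obs, scale=0.1):
    """Curiosity bonus: prediction error of a latent dynamics model.

    `encoder` maps an observation to a latent state; `dynamics` predicts
    the next latent from (latent, action). States the SRL model predicts
    poorly yield a larger exploration bonus, steering the agent toward
    under-explored, "surprising" observations.
    """
    z = encoder(obs)
    z_next_pred = dynamics(z, action)
    z_next = encoder(next_obs)
    error = np.linalg.norm(z_next_pred - z_next)  # latent prediction error
    return scale * error

# Toy usage with random linear maps standing in for the learned networks
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8))   # "encoder" weights
A = rng.standard_normal((4, 4))   # latent transition
B = rng.standard_normal((4, 2))   # action effect

encoder = lambda o: W @ o
dynamics = lambda z, a: A @ z + B @ a

obs, next_obs = rng.standard_normal(8), rng.standard_normal(8)
action = rng.standard_normal(2)
bonus = intrinsic_reward(encoder, dynamics, obs, action, next_obs)
```

In a full pipeline this bonus would be added to the environment reward, so the policy is explicitly rewarded for visiting states the representation has not yet learned to model.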
Related papers
- Enhancing Diffusion-based Restoration Models via Difficulty-Adaptive Reinforcement Learning with IQA Reward [93.04811239892852]
Reinforcement Learning (RL) has recently been incorporated into diffusion models.
In this paper, we investigate how to effectively integrate RL into diffusion-based restoration models.
arXiv Detail & Related papers (2025-11-03T14:57:57Z)
- Attention as a Compass: Efficient Exploration for Process-Supervised RL in Reasoning Models [47.05227816684691]
We introduce a novel PSRL framework (AttnRL) which enables efficient exploration for reasoning models.
Motivated by preliminary observations that steps exhibiting high attention scores correlate with reasoning behaviors, we propose to branch from positions with high values.
We develop an adaptive sampling strategy that accounts for problem difficulty and historical batch size, ensuring that the whole training batch maintains non-zero advantage values.
arXiv Detail & Related papers (2025-09-30T17:58:34Z)
- DeGuV: Depth-Guided Visual Reinforcement Learning for Generalization and Interpretability in Manipulation [3.694734526301468]
This paper introduces DeGuV, an RL framework that enhances both generalization and sample efficiency.
We leverage a learnable masker network that produces a mask from the depth input, preserving only critical visual information while discarding irrelevant pixels.
In addition, we incorporate contrastive learning and stabilize Q-value estimation under augmentation to further enhance sample efficiency and training stability.
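The masker idea can be illustrated with a fixed depth threshold in place of the learned network. This is a simplified sketch; `depth_mask` and its thresholding rule are illustrative stand-ins, not DeGuV's actual implementation:

```python
import numpy as np

def depth_mask(rgb, depth, keep_frac=0.5):
    """Keep only the closest fraction of pixels by depth.

    A fixed-threshold stand-in for a learnable masker network:
    pixels with small depth values (near the camera) are kept,
    the rest are zeroed out before being fed to the policy.
    """
    thresh = np.quantile(depth, keep_frac)
    mask = (depth <= thresh).astype(rgb.dtype)   # 1 = keep, 0 = drop
    return rgb * mask[..., None], mask

# Toy frame: 4x4 RGB image, depth increasing left-to-right, top-to-bottom
rgb = np.ones((4, 4, 3))
depth = np.arange(16, dtype=float).reshape(4, 4)
masked, mask = depth_mask(rgb, depth, keep_frac=0.5)
```

A learned masker would replace the quantile threshold with a network trained end-to-end, but the input/output contract (RGB plus depth in, sparsified RGB out) is the same.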
arXiv Detail & Related papers (2025-09-05T09:52:08Z)
- Angles Don't Lie: Unlocking Training-Efficient RL Through the Model's Own Signals [49.17123504516502]
Current Reinforcement Fine-tuning (RFT) paradigms for Large Language Models (LLMs) suffer from inefficiency due to redundant exposure of identical queries under uniform data sampling.
We propose GAIN-RL, a Gradient-driven Angle-Informed Navigated RL framework.
By leveraging the model's intrinsic angle concentration signal, GAIN-RL dynamically selects training data in each epoch, ensuring consistently impactful gradient updates.
arXiv Detail & Related papers (2025-06-02T21:40:38Z)
- Sample Efficient Myopic Exploration Through Multitask Reinforcement Learning with Diverse Tasks [53.44714413181162]
This paper shows that when an agent is trained on a sufficiently diverse set of tasks, a generic policy-sharing algorithm with myopic exploration design can be sample-efficient.
To the best of our knowledge, this is the first theoretical demonstration of the "exploration benefits" of MTRL.
arXiv Detail & Related papers (2024-03-03T22:57:44Z)
- Learning Better with Less: Effective Augmentation for Sample-Efficient Visual Reinforcement Learning [57.83232242068982]
Data augmentation (DA) is a crucial technique for enhancing the sample efficiency of visual reinforcement learning (RL) algorithms.
It remains unclear which attributes of DA account for its effectiveness in achieving sample-efficient visual RL.
This work conducts comprehensive experiments to assess the impact of DA's attributes on its efficacy.
arXiv Detail & Related papers (2023-05-25T15:46:20Z)
- A Comprehensive Survey of Data Augmentation in Visual Reinforcement Learning [53.35317176453194]
Data augmentation (DA) has become a widely used technique in visual RL for acquiring sample-efficient and generalizable policies.
We present a principled taxonomy of the existing augmentation techniques used in visual RL and conduct an in-depth discussion on how to better leverage augmented data.
As the first comprehensive survey of DA in visual RL, this work is expected to offer valuable guidance to this emerging field.
arXiv Detail & Related papers (2022-10-10T11:01:57Z)
- Light-weight probing of unsupervised representations for Reinforcement Learning [20.638410483549706]
We study whether linear probing can be a proxy evaluation task for the quality of unsupervised RL representation.
We show that the probing tasks are strongly rank correlated with the downstream RL performance on the Atari100k Benchmark.
This provides a more efficient method for exploring the space of pretraining algorithms and identifying promising pretraining recipes.
arXiv Detail & Related papers (2022-08-25T21:08:01Z)
- Contrastive UCB: Provably Efficient Contrastive Self-Supervised Learning in Online Reinforcement Learning [92.18524491615548]
Contrastive self-supervised learning has been successfully integrated into the practice of (deep) reinforcement learning (RL).
We study how RL can be empowered by contrastive learning in a class of Markov decision processes (MDPs) and Markov games (MGs) with low-rank transitions.
Under the online setting, we propose novel upper confidence bound (UCB)-type algorithms that incorporate such a contrastive loss with online RL algorithms for MDPs or MGs.
arXiv Detail & Related papers (2022-07-29T17:29:08Z)
- CCLF: A Contrastive-Curiosity-Driven Learning Framework for Sample-Efficient Reinforcement Learning [56.20123080771364]
We develop a model-agnostic Contrastive-Curiosity-Driven Learning Framework (CCLF) for reinforcement learning.
CCLF fully exploits sample importance and improves learning efficiency in a self-supervised manner.
We evaluate this approach on the DeepMind Control Suite, Atari, and MiniGrid benchmarks.
arXiv Detail & Related papers (2022-05-02T14:42:05Z)
- Mask-based Latent Reconstruction for Reinforcement Learning [58.43247393611453]
Mask-based Latent Reconstruction (MLR) is proposed to predict the complete state representations in the latent space from the observations with spatially and temporally masked pixels.
Extensive experiments show that our MLR significantly improves the sample efficiency in deep reinforcement learning.
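The masked-latent objective can be illustrated in a few lines. This is a deliberately simplified sketch: MLR masks spatio-temporal cubes of pixels and uses more elaborate target networks, whereas here pixels are masked independently and all names are illustrative:

```python
import numpy as np

def mlr_loss(encoder, predictor, obs, mask_ratio=0.5, rng=None):
    """Mask-based latent reconstruction objective (simplified).

    Random pixels are zeroed out, the masked observation is encoded,
    and `predictor` must reproduce the latent of the full, unmasked
    observation. Returns the MSE between predicted and target latents.
    """
    rng = rng or np.random.default_rng(0)
    keep = rng.random(obs.shape) > mask_ratio       # True = pixel kept
    target = encoder(obs)                           # latent of full obs
    pred = predictor(encoder(obs * keep))           # latent from masked obs
    return float(np.mean((pred - target) ** 2))

# Toy usage: identity encoder/predictor on a small "image"
loss = mlr_loss(lambda x: x, lambda z: z, np.ones((3, 3)))
```

Minimizing this loss forces the representation to fill in missing spatial information, which is the mechanism MLR credits for its sample-efficiency gains.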
arXiv Detail & Related papers (2022-01-28T13:07:11Z)
- Seeking Visual Discomfort: Curiosity-driven Representations for Reinforcement Learning [12.829056201510994]
We present an approach to improve sample diversity for state representation learning.
Our proposed approach boosts the visitation of problematic states, improves the learned state representation, and outperforms the baselines for all tested environments.
arXiv Detail & Related papers (2021-10-02T11:15:04Z)
- Stabilizing Deep Q-Learning with ConvNets and Vision Transformers under Data Augmentation [25.493902939111265]
We investigate causes of instability when using data augmentation in off-policy Reinforcement Learning algorithms.
We propose a simple yet effective technique for stabilizing this class of algorithms under augmentation.
Our method greatly improves stability and sample efficiency of ConvNets under augmentation, and achieves generalization results competitive with state-of-the-art methods for image-based RL.
arXiv Detail & Related papers (2021-07-01T17:58:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.