Related papers: The impact of intrinsic rewards on exploration in Reinforcement Learning

URL: http://arxiv.org/abs/2501.11533v1
Date: Mon, 20 Jan 2025 15:17:24 GMT
Title: The impact of intrinsic rewards on exploration in Reinforcement Learning
Authors: Aya Kayal, Eduardo Pignatelli, Laura Toni
Abstract summary: State Count leads to the best exploration performance in the case of low-dimensional observations. Maximum Entropy is less impacted, resulting in a more robust exploration. Learning diverse skills with DIAYN, often linked to improved robustness and generalisation, does not promote exploration in MiniGrid environments.
Score: 3.6253617038977226
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: One of the open challenges in Reinforcement Learning is the hard exploration problem in sparse reward environments. Various types of intrinsic rewards have been proposed to address this challenge by pushing towards diversity. This diversity might be imposed at different levels, favouring the agent to explore different states, policies or behaviours (State, Policy and Skill level diversity, respectively). However, the impact of diversity on the agent's behaviour remains unclear. In this work, we aim to fill this gap by studying the effect of different levels of diversity imposed by intrinsic rewards on the exploration patterns of RL agents. We select four intrinsic rewards (State Count, Intrinsic Curiosity Module (ICM), Maximum Entropy, and Diversity is all you need (DIAYN)), each pushing for a different diversity level. We conduct an empirical study on MiniGrid environment to compare their impact on exploration considering various metrics related to the agent's exploration, namely: episodic return, observation coverage, agent's position coverage, policy entropy, and timeframes to reach the sparse reward. The main outcome of the study is that State Count leads to the best exploration performance in the case of low-dimensional observations. However, in the case of RGB observations, the performance of State Count is highly degraded mostly due to representation learning challenges. Conversely, Maximum Entropy is less impacted, resulting in a more robust exploration, despite being not always optimal. Lastly, our empirical study revealed that learning diverse skills with DIAYN, often linked to improved robustness and generalisation, does not promote exploration in MiniGrid environments. This is because: i) learning the skill space itself can be challenging, and ii) exploration within the skill space prioritises differentiating between behaviours rather than achieving uniform state visitation.
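As a rough illustration of two quantities named in the abstract, the sketch below shows a count-based intrinsic reward and a policy-entropy measure. The abstract does not give the paper's formulas, so the inverse-square-root bonus, the `beta` scale, and the names `StateCountBonus` and `policy_entropy` are assumptions chosen for illustration, not the authors' implementation.

```python
import math
from collections import defaultdict


class StateCountBonus:
    """Count-based intrinsic reward: beta / sqrt(N(s)).

    Assumption: this inverse-square-root form is a common choice for
    count-based exploration; the paper's exact formula is not stated
    in the abstract.
    """

    def __init__(self, beta=1.0):
        self.beta = beta
        self.counts = defaultdict(int)  # visit count per hashed observation

    def __call__(self, observation):
        # Low-dimensional observations can be hashed directly as tuples.
        key = tuple(observation)
        self.counts[key] += 1
        return self.beta / math.sqrt(self.counts[key])


def policy_entropy(action_probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in action_probs if p > 0.0)


if __name__ == "__main__":
    bonus = StateCountBonus(beta=1.0)
    print(bonus((0, 0)))  # first visit to this state
    print(bonus((0, 0)))  # repeat visit yields a smaller bonus
    print(policy_entropy([0.25, 0.25, 0.25, 0.25]))
```

Hashing the raw observation only works when observations are low-dimensional and repeat exactly, which is consistent with the abstract's remark that State Count degrades on RGB observations.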

Related papers

Diversity-Incentivized Exploration for Versatile Reasoning [63.653348177250756]
We propose DIVER (Diversity-Incentivized Exploration for VersatilE Reasoning), an innovative framework that highlights the pivotal role of global sequence-level diversity to incentivize deep exploration for versatile reasoning.
arXiv Detail & Related papers (2025-09-30T13:11:46Z)
Unsupervised Skill Discovery through Skill Regions Differentiation [6.088346462603191]
Unsupervised Reinforcement Learning (RL) aims to discover diverse behaviors that can accelerate the learning of downstream tasks. We propose a novel skill discovery objective that maximizes the deviation of the state density of one skill from the explored regions of other skills. We also formulate an intrinsic reward based on the learned autoencoder that resembles count-based exploration in a compact latent space.
arXiv Detail & Related papers (2025-06-17T11:30:04Z)
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization [91.80034860399677]
Reinforcement learning algorithms aim to balance exploiting the current best strategy with exploring new options that could lead to higher rewards. We introduce a framework, MaxInfoRL, for balancing intrinsic and extrinsic exploration. We show that our approach achieves sublinear regret in the simplified setting of multi-armed bandits.
arXiv Detail & Related papers (2024-12-16T18:59:53Z)
Variable-Agnostic Causal Exploration for Reinforcement Learning [56.52768265734155]
We introduce a novel framework, Variable-Agnostic Causal Exploration for Reinforcement Learning (VACERL). Our approach automatically identifies crucial observation-action steps associated with key variables using attention mechanisms. It constructs the causal graph connecting these steps, which guides the agent towards observation-action pairs with greater causal influence on task completion.
arXiv Detail & Related papers (2024-07-17T09:45:27Z)
Curiosity & Entropy Driven Unsupervised RL in Multiple Environments [0.0]
We propose and experiment with five new modifications to the original work. In high-dimensional environments, curiosity-driven exploration enhances learning by encouraging the agent to seek diverse experiences and explore the unknown more. However, its benefits are limited in low-dimensional and simpler environments where exploration possibilities are constrained and there is little that is truly unknown to the agent.
arXiv Detail & Related papers (2024-01-08T19:25:40Z)
Never Explore Repeatedly in Multi-Agent Reinforcement Learning [40.35950679063337]
We propose a dynamic reward scaling approach to combat "revisitation". We show enhanced performance in demanding environments like Google Research Football and StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2023-08-19T05:27:48Z)
DEIR: Efficient and Robust Exploration through Discriminative-Model-Based Episodic Intrinsic Rewards [2.09711130126031]
Exploration is a fundamental aspect of reinforcement learning (RL), and its effectiveness is a deciding factor in the performance of RL algorithms. Recent studies have shown the effectiveness of encouraging exploration with intrinsic rewards estimated from novelties in observations. We propose DEIR, a novel method in which we theoretically derive an intrinsic reward with a conditional mutual information term.
arXiv Detail & Related papers (2023-04-21T06:39:38Z)
Collaborative Training of Heterogeneous Reinforcement Learning Agents in Environments with Sparse Rewards: What and When to Share? [7.489793155793319]
This work focuses on combining information obtained through intrinsic motivation with the aim of having a more efficient exploration and faster learning. Our results reveal different ways in which a collaborative framework with little additional computational cost can outperform an independent learning process without knowledge sharing.
arXiv Detail & Related papers (2022-02-24T16:15:51Z)
Explore and Control with Adversarial Surprise [78.41972292110967]
Reinforcement learning (RL) provides a framework for learning goal-directed policies given user-specified rewards. We propose a new unsupervised RL technique based on an adversarial game which pits two policies against each other to compete over the amount of surprise an RL agent experiences. We show that our method leads to the emergence of complex skills by exhibiting clear phase transitions.
arXiv Detail & Related papers (2021-07-12T17:58:40Z)
Seeing Differently, Acting Similarly: Imitation Learning with Heterogeneous Observations [126.78199124026398]
In many real-world imitation learning tasks, the demonstrator and the learner have to act in different but full observation spaces. In this work, we model the above learning problem as Heterogeneous Observations Learning (HOIL). We propose the Importance Weighting with REjection (IWRE) algorithm based on the techniques of importance-weighting, learning with rejection, and active querying to solve the key challenge of occupancy measure matching.
arXiv Detail & Related papers (2021-06-17T05:44:04Z)
Variational Dynamic for Self-Supervised Exploration in Deep Reinforcement Learning [12.76337275628074]
In this work, we propose a variational dynamic model based on the conditional variational inference to model the multimodality and stochasticity. We derive an upper bound of the negative log-likelihood of the environmental transition and use such an upper bound as the intrinsic reward for exploration. Our method outperforms several state-of-the-art environment model-based exploration approaches.
arXiv Detail & Related papers (2020-10-17T09:54:51Z)
Ecological Reinforcement Learning [76.9893572776141]
We study the kinds of environment properties that can make learning under such conditions easier. Understanding how properties of the environment impact the performance of reinforcement learning agents can help us to structure our tasks in ways that make learning tractable.
arXiv Detail & Related papers (2020-06-22T17:55:03Z)
Soft Hindsight Experience Replay [77.99182201815763]
Soft Hindsight Experience Replay (SHER) is a novel approach based on HER and Maximum Entropy Reinforcement Learning (MERL). We evaluate SHER on OpenAI robotic manipulation tasks with sparse rewards.
arXiv Detail & Related papers (2020-02-06T03:57:04Z)
Effective Diversity in Population Based Reinforcement Learning [38.62641968788987]
We introduce an approach to optimize all members of a population simultaneously. Rather than using pairwise distance, we measure the volume of the entire population in a behavioral manifold. Our algorithm Diversity via Determinants (DvD) adapts the degree of diversity during training using online learning techniques.
arXiv Detail & Related papers (2020-02-03T10:09:16Z)

This list is automatically generated from the titles and abstracts of the papers on this site.