Cyclophobic Reinforcement Learning
- URL: http://arxiv.org/abs/2308.15911v1
- Date: Wed, 30 Aug 2023 09:38:44 GMT
- Title: Cyclophobic Reinforcement Learning
- Authors: Stefan Sylvius Wagner, Peter Arndt, Jan Robine, Stefan Harmeling
- Abstract summary: In environments with sparse rewards, finding a good inductive bias for exploration is crucial to the agent's success.
We propose a new intrinsic reward that is cyclophobic, i.e., it does not reward novelty, but punishes redundancy by avoiding cycles.
Augmenting the cyclophobic intrinsic reward with a sequence of hierarchical representations, we are able to achieve excellent results in the MiniGrid and MiniHack environments.
- Score: 2.2940141855172036
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In environments with sparse rewards, finding a good inductive bias for
exploration is crucial to the agent's success. However, there are two competing
goals: novelty search and systematic exploration. While existing approaches
such as curiosity-driven exploration find novelty, they sometimes do not
systematically explore the whole state space, akin to depth-first search vs.
breadth-first search. In this paper, we propose a new intrinsic reward that is
cyclophobic, i.e., it does not reward novelty, but punishes redundancy by
avoiding cycles. Augmenting the cyclophobic intrinsic reward with a sequence of
hierarchical representations based on the agent's cropped observations, we are
able to achieve excellent results in the MiniGrid and MiniHack environments.
Both are particularly hard, as they require complex interactions with different
objects in order to be solved. Detailed comparisons with previous approaches
and thorough ablation studies show that our newly proposed cyclophobic
reinforcement learning is more sample-efficient than other state-of-the-art
methods in a variety of tasks.
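As a concrete illustration of the abstract, here is a minimal Python sketch of a cycle-punishing intrinsic reward computed over a hierarchy of cropped observations. The function name, the penalty value, the hashing of observations into keys, and the summation over hierarchy levels are assumptions made for this sketch, not the paper's exact formulation.

```python
def cyclophobic_reward(episode_visits, obs_crops, penalty=-1.0):
    """Punish revisits (cycles) at each level of a cropped-observation hierarchy.

    episode_visits: per-level sets of state keys seen so far in this episode,
                    e.g. [set() for _ in crop_sizes] at episode start.
    obs_crops:      hashable keys of the current observation, one per crop size.
    """
    reward = 0.0
    for visited, key in zip(episode_visits, obs_crops):
        if key in visited:      # returning to an already-seen state closes a cycle
            reward += penalty   # punish redundancy instead of rewarding novelty
        else:
            visited.add(key)    # first visit: no penalty, and no novelty bonus either
    return reward
```

In a training loop, this intrinsic term would simply be added to the sparse extrinsic reward at every step.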
Related papers
- Guarantees for Epsilon-Greedy Reinforcement Learning with Function Approximation [69.1524391595912]
Myopic exploration policies such as epsilon-greedy, softmax, or Gaussian noise fail to explore efficiently in some reinforcement learning tasks.
This paper presents a theoretical analysis of such policies and provides the first regret and sample-complexity bounds for reinforcement learning with myopic exploration.
arXiv Detail & Related papers (2022-06-19T14:44:40Z)
- Exploration in Deep Reinforcement Learning: A Survey [4.066140143829243]
Exploration techniques are of primary importance when solving sparse reward problems.
In sparse reward problems, rewards occur rarely, so an agent acting randomly will seldom encounter them.
This review provides a comprehensive overview of existing exploration approaches.
arXiv Detail & Related papers (2022-05-02T12:03:44Z)
- Is Curiosity All You Need? On the Utility of Emergent Behaviours from Curious Exploration [20.38772636693469]
We argue that merely using curiosity for fast environment exploration or as a bonus reward for a specific task does not harness the full potential of this technique.
We propose to shift the focus towards retaining the behaviours which emerge during curiosity-based learning.
arXiv Detail & Related papers (2021-09-17T15:28:25Z)
- Rank the Episodes: A Simple Approach for Exploration in Procedurally-Generated Environments [66.80667987347151]
Methods based on intrinsic rewards often fall short in procedurally-generated environments.
We introduce RAPID, a simple yet effective episode-level exploration method for procedurally-generated environments.
We demonstrate our method on several procedurally-generated MiniGrid environments, a first-person-view 3D Maze navigation task from MiniWorld, and several sparse MuJoCo tasks.
arXiv Detail & Related papers (2021-01-20T14:22:01Z)
- BeBold: Exploration Beyond the Boundary of Explored Regions [66.88415950549556]
In this paper, we propose the regulated difference of inverse visitation counts as a simple but effective criterion for intrinsic reward (IR). (A sketch of such a bonus appears after this list.)
The criterion helps the agent explore Beyond the Boundary of explored regions and mitigates common issues in count-based methods, such as short-sightedness and detachment.
The resulting method, BeBold, solves the 12 most challenging procedurally-generated tasks in MiniGrid with just 120M environment steps, without any curriculum learning.
arXiv Detail & Related papers (2020-12-15T21:26:54Z)
- Exploration in Approximate Hyper-State Space for Meta Reinforcement Learning [60.1292055717823]
We propose HyperX, which uses novel reward bonuses for meta-training to explore in approximate hyper-state space.
We show empirically that HyperX meta-learns better task-exploration and adapts more successfully to new tasks than existing methods.
arXiv Detail & Related papers (2020-10-02T15:43:31Z)
- Fast active learning for pure exploration in reinforcement learning [48.98199700043158]
We show that bonuses that scale with $1/n$ bring faster learning rates, improving the known upper bounds with respect to the dependence on the horizon.
We also show that with an improved analysis of the stopping time, we can improve by a factor $H$ the sample complexity in the best-policy identification setting.
arXiv Detail & Related papers (2020-07-27T11:28:32Z)
- Planning to Explore via Self-Supervised World Models [120.31359262226758]
Plan2Explore is a self-supervised reinforcement learning agent.
We present a new approach to self-supervised exploration and fast adaptation to new tasks.
Without any training supervision or task-specific interaction, Plan2Explore outperforms prior self-supervised exploration methods.
arXiv Detail & Related papers (2020-05-12T17:59:45Z)
- Curious Hierarchical Actor-Critic Reinforcement Learning [13.225264876433528]
Hierarchical abstraction and curiosity-driven exploration are two common paradigms in current reinforcement learning approaches.
We develop a method that combines hierarchical reinforcement learning with curiosity.
We demonstrate in several continuous-space environments that curiosity can more than double the learning performance and success rates.
arXiv Detail & Related papers (2020-05-07T12:44:26Z)
- RIDE: Rewarding Impact-Driven Exploration for Procedurally-Generated Environments [15.736899098702972]
We propose a novel type of intrinsic reward which encourages the agent to take actions that lead to significant changes in its learned state representation (an illustrative sketch appears after this list).
We evaluate our method on multiple challenging procedurally-generated tasks in MiniGrid.
arXiv Detail & Related papers (2020-02-27T18:03:16Z)
- Long-Term Visitation Value for Deep Exploration in Sparse Reward Reinforcement Learning [34.38011902445557]
Reinforcement learning with sparse rewards is still an open challenge.
We present a novel approach that plans exploration actions far into the future by using a long-term visitation count.
Contrary to existing methods which use models of reward and dynamics, our approach is off-policy and model-free.
arXiv Detail & Related papers (2020-01-01T01:01:15Z)
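For the BeBold entry above, the following is a minimal Python sketch of a regulated difference of inverse visitation counts as described in its summary. The episodic first-visit gate and the exact count definitions are assumptions based on the paper's stated goal of mitigating short-sightedness and detachment, not a verbatim reproduction of BeBold.

```python
from collections import defaultdict


def regulated_difference_bonus(lifelong_counts, episodic_counts, s, s_next):
    """Intrinsic reward from the regulated difference of inverse visit counts.

    lifelong_counts: visits accumulated across all of training.
    episodic_counts: visits within the current episode (reset each episode).
    """
    diff = 1.0 / max(lifelong_counts[s_next], 1) - 1.0 / max(lifelong_counts[s], 1)
    regulated = max(diff, 0.0)                                   # only reward moving beyond the frontier
    first_visit = 1.0 if episodic_counts[s_next] == 1 else 0.0   # assumed episodic gate
    return regulated * first_visit


# The counts would be maintained by the training loop, for example:
lifelong_counts = defaultdict(int)   # N(s): visits across training
episodic_counts = defaultdict(int)   # N_e(s): visits this episode, reset at episode start
```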
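Similarly, for the RIDE entry, here is a minimal sketch of an impact-driven bonus: the change in a learned state embedding, discounted by how often the new state was visited in the current episode. The embedding callable phi and the 1/sqrt(count) scaling are assumptions drawn from common descriptions of RIDE rather than from the one-line summary above.

```python
import numpy as np


def impact_driven_bonus(phi, s, s_next, episodic_count):
    """Reward transitions that change the learned representation a lot.

    phi:            any callable mapping an observation to an embedding vector.
    episodic_count: visits to s_next in the current episode.
    """
    impact = np.linalg.norm(phi(s_next) - phi(s))    # size of the representation change
    return impact / np.sqrt(max(episodic_count, 1))  # discourage lingering in the same states
```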