Cyclophobic Reinforcement Learning
- URL: http://arxiv.org/abs/2308.15911v1
- Date: Wed, 30 Aug 2023 09:38:44 GMT
- Title: Cyclophobic Reinforcement Learning
- Authors: Stefan Sylvius Wagner, Peter Arndt, Jan Robine, Stefan Harmeling
- Abstract summary: In environments with sparse rewards, finding a good inductive bias for exploration is crucial to the agent's success.
We propose a new intrinsic reward that is cyclophobic, i.e., it does not reward novelty, but punishes redundancy by avoiding cycles.
Augmenting the cyclophobic intrinsic reward with a sequence of hierarchical representations, we are able to achieve excellent results in the MiniGrid and MiniHack environments.
- Score: 2.2940141855172036
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In environments with sparse rewards, finding a good inductive bias for
exploration is crucial to the agent's success. However, there are two competing
goals: novelty search and systematic exploration. While existing approaches
such as curiosity-driven exploration find novelty, they sometimes do not
systematically explore the whole state space, akin to depth-first search vs.
breadth-first search. In this paper, we propose a new intrinsic reward that is
cyclophobic, i.e., it does not reward novelty, but punishes redundancy by
avoiding cycles. Augmenting the cyclophobic intrinsic reward with a sequence of
hierarchical representations based on the agent's cropped observations, we are
able to achieve excellent results in the MiniGrid and MiniHack environments.
Both are particularly hard, as they require complex interactions with different
objects in order to be solved. Detailed comparisons with previous approaches
and thorough ablation studies show that our newly proposed cyclophobic
reinforcement learning is more sample-efficient than other state-of-the-art
methods in a variety of tasks.
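As a concrete illustration of the abstract, here is a minimal Python sketch of a cycle-punishing intrinsic reward computed over a hierarchy of cropped observations. The function name, the penalty value, the hashing of observations into keys, and the summation over hierarchy levels are assumptions made for this sketch, not the paper's exact formulation.

```python
def cyclophobic_reward(episode_visits, obs_crops, penalty=-1.0):
    """Punish revisits (cycles) at each level of a cropped-observation hierarchy.

    episode_visits: per-level sets of state keys seen so far in this episode,
                    e.g. [set() for _ in crop_sizes] at episode start.
    obs_crops:      hashable keys of the current observation, one per crop size.
    """
    reward = 0.0
    for visited, key in zip(episode_visits, obs_crops):
        if key in visited:      # returning to an already-seen state closes a cycle
            reward += penalty   # punish redundancy instead of rewarding novelty
        else:
            visited.add(key)    # first visit: no penalty, and no novelty bonus either
    return reward
```

In a training loop, this intrinsic term would simply be added to the sparse extrinsic reward at every step.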
Related papers
- Guarantees for Epsilon-Greedy Reinforcement Learning with Function Approximation [69.1524391595912]
Myopic exploration policies such as epsilon-greedy, softmax, or Gaussian noise fail to explore efficiently in some reinforcement learning tasks.
This paper presents a theoretical analysis of such policies and provides the first regret and sample-complexity bounds for reinforcement learning with myopic exploration.
arXiv Detail & Related papers (2022-06-19T14:44:40Z)
- Exploration in Deep Reinforcement Learning: A Survey [4.066140143829243]
Exploration techniques are of primary importance when solving sparse reward problems.
In sparse reward problems, rewards occur rarely, so an agent acting randomly will seldom encounter them.
This review provides a comprehensive overview of existing exploration approaches.
arXiv Detail & Related papers (2022-05-02T12:03:44Z)
- Is Curiosity All You Need? On the Utility of Emergent Behaviours from Curious Exploration [20.38772636693469]
We argue that merely using curiosity for fast environment exploration or as a bonus reward for a specific task does not harness the full potential of this technique.
We propose to shift the focus towards retaining the behaviours which emerge during curiosity-based learning.
arXiv Detail & Related papers (2021-09-17T15:28:25Z)
- Rank the Episodes: A Simple Approach for Exploration in Procedurally-Generated Environments [66.80667987347151]
Methods based on intrinsic rewards often fall short in procedurally-generated environments.
We introduce RAPID, a simple yet effective episode-level exploration method for procedurally-generated environments.
We demonstrate our method on several procedurally-generated MiniGrid environments, a first-person-view 3D Maze navigation task from MiniWorld, and several sparse MuJoCo tasks.
arXiv Detail & Related papers (2021-01-20T14:22:01Z)
- BeBold: Exploration Beyond the Boundary of Explored Regions [66.88415950549556]
In this paper, we propose the regulated difference of inverse visitation counts as a simple but effective criterion for intrinsic reward (IR). (A sketch of such a bonus appears after this list.)
The criterion helps the agent explore Beyond the Boundary of explored regions and mitigates common issues in count-based methods, such as short-sightedness and detachment.
The resulting method, BeBold, solves the 12 most challenging procedurally-generated tasks in MiniGrid with just 120M environment steps, without any curriculum learning.
arXiv Detail & Related papers (2020-12-15T21:26:54Z)
- Exploration in Approximate Hyper-State Space for Meta Reinforcement Learning [60.1292055717823]
We propose HyperX, which uses novel reward bonuses for meta-training to explore in approximate hyper-state space.
We show empirically that HyperX meta-learns better task-exploration and adapts more successfully to new tasks than existing methods.
arXiv Detail & Related papers (2020-10-02T15:43:31Z)
- Fast active learning for pure exploration in reinforcement learning [48.98199700043158]
We show that bonuses that scale with $1/n$ bring faster learning rates, improving the known upper bounds with respect to the dependence on the horizon.
We also show that with an improved analysis of the stopping time, we can improve by a factor $H$ the sample complexity in the best-policy identification setting.
arXiv Detail & Related papers (2020-07-27T11:28:32Z)
- Planning to Explore via Self-Supervised World Models [120.31359262226758]
Plan2Explore is a self-supervised reinforcement learning agent.
We present a new approach to self-supervised exploration and fast adaptation to new tasks.
Without any training supervision or task-specific interaction, Plan2Explore outperforms prior self-supervised exploration methods.
arXiv Detail & Related papers (2020-05-12T17:59:45Z)
- Curious Hierarchical Actor-Critic Reinforcement Learning [13.225264876433528]
Hierarchical abstraction and curiosity-driven exploration are two common paradigms in current reinforcement learning approaches.
We develop a method that combines hierarchical reinforcement learning with curiosity.
We demonstrate in several continuous-space environments that curiosity can more than double the learning performance and success rates.
arXiv Detail & Related papers (2020-05-07T12:44:26Z)
- RIDE: Rewarding Impact-Driven Exploration for Procedurally-Generated Environments [15.736899098702972]
We propose a novel type of intrinsic reward which encourages the agent to take actions that lead to significant changes in its learned state representation (an illustrative sketch appears after this list).
We evaluate our method on multiple challenging procedurally-generated tasks in MiniGrid.
arXiv Detail & Related papers (2020-02-27T18:03:16Z)
- Long-Term Visitation Value for Deep Exploration in Sparse Reward Reinforcement Learning [34.38011902445557]
Reinforcement learning with sparse rewards is still an open challenge.
We present a novel approach that plans exploration actions far into the future by using a long-term visitation count.
Contrary to existing methods which use models of reward and dynamics, our approach is off-policy and model-free.
arXiv Detail & Related papers (2020-01-01T01:01:15Z)
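For the BeBold entry above, the following is a minimal Python sketch of a regulated difference of inverse visitation counts as described in its summary. The episodic first-visit gate and the exact count definitions are assumptions based on the paper's stated goal of mitigating short-sightedness and detachment, not a verbatim reproduction of BeBold.

```python
from collections import defaultdict


def regulated_difference_bonus(lifelong_counts, episodic_counts, s, s_next):
    """Intrinsic reward from the regulated difference of inverse visit counts.

    lifelong_counts: visits accumulated across all of training.
    episodic_counts: visits within the current episode (reset each episode).
    """
    diff = 1.0 / max(lifelong_counts[s_next], 1) - 1.0 / max(lifelong_counts[s], 1)
    regulated = max(diff, 0.0)                                   # only reward moving beyond the frontier
    first_visit = 1.0 if episodic_counts[s_next] == 1 else 0.0   # assumed episodic gate
    return regulated * first_visit


# The counts would be maintained by the training loop, for example:
lifelong_counts = defaultdict(int)   # N(s): visits across training
episodic_counts = defaultdict(int)   # N_e(s): visits this episode, reset at episode start
```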
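Similarly, for the RIDE entry, here is a minimal sketch of an impact-driven bonus: the change in a learned state embedding, discounted by how often the new state was visited in the current episode. The embedding callable phi and the 1/sqrt(count) scaling are assumptions drawn from common descriptions of RIDE rather than from the one-line summary above.

```python
import numpy as np


def impact_driven_bonus(phi, s, s_next, episodic_count):
    """Reward transitions that change the learned representation a lot.

    phi:            any callable mapping an observation to an embedding vector.
    episodic_count: visits to s_next in the current episode.
    """
    impact = np.linalg.norm(phi(s_next) - phi(s))    # size of the representation change
    return impact / np.sqrt(max(episodic_count, 1))  # discourage lingering in the same states
```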