Follow your Nose: Using General Value Functions for Directed Exploration
in Reinforcement Learning
- URL: http://arxiv.org/abs/2203.00874v1
- Date: Wed, 2 Mar 2022 05:14:11 GMT
- Title: Follow your Nose: Using General Value Functions for Directed Exploration
in Reinforcement Learning
- Authors: Somjit Nath, Omkar Shelke, Durgesh Kalwar, Hardik Meisheri, Harshad
Khadilkar
- Abstract summary: This paper explores the idea of combining exploration with auxiliary task learning using General Value Functions (GVFs) and a directed exploration strategy.
We provide a simple way to learn options (sequences of actions) instead of having to handcraft them, and demonstrate the performance advantage in three navigation tasks.
- Score: 5.40729975786985
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The exploration versus exploitation dilemma is a significant problem in
reinforcement learning (RL), particularly in complex environments with large
state spaces and sparse rewards. When optimizing for a particular goal, running
simple, smaller tasks can often be a good way to learn additional information
about the environment. Exploration methods have been used to sample better
trajectories from the environment for improved performance, while auxiliary
tasks have generally been incorporated where the reward is sparse. If there is
little reward signal available, the agent requires clever exploration
strategies to reach the parts of the state space that contain relevant sub-goals.
However, that exploration needs to be balanced against the need to exploit the
learned policy. This paper explores the idea of combining exploration with
auxiliary task learning using General Value Functions (GVFs) and a directed
exploration strategy. We provide a simple way to learn options (sequences of
actions) instead of having to handcraft them, and demonstrate the performance
advantage in three navigation tasks.
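The abstract does not spell out the mechanism, so the following is only a rough sketch of the general idea: learn one action-value GVF per auxiliary cumulant with off-policy TD updates, and at exploration steps act greedily with respect to the most promising GVF instead of taking a uniformly random action. The tabular Q-learning backbone, the cumulant definitions, and the epsilon-greedy switch are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

# Minimal sketch (assumed details, not the paper's exact algorithm):
# one action-value GVF per auxiliary cumulant, learned with TD(0) from the
# same transitions as the main task, and used to direct exploration.

n_states, n_actions, n_gvfs = 100, 4, 3
gamma_task, gamma_gvf, alpha, epsilon = 0.99, 0.9, 0.1, 0.2

Q = np.zeros((n_states, n_actions))          # main task action values
G = np.zeros((n_gvfs, n_states, n_actions))  # one action-value table per GVF

def cumulants(s, a, s_next):
    """Hypothetical auxiliary signals (e.g. 'touched a wall', 'saw a doorway').
    Replace with environment-specific pseudo-rewards."""
    return np.zeros(n_gvfs)

def select_action(s, rng):
    if rng.random() < epsilon:
        # Directed exploration: pick the GVF whose prediction is currently
        # highest from this state and follow it greedily, so exploration
        # heads toward an auxiliary sub-goal rather than diffusing randomly.
        k = int(np.argmax(G[:, s, :].max(axis=1)))
        return int(np.argmax(G[k, s]))
    return int(np.argmax(Q[s]))              # exploit the main policy

def update(s, a, r, s_next, done):
    # Main task Q-learning update.
    target = r + (0.0 if done else gamma_task * Q[s_next].max())
    Q[s, a] += alpha * (target - Q[s, a])
    # Off-policy TD(0) update for every GVF from the same transition.
    c = cumulants(s, a, s_next)
    for k in range(n_gvfs):
        g_target = c[k] + (0.0 if done else gamma_gvf * G[k, s_next].max())
        G[k, s, a] += alpha * (g_target - G[k, s, a])
```

In this sketch, following a GVF's greedy policy for several steps plays the role of a learned option: it drives the agent toward states where the corresponding cumulant is high, which is the sense in which exploration is "directed".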
Related papers
- Random Latent Exploration for Deep Reinforcement Learning [71.88709402926415]
This paper introduces a new exploration technique called Random Latent Exploration (RLE)
RLE combines the strengths of bonus-based and noise-based exploration, two popular approaches for effective exploration in deep RL.
We evaluate it on the challenging Atari and IsaacGym benchmarks and show that RLE exhibits higher overall scores across all the tasks than other approaches.
arXiv Detail & Related papers (2024-07-18T17:55:22Z)
- Interesting Object, Curious Agent: Learning Task-Agnostic Exploration [44.18450799034677]
In this paper, we propose a paradigm change in the formulation and evaluation of task-agnostic exploration.
We show that our formulation is effective and provides the most consistent exploration across several training-testing environment pairs.
arXiv Detail & Related papers (2021-11-25T15:17:32Z)
- Discovering and Exploiting Sparse Rewards in a Learned Behavior Space [0.46736439782713946]
Learning optimal policies in sparse-reward settings is difficult, as the learning agent has little to no feedback on the quality of its actions.
We introduce STAX, an algorithm designed to learn a behavior space on-the-fly and to explore it while efficiently optimizing any reward discovered.
arXiv Detail & Related papers (2021-11-02T22:21:11Z)
- Long-Term Exploration in Persistent MDPs [68.8204255655161]
In this paper, we propose an exploration method called Rollback-Explore (RbExplore), which utilizes the concept of the persistent Markov decision process.
We test our algorithm in the hard-exploration Prince of Persia game, without rewards or domain knowledge.
arXiv Detail & Related papers (2021-09-21T13:47:04Z)
- Deep Reinforcement Learning for Adaptive Exploration of Unknown Environments [6.90777229452271]
We develop an adaptive exploration approach for UAVs that trades off between exploration and exploitation in a single step.
The proposed approach uses a map segmentation technique to decompose the environment map into smaller, tractable maps.
The results demonstrate that our proposed approach is capable of navigating through randomly generated environments and covering more AoI in fewer time steps than the baselines.
arXiv Detail & Related papers (2021-05-04T16:29:44Z)
- Batch Exploration with Examples for Scalable Robotic Reinforcement Learning [63.552788688544254]
Batch Exploration with Examples (BEE) explores relevant regions of the state space, guided by a modest number of human-provided images of important states.
BEE is able to tackle challenging vision-based manipulation tasks both in simulation and on a real Franka robot.
arXiv Detail & Related papers (2020-10-22T17:49:25Z)
- Reannealing of Decaying Exploration Based On Heuristic Measure in Deep Q-Network [82.20059754270302]
We propose an algorithm based on the idea of reannealing that aims to encourage exploration only when it is needed.
We perform an illustrative case study showing that it has potential to both accelerate training and obtain a better policy.
arXiv Detail & Related papers (2020-09-29T20:40:00Z)
- Decoupling Exploration and Exploitation for Meta-Reinforcement Learning without Sacrifices [132.49849640628727]
Meta-reinforcement learning (meta-RL) builds agents that can quickly learn new tasks by leveraging prior experience on related tasks.
In principle, optimal exploration and exploitation can be learned end-to-end by simply maximizing task performance.
We present DREAM, which avoids local optima in end-to-end training, without sacrificing optimal exploration.
arXiv Detail & Related papers (2020-08-06T17:57:36Z)
- Planning to Explore via Self-Supervised World Models [120.31359262226758]
Plan2Explore is a self-supervised reinforcement learning agent.
We present a new approach to self-supervised exploration and fast adaptation to new tasks.
Without any training supervision or task-specific interaction, Plan2Explore outperforms prior self-supervised exploration methods.
arXiv Detail & Related papers (2020-05-12T17:59:45Z)
- Long-Term Visitation Value for Deep Exploration in Sparse Reward Reinforcement Learning [34.38011902445557]
Reinforcement learning with sparse rewards is still an open challenge.
We present a novel approach that plans exploration actions far into the future by using a long-term visitation count.
Contrary to existing methods which use models of reward and dynamics, our approach is off-policy and model-free.
arXiv Detail & Related papers (2020-01-01T01:01:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.