Neighboring state-based RL Exploration
- URL: http://arxiv.org/abs/2212.10712v1
- Date: Wed, 21 Dec 2022 01:23:53 GMT
- Title: Neighboring state-based RL Exploration
- Authors: Jeffery Cheng, Kevin Li, Justin Lin, Pedro Pachuca
- Abstract summary: We study neighboring state-based, model-free exploration led by the intuition that, for an early-stage agent, considering actions derived from a bounded region of nearby states may lead to better actions when exploring.
We propose two algorithms that choose exploratory actions based on a survey of nearby states, and find that one of our methods, $\rho$-explore, consistently outperforms the Double DQN baseline in a discrete environment by 49% in terms of Eval Reward Return.
- Score: 1.5935205681539144
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reinforcement Learning is a powerful tool to model decision-making processes.
However, it relies on an exploration-exploitation trade-off that remains an
open challenge for many tasks. In this work, we study neighboring state-based,
model-free exploration led by the intuition that, for an early-stage agent,
considering actions derived from a bounded region of nearby states may lead to
better actions when exploring. We propose two algorithms that choose
exploratory actions based on a survey of nearby states, and find that one of
our methods, ${\rho}$-explore, consistently outperforms the Double DQN baseline
in a discrete environment by 49\% in terms of Eval Reward Return.
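The abstract does not describe the two algorithms in detail, so the following is only a minimal sketch of the neighborhood-survey idea, under explicit assumptions: a continuous state vector, a per-action value estimator `q_values_fn`, an L-infinity ball of radius `rho` standing in for the bounded region of nearby states, and a simple average vote over sampled neighbors. The names and design choices below are illustrative, not the authors' $\rho$-explore implementation.

```python
import numpy as np

def rho_explore_action(state, q_values_fn, rho=0.1, n_samples=16, rng=None):
    """Pick an exploratory action by surveying a bounded region of nearby states.

    Hypothetical sketch: sample perturbed states within an L-infinity ball of
    radius `rho` around `state`, query per-action value estimates there, and
    return the action that looks best on average over the neighborhood.
    `q_values_fn(s)` is assumed to return an array of per-action values.
    """
    if rng is None:
        rng = np.random.default_rng()
    state = np.asarray(state, dtype=np.float64)
    # Sample nearby states uniformly from the bounded region around `state`.
    noise = rng.uniform(-rho, rho, size=(n_samples, state.shape[-1]))
    neighbors = state + noise
    # Average the per-action estimates over the surveyed neighbors and vote.
    q_neighbors = np.stack([np.asarray(q_values_fn(s)) for s in neighbors])
    return int(np.argmax(q_neighbors.mean(axis=0)))
```

In a standard $\epsilon$-greedy loop, the uniform random action in the exploration branch could be swapped for this neighborhood vote while the exploitation branch stays unchanged.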
Related papers
- Deterministic Exploration via Stationary Bellman Error Maximization [6.474106100512158]
Exploration is a crucial and distinctive aspect of reinforcement learning (RL).
In this paper, we introduce three modifications to stabilize Bellman error maximization and arrive at a deterministic exploration policy.
Our experimental results show that our approach can outperform $\varepsilon$-greedy in dense and sparse reward settings.
arXiv Detail & Related papers (2024-10-31T11:46:48Z) - Flipping Coins to Estimate Pseudocounts for Exploration in Reinforcement Learning [20.0888026410406]
We show that visitation counts can be derived by averaging samples from the Rademacher distribution (a toy illustration of this averaging identity appears after this list).
We also show that our method is significantly more effective at deducing ground-truth visitation counts than previous work.
arXiv Detail & Related papers (2023-06-05T18:56:48Z) - Rewarding Episodic Visitation Discrepancy for Exploration in Reinforcement Learning [64.8463574294237]
We propose Rewarding Episodic Visitation Discrepancy (REVD) as an efficient and quantified exploration method.
REVD provides intrinsic rewards by evaluating the Rényi divergence-based visitation discrepancy between episodes.
It is tested on PyBullet Robotics Environments and Atari games.
arXiv Detail & Related papers (2022-09-19T08:42:46Z) - Generative Adversarial Exploration for Reinforcement Learning [48.379457575356454]
In this paper, we propose a novel method called generative adversarial exploration (GAEX) to encourage exploration in reinforcement learning (RL).
In our experiments, we apply GAEX to the games Venture, Montezuma's Revenge, and Super Mario Bros.
To our knowledge, this is the first work to employ a GAN in RL exploration problems.
arXiv Detail & Related papers (2022-01-27T17:34:47Z) - Long-Term Exploration in Persistent MDPs [68.8204255655161]
In this paper, we propose an exploration method called Rollback-Explore (RbExplore), which utilizes the concept of the persistent Markov decision process.
We test our algorithm in the hard-exploration Prince of Persia game, without rewards and domain knowledge.
arXiv Detail & Related papers (2021-09-21T13:47:04Z) - Reannealing of Decaying Exploration Based On Heuristic Measure in Deep Q-Network [82.20059754270302]
We propose an algorithm based on the idea of reannealing that aims to encourage exploration only when it is needed.
We perform an illustrative case study showing that it has the potential to both accelerate training and obtain a better policy.
arXiv Detail & Related papers (2020-09-29T20:40:00Z) - Intrinsic Exploration as Multi-Objective RL [29.124322674133]
Intrinsic motivation enables reinforcement learning (RL) agents to explore when rewards are very sparse.
We propose a framework based on multi-objective RL where both exploration and exploitation are being optimized as separate objectives.
This formulation handles the balance between exploration and exploitation at the policy level, resulting in advantages over traditional methods.
arXiv Detail & Related papers (2020-04-06T02:37:29Z) - Exploring Unknown States with Action Balance [48.330318997735574]
Exploration is a key problem in reinforcement learning.
Next-state bonus methods force the agent to pay too much attention to exploring known states.
We propose action balance exploration, which balances the frequency of selecting each action at a given state.
arXiv Detail & Related papers (2020-03-10T03:32:28Z) - Never Give Up: Learning Directed Exploration Strategies [63.19616370038824]
We propose a reinforcement learning agent to solve hard exploration games by learning a range of directed exploratory policies.
We construct an episodic memory-based intrinsic reward using k-nearest neighbors over the agent's recent experience to train the directed exploratory policies (a simplified sketch of this episodic k-NN bonus appears after this list).
A self-supervised inverse dynamics model is used to train the embeddings of the nearest-neighbor lookup, biasing the novelty signal towards what the agent can control.
arXiv Detail & Related papers (2020-02-14T13:57:22Z) - Long-Term Visitation Value for Deep Exploration in Sparse Reward Reinforcement Learning [34.38011902445557]
Reinforcement learning with sparse rewards is still an open challenge.
We present a novel approach that plans exploration actions far into the future by using a long-term visitation count.
Contrary to existing methods which use models of reward and dynamics, our approach is off-policy and model-free.
arXiv Detail & Related papers (2020-01-01T01:01:15Z)
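The "Flipping Coins to Estimate Pseudocounts" entry above rests on a simple averaging identity: for $n$ i.i.d. Rademacher ($\pm 1$) samples, the expected squared mean is $1/n$. The snippet below is a toy tabular illustration of that identity only; the paper itself learns the running average with a network and turns it into an exploration bonus, and nothing here reproduces that implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
true_count = 100      # pretend a single state has been visited 100 times
n_trials = 10_000     # many independent realizations to expose the expectation

# One Rademacher (+/-1) coin flip is recorded per visit; only the average is kept.
flips = rng.choice([-1.0, 1.0], size=(n_trials, true_count))
mean_flip = flips.mean(axis=1)

# E[(mean of n flips)^2] = 1/n, so inverting the averaged square recovers the count.
estimated_count = 1.0 / np.mean(mean_flip ** 2)
print(round(estimated_count))  # close to 100
```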
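The "Never Give Up" entry above describes an episodic intrinsic reward computed with k-nearest neighbors over the agent's recent experience. The class below is a simplified, hypothetical sketch of that general recipe: the embedding function is assumed to be supplied by some encoder, and the kernel form, constants, and update rule are illustrative rather than the paper's exact configuration.

```python
import numpy as np

class EpisodicNoveltyBonus:
    """Simplified episodic k-NN novelty bonus (illustrative, not the paper's exact method)."""

    def __init__(self, k=10, eps=1e-3, c=1e-3):
        self.k, self.eps, self.c = k, eps, c
        self.memory = []          # embeddings observed during the current episode
        self.mean_sq_dist = 1.0   # running mean of k-NN squared distances

    def reset(self):
        """Clear the episodic memory at the start of each episode."""
        self.memory.clear()

    def bonus(self, embedding):
        """Return a reward that is large when `embedding` is far from its stored neighbors."""
        embedding = np.asarray(embedding, dtype=np.float64)
        if not self.memory:
            self.memory.append(embedding)
            return 1.0
        dists_sq = np.sum((np.stack(self.memory) - embedding) ** 2, axis=1)
        knn = np.sort(dists_sq)[: self.k]
        # Normalize by a running mean so the kernel is roughly scale-free.
        self.mean_sq_dist = 0.99 * self.mean_sq_dist + 0.01 * knn.mean()
        kernel = self.eps / (knn / max(self.mean_sq_dist, 1e-8) + self.eps)
        self.memory.append(embedding)
        return 1.0 / np.sqrt(kernel.sum() + self.c)
```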
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.