Exploring Unknown States with Action Balance
- URL: http://arxiv.org/abs/2003.04518v2
- Date: Tue, 1 Sep 2020 07:11:52 GMT
- Title: Exploring Unknown States with Action Balance
- Authors: Yan Song, Yingfeng Chen, Yujing Hu, Changjie Fan
- Abstract summary: Exploration is a key problem in reinforcement learning.
Next-state bonus methods force the agent to pay too much attention to exploring known states.
We propose action balance exploration, which balances the frequency of selecting each action at a given state.
- Score: 48.330318997735574
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Exploration is a key problem in reinforcement learning. Recently, bonus-based
methods, which assign additional bonuses (e.g., intrinsic rewards) to guide the agent
towards rarely visited states, have achieved considerable success in environments where
exploration is difficult, such as Montezuma's Revenge. Since the bonus is calculated
according to the novelty of the next state reached after performing an action, we refer to
such methods as next-state bonus methods. However, next-state bonus methods force the agent
to pay too much attention to exploring known states and neglect finding unknown states,
since the exploration is driven by next states that have already been visited, which may
slow the pace of finding reward in some environments. In this paper, we focus on
improving the effectiveness of finding unknown states and propose action
balance exploration, which balances the frequency of selecting each action at a
given state and can be treated as an extension of upper confidence bound (UCB)
to deep reinforcement learning. Moreover, we propose action balance RND that
combines the next-state bonus methods (e.g., random network distillation
exploration, RND) and our action balance exploration to take advantage of both
sides. Experiments on a grid world and Atari games demonstrate that action balance
exploration has a better capability for finding unknown states and can improve the
performance of RND in some hard-exploration environments, respectively.
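The abstract describes the idea but gives no implementation details, so the following is a minimal, illustrative sketch rather than the authors' method. It assumes a tabular setting: the action-balance bonus is computed UCB-style from per-state action-selection counts, and a simple inverse-square-root visit count stands in for the RND next-state bonus. All names (ActionBalanceAgent, ucb_coef, rnd_coef) and constants are assumptions made for this sketch.

```python
# Illustrative sketch of the action-balance idea described in the abstract,
# NOT the authors' implementation. Tabular counts replace the learned
# estimators a deep-RL version would need.
import math
from collections import defaultdict

class ActionBalanceAgent:
    def __init__(self, n_actions, ucb_coef=1.0, rnd_coef=1.0):
        self.n_actions = n_actions
        self.ucb_coef = ucb_coef                               # weight of the action-balance bonus
        self.rnd_coef = rnd_coef                               # weight of the next-state bonus
        self.sa_counts = defaultdict(lambda: [0] * n_actions)  # N(s, a)
        self.state_counts = defaultdict(int)                   # N(s)

    def action_balance_bonus(self, state):
        """UCB-style bonus per action: large for actions rarely selected in `state`."""
        n_s = sum(self.sa_counts[state]) + 1
        return [self.ucb_coef * math.sqrt(math.log(n_s) / (n_sa + 1))
                for n_sa in self.sa_counts[state]]

    def next_state_bonus(self, next_state):
        """Count-based stand-in for an RND-style novelty bonus on the next state."""
        self.state_counts[next_state] += 1
        return self.rnd_coef / math.sqrt(self.state_counts[next_state])

    def select_action(self, state, q_values):
        """Pick the action maximizing Q(s, a) plus the action-balance bonus."""
        bonus = self.action_balance_bonus(state)
        scores = [q + b for q, b in zip(q_values, bonus)]
        action = max(range(self.n_actions), key=scores.__getitem__)
        self.sa_counts[state][action] += 1
        return action

# Hypothetical grid-world usage:
# agent = ActionBalanceAgent(n_actions=4)
# a = agent.select_action(state=(0, 0), q_values=[0.0, 0.0, 0.0, 0.0])
# r_int = agent.next_state_bonus(next_state=(0, 1))  # added to the extrinsic reward
```

In the paper's deep-RL setting the counts would be replaced by learned estimates (the abstract frames the method as an extension of UCB to deep reinforcement learning and combines it with RND); the tabular form above only keeps the UCB analogy explicit.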
Related papers
- Reward Augmentation in Reinforcement Learning for Testing Distributed Systems [6.0560257343687995]
Bugs in popular distributed protocol implementations have been the source of many downtimes in popular internet services.
We describe a randomized testing approach for distributed protocol implementations based on reinforcement learning.
We show two different techniques that build on one another.
arXiv Detail & Related papers (2024-09-02T15:07:05Z)
- Accelerating Reinforcement Learning with Value-Conditional State Entropy Exploration [97.19464604735802]
A promising technique for exploration is to maximize the entropy of visited state distribution.
It tends to struggle in a supervised setup with a task reward, where an agent prefers to visit high-value states.
We present a novel exploration technique that maximizes the value-conditional state entropy.
arXiv Detail & Related papers (2023-05-31T01:09:28Z)
- Neighboring state-based RL Exploration [1.5935205681539144]
We study neighboring state-based, model-free exploration led by the intuition that, for an early-stage agent, considering actions derived from a bounded region of nearby states may lead to better actions when exploring.
We propose two algorithms that choose exploratory actions based on a survey of nearby states, and find that one of our methods, $\rho$-explore, consistently outperforms the Double DQN baseline in a discrete environment by 49% in terms of Eval Reward Return.
arXiv Detail & Related papers (2022-12-21T01:23:53Z)
- GAN-based Intrinsic Exploration For Sample Efficient Reinforcement Learning [0.0]
We propose a Generative Adversarial Network-based Intrinsic Reward Module that learns the distribution of observed states and computes a high intrinsic reward for states that are out of distribution.
We evaluate our approach in Super Mario Bros for a no reward setting and in Montezuma's Revenge for a sparse reward setting and show that our approach is indeed capable of exploring efficiently.
arXiv Detail & Related papers (2022-06-28T19:16:52Z)
- SEREN: Knowing When to Explore and When to Exploit [14.188362393915432]
We introduce the Selective Reinforcement Exploration Network (SEREN), which poses the exploration-exploitation trade-off as a game.
Using a form of policies known as impulse control, the switcher is able to determine the best set of states at which to switch to the exploration policy.
We prove that SEREN converges quickly and induces a natural schedule towards pure exploitation.
arXiv Detail & Related papers (2022-05-30T12:44:56Z)
- Generative Adversarial Exploration for Reinforcement Learning [48.379457575356454]
In this paper, we propose a novel method called generative adversarial exploration (GAEX) to encourage exploration in reinforcement learning (RL).
In our experiments, we apply GAEX to the games Venture, Montezuma's Revenge, and Super Mario Bros.
To our knowledge, this is the first work to employ GAN in RL exploration problems.
arXiv Detail & Related papers (2022-01-27T17:34:47Z)
- Long-Term Exploration in Persistent MDPs [68.8204255655161]
In this paper, we propose an exploration method called Rollback-Explore (RbExplore), which utilizes the concept of the persistent Markov decision process.
We test our algorithm in the hard-exploration Prince of Persia game, without rewards and domain knowledge.
arXiv Detail & Related papers (2021-09-21T13:47:04Z)
- Fast active learning for pure exploration in reinforcement learning [48.98199700043158]
We show that bonuses that scale with $1/n$ bring faster learning rates, improving the known upper bounds with respect to the dependence on the horizon.
We also show that with an improved analysis of the stopping time, we can improve by a factor $H$ the sample complexity in the best-policy identification setting.
arXiv Detail & Related papers (2020-07-27T11:28:32Z)
- Never Give Up: Learning Directed Exploration Strategies [63.19616370038824]
We propose a reinforcement learning agent to solve hard exploration games by learning a range of directed exploratory policies.
We construct an episodic memory-based intrinsic reward using k-nearest neighbors over the agent's recent experience to train the directed exploratory policies.
A self-supervised inverse dynamics model is used to train the embeddings of the nearest neighbour lookup, biasing the novelty signal towards what the agent can control.
arXiv Detail & Related papers (2020-02-14T13:57:22Z)