Generative Adversarial Exploration for Reinforcement Learning
- URL: http://arxiv.org/abs/2201.11685v1
- Date: Thu, 27 Jan 2022 17:34:47 GMT
- Title: Generative Adversarial Exploration for Reinforcement Learning
- Authors: Weijun Hong, Menghui Zhu, Minghuan Liu, Weinan Zhang, Ming Zhou, Yong
Yu, Peng Sun
- Abstract summary: In this paper, we propose a novel method called generative adversarial exploration (GAEX) to encourage exploration in reinforcement learning (RL).
In our experiments, we apply GAEX to the games Venture, Montezuma's Revenge, and Super Mario Bros.
To our knowledge, this is the first work to employ a GAN in RL exploration problems.
- Score: 48.379457575356454
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Exploration is crucial for training the optimal reinforcement learning (RL)
policy, where the key is to discriminate whether a visited state is novel.
Most previous work focuses on designing heuristic rules or distance metrics to
check whether a state is novel, without considering that such a discrimination
process can itself be learned. In this paper, we propose a novel method called
generative adversarial exploration (GAEX) to encourage exploration in RL by
introducing an intrinsic reward output from a generative adversarial network,
where the generator provides fake samples of states that help the discriminator
identify less frequently visited states. Thus, the agent is encouraged to
visit states that the discriminator is less confident to judge as visited.
GAEX is easy to implement and highly efficient to train. In our experiments,
we apply GAEX to DQN, and the resulting DQN-GAEX algorithm achieves convincing
performance on challenging exploration problems, including the games Venture,
Montezuma's Revenge, and Super Mario Bros, without further fine-tuning of
complicated learning algorithms. To our knowledge, this is the first work to
employ a GAN in RL exploration problems.
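The abstract only outlines the mechanism, so the PyTorch sketch below is a hedged illustration of how a GAN-based exploration bonus of this kind could be wired up, not the authors' implementation: the flat state vector, the network sizes, the optimizer settings, and the bonus scale `BETA` are all assumptions made for the example.

```python
# Minimal sketch of a GAN-based exploration bonus in the spirit of GAEX.
# The flat state vector, the network sizes, and the bonus scale BETA are
# illustrative assumptions, not the authors' architecture or hyperparameters.
import torch
import torch.nn as nn

STATE_DIM, NOISE_DIM, BETA = 64, 32, 0.1    # assumed dimensions / bonus scale

generator = nn.Sequential(                  # noise -> fake state features
    nn.Linear(NOISE_DIM, 128), nn.ReLU(), nn.Linear(128, STATE_DIM))
discriminator = nn.Sequential(              # state -> P("this state was visited")
    nn.Linear(STATE_DIM, 128), nn.ReLU(), nn.Linear(128, 1), nn.Sigmoid())

g_opt = torch.optim.Adam(generator.parameters(), lr=1e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-4)
bce = nn.BCELoss()

def gan_update(visited_states):
    """One adversarial step on a batch of states the agent actually visited."""
    batch = visited_states.size(0)
    fake_states = generator(torch.randn(batch, NOISE_DIM))

    # Discriminator: push visited states towards 1 and generated fakes towards 0.
    d_loss = (bce(discriminator(visited_states), torch.ones(batch, 1)) +
              bce(discriminator(fake_states.detach()), torch.zeros(batch, 1)))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # Generator: produce fakes the discriminator mistakes for visited states.
    g_loss = bce(discriminator(fake_states), torch.ones(batch, 1))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()

def intrinsic_reward(state):
    """Large bonus when the discriminator is unsure the state has been visited."""
    with torch.no_grad():
        p_visited = discriminator(state.unsqueeze(0)).item()
    return BETA * (1.0 - p_visited)
```

In a DQN-GAEX-style setup, such a bonus would presumably be added to the environment reward of each transition (e.g. `r = r_env + intrinsic_reward(next_state)`); the exact way the paper combines the two rewards is not stated in the abstract.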
Related papers
- Neighboring state-based RL Exploration [1.5935205681539144]
We study neighboring state-based, model-free exploration led by the intuition that, for an early-stage agent, considering actions derived from a bounded region of nearby states may lead to better actions when exploring.
We propose two algorithms that choose exploratory actions based on a survey of nearby states, and find that one of our methods, $\rho$-explore, consistently outperforms the Double DQN baseline in a discrete environment by 49% in terms of Eval Reward Return.
arXiv Detail & Related papers (2022-12-21T01:23:53Z)
- First Go, then Post-Explore: the Benefits of Post-Exploration in Intrinsic Motivation [7.021281655855703]
Go-Explore achieved breakthrough performance on challenging reinforcement learning (RL) tasks with sparse rewards.
The key insight of Go-Explore was that successful exploration requires an agent to first return to an interesting state.
We refer to such exploration after a goal is reached as 'post-exploration'.
arXiv Detail & Related papers (2022-12-06T18:56:47Z)
- SEREN: Knowing When to Explore and When to Exploit [14.188362393915432]
We introduce the Selective Reinforcement Exploration Network (SEREN), which poses the exploration-exploitation trade-off as a game.
Using a form of policies known as impulse control, the switcher is able to determine the best set of states at which to switch to the exploration policy.
We prove that SEREN converges quickly and induces a natural schedule towards pure exploitation.
arXiv Detail & Related papers (2022-05-30T12:44:56Z)
- Long-Term Exploration in Persistent MDPs [68.8204255655161]
In this paper, we propose an exploration method called Rollback-Explore (RbExplore), which utilizes the concept of the persistent Markov decision process.
We test our algorithm in the hard-exploration Prince of Persia game, without rewards or domain knowledge.
arXiv Detail & Related papers (2021-09-21T13:47:04Z)
- MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven Reinforcement Learning [65.52675802289775]
We show that an uncertainty-aware classifier can solve challenging reinforcement learning problems.
We propose a novel method for computing the normalized maximum likelihood (NML) distribution.
We show that the resulting algorithm has a number of intriguing connections to both count-based exploration methods and prior algorithms for learning reward functions.
arXiv Detail & Related papers (2021-07-15T08:19:57Z)
- Explore and Control with Adversarial Surprise [78.41972292110967]
Reinforcement learning (RL) provides a framework for learning goal-directed policies given user-specified rewards.
We propose a new unsupervised RL technique based on an adversarial game which pits two policies against each other to compete over the amount of surprise an RL agent experiences.
We show that our method leads to the emergence of complex skills by exhibiting clear phase transitions.
arXiv Detail & Related papers (2021-07-12T17:58:40Z)
- Exploration and Incentives in Reinforcement Learning [107.42240386544633]
We consider complex exploration problems, where each agent faces the same (but unknown) MDP.
Agents control the choice of policies, whereas an algorithm can only issue recommendations.
We design an algorithm which explores all reachable states in the MDP.
arXiv Detail & Related papers (2021-02-28T00:15:53Z)
- Reannealing of Decaying Exploration Based On Heuristic Measure in Deep Q-Network [82.20059754270302]
We propose an algorithm based on the idea of reannealing that aims to encourage exploration only when it is needed.
We perform an illustrative case study showing that it has potential to both accelerate training and obtain a better policy.
arXiv Detail & Related papers (2020-09-29T20:40:00Z)
- First return, then explore [18.876005532689234]
Go-Explore is a family of algorithms that explicitly remembers promising states and first returns to such states before intentionally exploring (a rough sketch of this return-then-explore loop appears after this list).
Go-Explore solves all heretofore unsolved Atari games and surpasses the state of the art on all hard-exploration games.
We show that adding a goal-conditioned policy can further improve Go-Explore's exploration efficiency and enable it to handle stochasticity throughout training.
arXiv Detail & Related papers (2020-04-27T16:31:26Z)
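Several of the entries above (Go-Explore and its post-exploration variant) revolve around the same "first return, then explore" loop. The sketch below is a hedged, assumption-laden rendering of that loop: the cell abstraction `cell_of`, the `env.clone_state()` / `env.restore_state()` hooks, and the cell-selection rule are illustrative stand-ins, not the published implementation.

```python
# Hedged sketch of the "first return, then explore" loop summarized above.
# The cell abstraction, the clone_state()/restore_state() hooks, and the
# cell-selection rule are assumptions; the published Go-Explore uses
# domain-specific cells plus a later robustification phase not shown here.
from collections import defaultdict

def go_explore(env, cell_of, iterations=1000, explore_steps=100):
    archive = {}                 # cell -> (saved emulator state, best score)
    visits = defaultdict(int)    # how often each cell was chosen as a start

    obs = env.reset()
    archive[cell_of(obs)] = (env.clone_state(), 0.0)

    for _ in range(iterations):
        # 1. Select a promising cell, here simply the least-visited one.
        cell = min(archive, key=lambda c: visits[c])
        visits[cell] += 1

        # 2. First return: restore the emulator to that cell's saved state.
        saved_state, score = archive[cell]
        env.restore_state(saved_state)

        # 3. Then explore: random actions, archiving any new cells reached.
        for _ in range(explore_steps):
            obs, reward, done, _ = env.step(env.action_space.sample())
            score += reward
            c = cell_of(obs)
            if c not in archive or score > archive[c][1]:
                archive[c] = (env.clone_state(), score)
            if done:
                break
    return archive
```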