Fast active learning for pure exploration in reinforcement learning
- URL: http://arxiv.org/abs/2007.13442v2
- Date: Sat, 10 Oct 2020 17:15:28 GMT
- Title: Fast active learning for pure exploration in reinforcement learning
- Authors: Pierre Ménard, Omar Darwiche Domingues, Anders Jonsson, Emilie
  Kaufmann, Edouard Leurent, Michal Valko
- Abstract summary: We show that bonuses that scale with $1/n$ bring faster learning rates, improving the known upper bounds with respect to the dependence on the horizon.
We also show that with an improved analysis of the stopping time, we can improve by a factor $H$ the sample complexity in the best-policy identification setting.
- Score: 48.98199700043158
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Realistic environments often provide agents with very limited feedback.
  When the environment is initially unknown, the feedback can be completely absent in
  the beginning, and the agents may first choose to devote all their effort to
  exploring efficiently. Exploration remains a challenge: it has been addressed with
  many hand-tuned heuristics of varying generality on one side, and a few
  theoretically backed exploration strategies on the other. Many of these are
  incarnated by intrinsic motivation and, in particular, exploration bonuses. A common
  rule of thumb for exploration bonuses is to add a $1/\sqrt{n}$ bonus to the
  empirical estimate of the reward, where $n$ is the number of times a particular
  state (or state-action pair) has been visited. We show that, surprisingly, for the
  pure-exploration objective of reward-free exploration, bonuses that scale with $1/n$
  bring faster learning rates, improving the known upper bounds with respect to the
  dependence on the horizon $H$. Furthermore, we show that with an improved analysis
  of the stopping time, we can improve by a factor $H$ the sample complexity in the
  best-policy identification setting, which is another pure-exploration objective,
  where the environment provides rewards but the agent is not penalized for its
  behavior during the exploration phase.
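As a rough illustration of the two scalings discussed in the abstract (and not the paper's actual algorithm or bonus constants, which involve additional horizon and logarithmic factors), here is a minimal Python sketch comparing a $1/\sqrt{n}$-style bonus with a $1/n$-style bonus; the function names, the factor H, and the confidence parameter delta are illustrative assumptions.

```python
import numpy as np

def sqrt_bonus(n, H, delta=0.1):
    """Classical rule-of-thumb bonus, scaling as 1/sqrt(n).
    Constants (log terms, horizon factors) are illustrative only."""
    n = max(n, 1)
    return H * np.sqrt(np.log(1.0 / delta) / n)

def fast_bonus(n, H, delta=0.1):
    """Bonus scaling as 1/n, the faster rate highlighted in the abstract
    for the reward-free (pure-exploration) objective."""
    n = max(n, 1)
    return H * np.log(1.0 / delta) / n

# Toy comparison: how quickly the two bonuses shrink with the visit count n.
H = 10
for n in (1, 10, 100, 1000):
    print(f"n={n:5d}  1/sqrt(n)-bonus={sqrt_bonus(n, H):7.3f}  1/n-bonus={fast_bonus(n, H):7.3f}")
```

The point of the comparison is only that the $1/n$ bonus vanishes much faster as a state is revisited, which is the mechanism behind the improved dependence on the horizon claimed in the abstract.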
Related papers
- Successor-Predecessor Intrinsic Exploration [18.440869985362998]
We focus on exploration with intrinsic rewards, where the agent transiently augments the external rewards with self-generated intrinsic rewards.
We propose Successor-Predecessor Intrinsic Exploration (SPIE), an exploration algorithm based on a novel intrinsic reward combining prospective and retrospective information.
We show that SPIE yields more efficient and ethologically plausible exploratory behaviour in environments with sparse rewards and bottleneck states than competing methods.
arXiv Detail & Related papers (2023-05-24T16:02:51Z) - DEIR: Efficient and Robust Exploration through
Discriminative-Model-Based Episodic Intrinsic Rewards [2.09711130126031]
Exploration is a fundamental aspect of reinforcement learning (RL), and its effectiveness is a deciding factor in the performance of RL algorithms.
Recent studies have shown the effectiveness of encouraging exploration with intrinsic rewards estimated from novelties in observations.
We propose DEIR, a novel method in which we theoretically derive an intrinsic reward with a conditional mutual information term.
arXiv Detail & Related papers (2023-04-21T06:39:38Z) - Long-Term Exploration in Persistent MDPs [68.8204255655161]
In this paper, we propose an exploration method called Rollback-Explore (RbExplore), which utilizes the concept of the persistent Markov decision process.
We test our algorithm in the hard-exploration Prince of Persia game, without rewards and domain knowledge.
arXiv Detail & Related papers (2021-09-21T13:47:04Z) - Self-Supervised Exploration via Latent Bayesian Surprise [4.088019409160893]
In this work, we propose a curiosity-based bonus as intrinsic reward for Reinforcement Learning.
We extensively evaluate our model by measuring the agent's performance in terms of environment exploration.
Our model is cheap and empirically shows state-of-the-art performance on several problems.
arXiv Detail & Related papers (2021-04-15T14:40:16Z) - Rank the Episodes: A Simple Approach for Exploration in
Procedurally-Generated Environments [66.80667987347151]
Methods based on intrinsic rewards often fall short in procedurally-generated environments.
We introduce RAPID, a simple yet effective episode-level exploration method for procedurally-generated environments.
We demonstrate our method on several procedurally-generated MiniGrid environments, a first-person-view 3D Maze navigation task from MiniWorld, and several sparse MuJoCo tasks.
arXiv Detail & Related papers (2021-01-20T14:22:01Z) - BeBold: Exploration Beyond the Boundary of Explored Regions [66.88415950549556]
In this paper, we propose the regulated difference of inverse visitation counts as a simple but effective criterion for intrinsic reward (IR); a rough sketch of this criterion appears after this list.
The criterion helps the agent explore Beyond the Boundary of explored regions and mitigates common issues in count-based methods, such as short-sightedness and detachment.
The resulting method, BeBold, solves the 12 most challenging procedurally-generated tasks in MiniGrid with just 120M environment steps, without any curriculum learning.
arXiv Detail & Related papers (2020-12-15T21:26:54Z) - Exploring Unknown States with Action Balance [48.330318997735574]
Exploration is a key problem in reinforcement learning.
Next-state bonus methods force the agent to pay too much attention to exploring known states.
We propose action balance exploration, which balances the frequency of selecting each action at a given state.
arXiv Detail & Related papers (2020-03-10T03:32:28Z) - RIDE: Rewarding Impact-Driven Exploration for Procedurally-Generated
Environments [15.736899098702972]
We propose a novel type of intrinsic reward which encourages the agent to take actions that lead to significant changes in its learned state representation; a rough sketch of this idea also appears after this list.
We evaluate our method on multiple challenging procedurally-generated tasks in MiniGrid.
arXiv Detail & Related papers (2020-02-27T18:03:16Z) - Never Give Up: Learning Directed Exploration Strategies [63.19616370038824]
We propose a reinforcement learning agent to solve hard exploration games by learning a range of directed exploratory policies.
We construct an episodic memory-based intrinsic reward using k-nearest neighbors over the agent's recent experience to train the directed exploratory policies.
A self-supervised inverse dynamics model is used to train the embeddings of the nearest neighbour lookup, biasing the novelty signal towards what the agent can control.
arXiv Detail & Related papers (2020-02-14T13:57:22Z) - Long-Term Visitation Value for Deep Exploration in Sparse Reward
Reinforcement Learning [34.38011902445557]
Reinforcement learning with sparse rewards is still an open challenge.
We present a novel approach that plans exploration actions far into the future by using a long-term visitation count.
Contrary to existing methods which use models of reward and dynamics, our approach is off-policy and model-free.
arXiv Detail & Related papers (2020-01-01T01:01:15Z)
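For the BeBold entry above, the "regulated difference of inverse visitation counts" can be sketched in a small tabular setting. This is a hedged illustration based only on the one-sentence summary: the hashable state keys, the global count bookkeeping, and the clipping at zero are assumptions for the sketch, not necessarily BeBold's actual implementation.

```python
from collections import defaultdict

# Hypothetical tabular sketch of a "regulated difference of inverse
# visitation counts" intrinsic reward, in the spirit of the BeBold
# summary above. States are assumed hashable; counts are global.
visit_counts = defaultdict(int)

def intrinsic_reward(state, next_state):
    visit_counts[next_state] += 1
    inv_next = 1.0 / visit_counts[next_state]
    inv_curr = 1.0 / max(visit_counts[state], 1)
    # "Regulated": clipped at zero, so only transitions toward less-visited
    # states (beyond the boundary of explored regions) are rewarded.
    return max(inv_next - inv_curr, 0.0)
```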
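Similarly, for the RIDE entry, an impact-driven reward can be sketched as the change in a state embedding between consecutive observations, discounted by an episodic visit count. The random projection standing in for the learned representation, the vector dimensions, and the discretized state key are all illustrative assumptions, not RIDE's actual architecture.

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)
# Placeholder embedding: a fixed random projection standing in for a
# learned state representation (RIDE trains its embedding with forward
# and inverse dynamics models, which this sketch omits).
W = rng.normal(size=(16, 32))

def embed(obs):
    return W @ obs

episodic_counts = defaultdict(int)

def impact_reward(obs, next_obs, next_state_key):
    """Reward the change in representation, scaled down for states
    already visited often within the current episode."""
    episodic_counts[next_state_key] += 1
    impact = np.linalg.norm(embed(next_obs) - embed(obs))
    return impact / np.sqrt(episodic_counts[next_state_key])

# Example usage with random 32-dimensional observations.
o1, o2 = rng.normal(size=32), rng.normal(size=32)
print(impact_reward(o1, o2, next_state_key="cell_3_4"))
```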