Sparse Reward Exploration via Novelty Search and Emitters
- URL: http://arxiv.org/abs/2102.03140v1
- Date: Fri, 5 Feb 2021 12:34:54 GMT
- Title: Sparse Reward Exploration via Novelty Search and Emitters
- Authors: Giuseppe Paolo (1 and 2), Alexandre Coninx (1), Stephane Doncieux (1), Alban Laflaquière (2) ((1) ISIR, (2) SBRE)
- Abstract summary: We introduce the SparsE Reward Exploration via Novelty and Emitters (SERENE) algorithm.
SERENE separates the search space exploration and reward exploitation into two alternating processes.
A meta-scheduler allocates a global computational budget by alternating between the two processes.
- Score: 55.41644538483948
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reward-based optimization algorithms require both exploration, to find
rewards, and exploitation, to maximize performance. The need for efficient
exploration is even more significant in sparse reward settings, in which
performance feedback is given sparingly, thus rendering it unsuitable for
guiding the search process. In this work, we introduce the SparsE Reward
Exploration via Novelty and Emitters (SERENE) algorithm, capable of efficiently
exploring a search space, as well as optimizing rewards found in potentially
disparate areas. Contrary to existing emitters-based approaches, SERENE
separates the search space exploration and reward exploitation into two
alternating processes. The first process performs exploration through Novelty
Search, a divergent search algorithm. The second one exploits discovered reward
areas through emitters, i.e. local instances of population-based optimization
algorithms. A meta-scheduler allocates a global computational budget by
alternating between the two processes, ensuring the discovery and efficient
exploitation of disjoint reward areas. SERENE returns both a collection of
diverse solutions covering the search space and a collection of high-performing
solutions for each distinct reward area. We evaluate SERENE on various sparse
reward environments and show it compares favorably to existing baselines.
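As a concrete reading of the pipeline the abstract describes, the snippet below sketches the alternation between a divergent exploration process and emitter-based exploitation under a shared evaluation budget. It is a minimal illustration on a toy 2-D task: the environment, the novelty criterion, the fixed-chunk scheduling, and the Gaussian hill-climber emitters are all simplifying assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch of the two alternating processes described in the abstract.
# Everything here (toy task, novelty criterion, chunk sizes, Gaussian emitters)
# is an illustrative assumption, not SERENE's actual implementation.
import numpy as np

rng = np.random.default_rng(0)
DIM = 2

def evaluate(x):
    """Hypothetical task: the behaviour descriptor is x itself and the reward
    is sparse, non-zero only inside a small disc around (0.8, 0.8)."""
    dist = np.linalg.norm(x - 0.8)
    reward = max(0.0, 0.1 - dist)
    return x.copy(), reward

def novelty_score(b, archive, k=5):
    """Mean distance to the k nearest behaviours already in the archive."""
    if not archive:
        return np.inf
    d = np.sort([np.linalg.norm(b - a) for a in archive])
    return float(np.mean(d[:k]))

archive = []        # diverse behaviours found by the exploration process
reward_seeds = []   # entry points of distinct reward areas (one emitter each)
best_per_seed = []  # best (reward, solution) found by each emitter

total_budget, spent = 4000, 0
while spent < total_budget:
    # Process 1: exploration via (simplified) Novelty Search.
    candidates = rng.uniform(0.0, 1.0, size=(20, DIM))
    scored = []
    for x in candidates:
        b, r = evaluate(x)
        spent += 1
        scored.append((novelty_score(b, archive), b, r, x))
    scored.sort(key=lambda t: -t[0])
    for _, b, r, x in scored[:5]:          # keep the most novel behaviours
        archive.append(b)
        # A rewarded solution far from known reward areas seeds a new emitter.
        if r > 0 and all(np.linalg.norm(x - s) > 0.2 for s in reward_seeds):
            reward_seeds.append(x)
            best_per_seed.append((r, x))

    # Process 2: exploitation via emitters (local Gaussian hill-climbers).
    for i in range(len(reward_seeds)):
        if spent >= total_budget:
            break
        best_r, best_x = best_per_seed[i]
        for _ in range(200 // max(1, len(reward_seeds))):
            cand = np.clip(best_x + rng.normal(0.0, 0.02, size=DIM), 0.0, 1.0)
            _, r = evaluate(cand)
            spent += 1
            if r > best_r:
                best_r, best_x = r, cand
        best_per_seed[i] = (best_r, best_x)

print(f"archive size: {len(archive)}, reward areas found: {len(reward_seeds)}")
```

In SERENE proper, the meta-scheduler decides how the global budget is shared between the two processes and among emitters; the fixed alternation above only mirrors the overall control flow.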
Related papers
- Searching a High-Performance Feature Extractor for Text Recognition Network [92.12492627169108]
We design a domain-specific search space by exploring principles for having good feature extractors.
As the space is huge and complexly structured, no existing NAS algorithms can be applied.
We propose a two-stage algorithm to effectively search in the space.
arXiv Detail & Related papers (2022-09-27T03:49:04Z)
- k-Means Maximum Entropy Exploration [55.81894038654918]
Exploration in continuous spaces with sparse rewards is an open problem in reinforcement learning.
We introduce an artificial curiosity algorithm based on lower bounding an approximation to the entropy of the state visitation distribution.
We show that our approach is both computationally efficient and competitive on benchmarks for exploration in high-dimensional, continuous spaces.
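A rough, assumption-laden code sketch of this kind of entropy-driven bonus appears after the list of related papers below.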
arXiv Detail & Related papers (2022-05-31T09:05:58Z)
- Learning in Sparse Rewards settings through Quality-Diversity algorithms [1.4881159885040784]
This thesis focuses on the problem of sparse rewards with Quality-Diversity (QD) algorithms.
The first part of the thesis focuses on learning a representation of the space in which the diversity of the policies is evaluated.
The thesis continues with the introduction of the SERENE algorithm, a method that can efficiently focus on the interesting parts of the search space.
arXiv Detail & Related papers (2022-03-02T11:02:34Z)
- Discovering and Exploiting Sparse Rewards in a Learned Behavior Space [0.46736439782713946]
Learning optimal policies in sparse rewards settings is difficult as the learning agent has little to no feedback on the quality of its actions.
We introduce STAX, an algorithm designed to learn a behavior space on-the-fly and to explore it while efficiently optimizing any reward discovered.
arXiv Detail & Related papers (2021-11-02T22:21:11Z)
- On Reward-Free RL with Kernel and Neural Function Approximations: Single-Agent MDP and Markov Game [140.19656665344917]
We study the reward-free RL problem, where an agent aims to thoroughly explore the environment without any pre-specified reward function.
We tackle this problem under the context of function approximation, leveraging powerful function approximators.
We establish the first provably efficient reward-free RL algorithm with kernel and neural function approximators.
arXiv Detail & Related papers (2021-10-19T07:26:33Z)
- MADE: Exploration via Maximizing Deviation from Explored Regions [48.49228309729319]
In online reinforcement learning (RL), efficient exploration remains challenging in high-dimensional environments with sparse rewards.
We propose a new exploration approach via maximizing the deviation of the occupancy of the next policy from the explored regions.
Our approach significantly improves sample efficiency over state-of-the-art methods.
arXiv Detail & Related papers (2021-06-18T17:57:00Z)
- Exploration in two-stage recommender systems [79.50534282841618]
Two-stage recommender systems are widely adopted in industry due to their scalability and maintainability.
A key challenge of this setup is that optimal performance of each stage in isolation does not imply optimal global performance.
We propose a method of synchronising the exploration strategies between the ranker and the nominators.
arXiv Detail & Related papers (2020-09-01T16:52:51Z)
- Dynamic Subgoal-based Exploration via Bayesian Optimization [7.297146495243708]
Reinforcement learning in sparse-reward navigation environments is challenging and poses a need for effective exploration.
We propose a cost-aware Bayesian optimization approach that efficiently searches over a class of dynamic subgoal-based exploration strategies.
An experimental evaluation demonstrates that the new approach outperforms existing baselines across a number of problem domains.
arXiv Detail & Related papers (2019-10-21T04:24:29Z)
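The k-Means Maximum Entropy Exploration entry above describes a curiosity bonus obtained by lower bounding an approximation to the entropy of the state-visitation distribution with k-means. The snippet below is only a rough sketch of one way such a bonus could look, assuming the intrinsic reward is the distance from a state to the nearest centroid of a k-means model fitted to visited states; the bonus form, the scikit-learn model, and all constants are assumptions rather than the paper's formulation.

```python
# Rough sketch only: a centroid-distance curiosity bonus as a stand-in for the
# entropy lower bound described in "k-Means Maximum Entropy Exploration".
# The exact bonus, cluster count, and update scheme are assumptions.
import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.default_rng(0)
visited_states = rng.normal(size=(2048, 4))       # placeholder replay states
kmeans = MiniBatchKMeans(n_clusters=32, random_state=0).fit(visited_states)

def intrinsic_reward(state):
    """States far from every centroid (i.e. rarely visited regions) receive a
    larger bonus, pushing the visitation distribution towards higher entropy."""
    distances = np.linalg.norm(kmeans.cluster_centers_ - state, axis=1)
    return float(distances.min())

print(intrinsic_reward(visited_states[0]))        # small bonus: well-visited region
print(intrinsic_reward(np.full(4, 5.0)))          # large bonus: far from all data
```

In practice such a bonus would be added to the environment reward and the centroids refitted as new states are collected.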