Efficient Exploration using Model-Based Quality-Diversity with Gradients
- URL: http://arxiv.org/abs/2211.12610v1
- Date: Tue, 22 Nov 2022 22:19:01 GMT
- Title: Efficient Exploration using Model-Based Quality-Diversity with Gradients
- Authors: Bryan Lim, Manon Flageat, Antoine Cully
- Abstract summary: In this paper, we propose a model-based Quality-Diversity approach.
It extends existing QD methods to use gradients for efficient exploitation and leverage perturbations in imagination for efficient exploration.
We demonstrate that it maintains the divergent search capabilities of population-based approaches on tasks with deceptive rewards while significantly improving their sample efficiency and quality of solutions.
- Score: 4.788163807490196
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Exploration is a key challenge in Reinforcement Learning, especially in
long-horizon, deceptive and sparse-reward environments. For such applications,
population-based approaches have proven effective. Methods such as
Quality-Diversity (QD) deal with this by encouraging novel solutions and producing
a diversity of behaviours. However, these methods are driven by either
undirected sampling (i.e. mutations) or use approximated gradients (i.e.
Evolution Strategies) in the parameter space, which makes them highly
sample-inefficient. In this paper, we propose a model-based Quality-Diversity
approach. It extends existing QD methods to use gradients for efficient
exploitation and leverage perturbations in imagination for efficient
exploration. Our approach optimizes all members of a population simultaneously
to maintain both performance and diversity efficiently by leveraging the
effectiveness of QD algorithms as good data generators to train deep models. We
demonstrate that it maintains the divergent search capabilities of
population-based approaches on tasks with deceptive rewards while significantly
improving their sample efficiency and quality of solutions.
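The following is a minimal, hypothetical sketch of the recipe the abstract describes, not the authors' implementation: a MAP-Elites-style archive, a surrogate model fitted on previously evaluated solutions, gradient steps on that surrogate for exploitation, and cheap screening of random perturbations under the surrogate ("in imagination") before spending real evaluations. The task functions, surrogate features, and all hyperparameters are invented for illustration.

```python
# Toy surrogate-assisted QD sketch (assumptions throughout; not the paper's algorithm).
import jax
import jax.numpy as jnp

def fitness(theta):          # toy deceptive objective (assumption)
    return -jnp.sum((theta - 2.0) ** 2) + jnp.sin(5.0 * theta[0])

def descriptor(theta):       # toy 1-D behaviour descriptor in [0, 1]
    return jnp.clip(jnp.tanh(theta[0]) * 0.5 + 0.5, 0.0, 1.0)

def features(theta):         # quadratic features for the surrogate model
    return jnp.concatenate([jnp.ones(1), theta, theta ** 2])

def fit_surrogate(thetas, fits):
    phi = jax.vmap(features)(thetas)
    w, *_ = jnp.linalg.lstsq(phi, fits, rcond=None)
    return w

def surrogate(w, theta):
    return features(theta) @ w

key = jax.random.PRNGKey(0)
dim, n_cells, sigma, lr = 2, 20, 0.2, 0.05
archive = {}                                  # cell index -> (theta, fitness)
data_x, data_f = [], []

def add_to_archive(theta, f):
    cell = int(descriptor(theta) * (n_cells - 1))
    if cell not in archive or f > archive[cell][1]:
        archive[cell] = (theta, f)

# bootstrap the archive and the model's dataset with random real evaluations
for _ in range(32):
    key, sub = jax.random.split(key)
    theta = jax.random.normal(sub, (dim,))
    f = fitness(theta)
    data_x.append(theta); data_f.append(f)
    add_to_archive(theta, f)

for it in range(50):
    w = fit_surrogate(jnp.stack(data_x), jnp.stack(data_f))   # (re)train the model
    key, k1, k2 = jax.random.split(key, 3)
    cell = list(archive)[int(jax.random.randint(k1, (), 0, len(archive)))]
    parent = archive[cell][0]
    # exploitation: gradient ascent on the surrogate model
    child = parent + lr * jax.grad(lambda t: surrogate(w, t))(parent)
    # exploration: screen random perturbations "in imagination" (under the model)
    cands = parent + sigma * jax.random.normal(k2, (16, dim))
    best = cands[jnp.argmax(jax.vmap(lambda t: surrogate(w, t))(cands))]
    for theta in (child, best):              # only these receive real evaluations
        f = fitness(theta)
        data_x.append(theta); data_f.append(f)
        add_to_archive(theta, f)

print(f"archive coverage: {len(archive)}/{n_cells} cells")
```

The design choice mirrored here is that every real evaluation is reused twice: it updates the archive and it grows the dataset used to refit the model, which is the sense in which QD serves as a data generator for the learned model.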
Related papers
- Adaptive teachers for amortized samplers [76.88721198565861]
Amortized inference is the task of training a parametric model, such as a neural network, to approximate a distribution with a given unnormalized density where exact sampling is intractable.
Off-policy RL training facilitates the discovery of diverse, high-reward candidates, but existing methods still face challenges in efficient exploration.
We propose an adaptive training distribution (the Teacher) to guide the training of the primary amortized sampler (the Student) by prioritizing high-loss regions.
arXiv Detail & Related papers (2024-10-02T11:33:13Z) - On the Robustness of Fully-Spiking Neural Networks in Open-World Scenarios using Forward-Only Learning Algorithms [6.7236795813629]
We develop a novel algorithm for Out-of-Distribution (OoD) detection using the Forward-Forward Algorithm (FFA).
Our approach measures the likelihood of a sample belonging to the in-distribution (ID) data by using the distance from the sample's latent representation to class-representative manifolds.
We also propose a gradient-free attribution technique that highlights the features of a sample pushing it away from the distribution of any class.
arXiv Detail & Related papers (2024-07-19T08:08:17Z) - Learning Better with Less: Effective Augmentation for Sample-Efficient
Visual Reinforcement Learning [57.83232242068982]
Data augmentation (DA) is a crucial technique for enhancing the sample efficiency of visual reinforcement learning (RL) algorithms.
It remains unclear which attributes of DA account for its effectiveness in achieving sample-efficient visual RL.
This work conducts comprehensive experiments to assess the impact of DA's attributes on its efficacy.
arXiv Detail & Related papers (2023-05-25T15:46:20Z) - Efficient Methods for Natural Language Processing: A Survey [76.34572727185896]
This survey synthesizes and relates current methods and findings in efficient NLP.
We aim both to provide guidance for conducting NLP under limited resources and to point towards promising research directions for developing more efficient methods.
arXiv Detail & Related papers (2022-08-31T20:32:35Z) - Sample-Efficient, Exploration-Based Policy Optimisation for Routing
Problems [2.6782615615913348]
This paper presents a new entropy-based reinforcement learning approach.
In addition, we design an off-policy reinforcement learning technique that maximises the expected return.
We show that our model can generalise to various routing problems.
arXiv Detail & Related papers (2022-05-31T09:51:48Z) - Few-shot Quality-Diversity Optimization [50.337225556491774]
Quality-Diversity (QD) optimization has been shown to be an effective tool for dealing with deceptive minima and sparse rewards in Reinforcement Learning.
We show that, given examples from a task distribution, information about the paths taken by optimization in parameter space can be leveraged to build a prior population, which when used to initialize QD methods in unseen environments, allows for few-shot adaptation.
Experiments carried out in both sparse and dense reward settings using robotic manipulation and navigation benchmarks show that this prior population considerably reduces the number of generations required for QD optimization in these environments.
arXiv Detail & Related papers (2021-09-14T17:12:20Z) - Online reinforcement learning with sparse rewards through an active
inference capsule [62.997667081978825]
This paper introduces an active inference agent which minimizes a novel free energy of the expected future.
Our model is capable of solving sparse-reward problems with a very high sample efficiency.
We also introduce a novel method for approximating the prior model from the reward function, which simplifies the expression of complex objectives.
arXiv Detail & Related papers (2021-06-04T10:03:36Z) - DEALIO: Data-Efficient Adversarial Learning for Imitation from
Observation [57.358212277226315]
In imitation learning from observation (IfO), a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior, without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm.
arXiv Detail & Related papers (2021-03-31T23:46:32Z) - Fast and stable MAP-Elites in noisy domains using deep grids [1.827510863075184]
Deep-Grid MAP-Elites is a variant of the MAP-Elites algorithm that uses an archive of similar previously encountered solutions to approximate the performance of a solution.
We show that this simple approach is significantly more resilient to noise on the behavioural descriptors, while achieving competitive performances in terms of fitness optimisation.
arXiv Detail & Related papers (2020-06-25T08:47:23Z) - Diversity Policy Gradient for Sample Efficient Quality-Diversity
Optimization [7.8499505363825755]
Aiming for diversity in addition to performance is a convenient way to deal with the exploration-exploitation trade-off.
This paper proposes a novel algorithm, QDPG, which combines the strengths of Policy Gradient algorithms and Quality-Diversity approaches; a minimal illustrative sketch follows this list.
arXiv Detail & Related papers (2020-06-15T16:04:06Z)
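As a companion to the QDPG entry above, here is a hypothetical toy sketch of alternating quality and diversity gradient steps feeding a MAP-Elites-style archive. In real QDPG the two gradients are estimated with off-policy reinforcement learning over a population of policies; here the fitness and the behaviour descriptor are directly differentiable toy functions, so jax.grad stands in for those estimators, and every function name and constant is an assumption.

```python
# Toy "quality + diversity gradient" sketch (assumptions throughout; not QDPG itself).
import jax
import jax.numpy as jnp

def fitness(theta):                     # toy quality objective (assumption)
    return -jnp.sum((theta - 1.5) ** 2)

def descriptor(theta):                  # toy 2-D behaviour descriptor
    return jnp.tanh(theta[:2])

def novelty(theta, archive_desc):       # mean distance to archive descriptors
    d = descriptor(theta)
    return jnp.mean(jnp.linalg.norm(archive_desc - d, axis=1))

key = jax.random.PRNGKey(1)
dim, lr, n_iters = 4, 0.1, 200
archive = {}                            # descriptor bin -> (theta, fitness)

def insert(theta):
    f = float(fitness(theta))
    cell = tuple(jnp.round(descriptor(theta) * 4).astype(int).tolist())
    if cell not in archive or f > archive[cell][1]:
        archive[cell] = (theta, f)

# a single parameter vector for brevity; QDPG maintains a whole population
theta = jax.random.normal(key, (dim,))
insert(theta)
for it in range(n_iters):
    archive_desc = jnp.stack([descriptor(t) for t, _ in archive.values()])
    if it % 2 == 0:                     # quality step: ascend the fitness gradient
        theta = theta + lr * jax.grad(fitness)(theta)
    else:                               # diversity step: ascend the novelty gradient
        theta = theta + lr * jax.grad(novelty)(theta, archive_desc)
    insert(theta)

print(f"{len(archive)} archive cells filled")
```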
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.