Dynamics-Aware Quality-Diversity for Efficient Learning of Skill
Repertoires
- URL: http://arxiv.org/abs/2109.08522v1
- Date: Thu, 16 Sep 2021 08:35:35 GMT
- Title: Dynamics-Aware Quality-Diversity for Efficient Learning of Skill
Repertoires
- Authors: Bryan Lim, Luca Grillotti, Lorenzo Bernasconi and Antoine Cully
- Abstract summary: Quality-Diversity (QD) algorithms are powerful exploration algorithms that allow robots to discover large repertoires of diverse and high-performing skills.
We propose Dynamics-Aware Quality-Diversity (DA-QD), a framework to improve the sample efficiency of QD algorithms.
- Score: 4.943054375935878
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Quality-Diversity (QD) algorithms are powerful exploration algorithms that
allow robots to discover large repertoires of diverse and high-performing
skills. However, QD algorithms are sample inefficient and require millions of
evaluations. In this paper, we propose Dynamics-Aware Quality-Diversity
(DA-QD), a framework to improve the sample efficiency of QD algorithms through
the use of dynamics models. We also show how DA-QD can then be used for
continual acquisition of new skill repertoires. To do so, we incrementally
train a deep dynamics model from experience obtained when performing skill
discovery using QD. We can then perform QD exploration in imagination with an
imagined skill repertoire. We evaluate our approach on three robotic
experiments. First, our experiments show DA-QD is 20 times more sample
efficient than existing QD approaches for skill discovery. Second, we
demonstrate learning an entirely new skill repertoire in imagination to perform
zero-shot learning. Finally, we show how DA-QD is useful and effective for
solving a long horizon navigation task and for damage adaptation in the real
world. Videos and source code are available at:
https://sites.google.com/view/da-qd.
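The abstract describes a loop in which a learned dynamics model stands in for the real robot, so that Quality-Diversity exploration (e.g. a MAP-Elites-style archive) can run "in imagination". The sketch below is a minimal toy illustration of that idea, not the authors' implementation: the dynamics model is replaced by a hypothetical 1-D point-mass step function, and the skill parameterization (`gain`), descriptor, fitness, and archive discretization are all illustrative assumptions.

```python
import random

# Toy stand-in for the learned dynamics model. In DA-QD this would be a deep
# network trained incrementally on transitions collected from real rollouts;
# here it is a hypothetical 1-D point-mass step: next_state = state + action.
def dynamics_model(state, action):
    return state + action

def rollout_in_imagination(params, horizon=10):
    """Roll a skill out in the model; return (fitness, behaviour descriptor)."""
    state = 0.0
    for _ in range(horizon):
        state = dynamics_model(state, params["gain"])
    descriptor = state                    # where the imagined skill ends up
    fitness = -abs(params["gain"])        # illustrative: prefer low-effort skills
    return fitness, descriptor

def da_qd(iterations=200, n_cells=10, seed=0):
    """Minimal MAP-Elites-style loop evaluated entirely in imagination."""
    rng = random.Random(seed)
    archive = {}  # behaviour cell -> (fitness, params)
    for _ in range(iterations):
        # Mutate an elite from the imagined repertoire, else sample randomly.
        if archive and rng.random() < 0.5:
            _, parent = rng.choice(list(archive.values()))
            params = {"gain": parent["gain"] + rng.gauss(0.0, 0.1)}
        else:
            params = {"gain": rng.uniform(-1.0, 1.0)}
        fitness, desc = rollout_in_imagination(params)
        # Discretize the descriptor (roughly in [-10, 10]) into archive cells.
        cell = max(0, min(n_cells - 1, int((desc + 10.0) / 20.0 * n_cells)))
        # Keep the higher-performing skill per cell (the QD insertion rule).
        if cell not in archive or fitness > archive[cell][0]:
            archive[cell] = (fitness, params)
    return archive
```

In the full framework, skills from this imagined repertoire would then be executed on the real robot, and the resulting transitions used to retrain the dynamics model; the toy loop above only shows the imagination phase.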
Related papers
- Affordance-Guided Reinforcement Learning via Visual Prompting [51.361977466993345]
Keypoint-based Affordance Guidance for Improvements (KAGI) is a method leveraging rewards shaped by vision-language models (VLMs) for autonomous RL.
On real-world manipulation tasks specified by natural language descriptions, KAGI improves the sample efficiency of autonomous RL and enables successful task completion in 20K online fine-tuning steps.
arXiv Detail & Related papers (2024-07-14T21:41:29Z)
- Choreographer: Learning and Adapting Skills in Imagination [60.09911483010824]
We present Choreographer, a model-based agent that exploits its world model to learn and adapt skills in imagination.
Our method decouples the exploration and skill learning processes, being able to discover skills in the latent state space of the model.
Choreographer is able to learn skills both from offline data, and by collecting data simultaneously with an exploration policy.
arXiv Detail & Related papers (2022-11-23T23:31:14Z)
- Residual Skill Policies: Learning an Adaptable Skill-based Action Space for Reinforcement Learning for Robotics [18.546688182454236]
Skill-based reinforcement learning (RL) has emerged as a promising strategy to leverage prior knowledge for accelerated robot learning.
We propose accelerating exploration in the skill space using state-conditioned generative models.
We validate our approach across four challenging manipulation tasks, demonstrating our ability to learn across task variations.
arXiv Detail & Related papers (2022-11-04T02:42:17Z)
- Learning to Walk Autonomously via Reset-Free Quality-Diversity [73.08073762433376]
Quality-Diversity algorithms can discover large and complex behavioural repertoires consisting of both diverse and high-performing skills.
Existing QD algorithms need large numbers of evaluations as well as episodic resets, which require manual human supervision and intervention.
This paper proposes Reset-Free Quality-Diversity optimization (RF-QD) as a step towards autonomous learning for robotics in open-ended environments.
arXiv Detail & Related papers (2022-04-07T14:07:51Z)
- Hierarchical Skills for Efficient Exploration [70.62309286348057]
In reinforcement learning, pre-trained low-level skills have the potential to greatly facilitate exploration.
Prior knowledge of the downstream task is required to strike the right balance between generality (fine-grained control) and specificity (faster learning) in skill design.
We propose a hierarchical skill learning framework that acquires skills of varying complexity in an unsupervised manner.
arXiv Detail & Related papers (2021-10-20T22:29:32Z)
- Example-Driven Model-Based Reinforcement Learning for Solving Long-Horizon Visuomotor Tasks [85.56153200251713]
We introduce EMBR, a model-based RL method for learning primitive skills that are suitable for completing long-horizon visuomotor tasks.
On a Franka Emika robot arm, we find that EMBR enables the robot to complete three long-horizon visuomotor tasks with an 85% success rate.
arXiv Detail & Related papers (2021-09-21T16:48:07Z)
- An Improved Algorithm of Robot Path Planning in Complex Environment Based on Double DQN [4.161177874372099]
This paper proposes an improved Double DQN (DDQN) to solve the problem, drawing on A* and Rapidly-Exploring Random Tree (RRT).
The simulation experimental results validate the efficiency of the improved DDQN.
arXiv Detail & Related papers (2021-07-23T14:03:04Z) - Model-Based Quality-Diversity Search for Efficient Robot Learning [28.049034339935933]
This paper presents a novelty-based Quality-Diversity (QD) algorithm. A network is trained concurrently with the repertoire and is used to avoid executing unpromising actions in the novelty search process.
Experiments show that enhancing a QD algorithm with such a forward model improves the sample-efficiency and performance of the evolutionary process and the skill adaptation.
arXiv Detail & Related papers (2020-08-11T09:02:18Z)
- Emergent Real-World Robotic Skills via Unsupervised Off-Policy Reinforcement Learning [81.12201426668894]
We develop efficient reinforcement learning methods that acquire diverse skills without any reward function, and then repurpose these skills for downstream tasks.
We show that our proposed algorithm provides substantial improvement in learning efficiency, making reward-free real-world training feasible.
We also demonstrate that the learned skills can be composed using model predictive control for goal-oriented navigation, without any additional training.
arXiv Detail & Related papers (2020-04-27T17:38:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.