Direct then Diffuse: Incremental Unsupervised Skill Discovery for State
Covering and Goal Reaching
- URL: http://arxiv.org/abs/2110.14457v1
- Date: Wed, 27 Oct 2021 14:22:19 GMT
- Title: Direct then Diffuse: Incremental Unsupervised Skill Discovery for State
Covering and Goal Reaching
- Authors: Pierre-Alexandre Kamienny, Jean Tarbouriech, Alessandro Lazaric,
Ludovic Denoyer
- Abstract summary: We build on the mutual information framework for skill discovery and introduce UPSIDE to address the coverage-directedness trade-off.
We illustrate in several navigation and control environments how the skills learned by UPSIDE solve sparse-reward downstream tasks better than existing baselines.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning meaningful behaviors in the absence of reward is a difficult problem
in reinforcement learning. A desirable and challenging unsupervised objective
is to learn a set of diverse skills that provide a thorough coverage of the
state space while being directed, i.e., reliably reaching distinct regions of
the environment. In this paper, we build on the mutual information framework
for skill discovery and introduce UPSIDE, which addresses the
coverage-directedness trade-off in the following ways: 1) We design policies
with a decoupled structure of a directed skill, trained to reach a specific
region, followed by a diffusing part that induces a local coverage. 2) We
optimize policies by maximizing their number under the constraint that each of
them reaches distinct regions of the environment (i.e., they are sufficiently
discriminable) and prove that this serves as a lower bound to the original
mutual information objective. 3) Finally, we compose the learned directed
skills into a growing tree that adaptively covers the environment. We
illustrate in several navigation and control environments how the skills
learned by UPSIDE solve sparse-reward downstream tasks better than existing
baselines.
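To make the three components concrete, the following is a minimal, hypothetical Python sketch, not the authors' implementation: a stub directed phase drives toward a skill-specific target, a random-action diffusing phase induces local coverage, and a nearest-centroid classifier stands in for the learned discriminator enforcing the constraint of point 2 (a flat skill set stands in for the growing tree of point 3). As a hedged reading of point 2's bound: with a uniform prior over N skills, the standard variational argument gives I(S;Z) = H(Z) - H(Z|S) >= log N + E[log q(z|s)], so forcing each skill to be discriminated with probability at least eta yields a lower bound of log N + log eta; this is a reconstruction from the abstract, not the paper's exact statement. All names, dynamics, and thresholds below are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)

    def env_step(state, action):
        # Toy point-mass dynamics standing in for the real environment.
        return state + 0.1 * np.asarray(action)

    def directed_phase(state, target, T_direct=20):
        # Directed skill: trained to reliably reach a specific region;
        # stubbed here as "move straight toward a skill-specific target".
        for _ in range(T_direct):
            state = env_step(state, np.clip(target - state, -1.0, 1.0))
        return state

    def diffusing_phase(state, T_diffuse=10):
        # Diffusing part: random actions that induce local coverage
        # around the region the directed skill reached.
        visited = []
        for _ in range(T_diffuse):
            state = env_step(state, rng.uniform(-1.0, 1.0, size=state.shape))
            visited.append(state)
        return np.array(visited)

    def discriminable(clusters, eta=0.8):
        # Stand-in discriminator (nearest centroid): the skill set is kept
        # only if each skill's diffused states are identified with
        # accuracy >= eta, mirroring the constraint in point 2.
        centroids = [c.mean(axis=0) for c in clusters]
        correct, total = 0, 0
        for i, c in enumerate(clusters):
            dists = np.stack([np.linalg.norm(c - m, axis=1) for m in centroids])
            correct += int(np.sum(dists.argmin(axis=0) == i))
            total += len(c)
        return correct / total >= eta

    # Point 2 stand-in: greedily grow the number of skills, keeping a new
    # one only while the whole set stays discriminable (a flat set instead
    # of the paper's tree composition, for brevity).
    start = np.zeros(2)
    clusters = []
    for _ in range(8):
        end = directed_phase(start.copy(), target=rng.uniform(-2.0, 2.0, size=2))
        candidate = clusters + [diffusing_phase(end)]
        if discriminable(candidate):
            clusters = candidate
    print(f"kept {len(clusters)} mutually discriminable skills")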
Related papers
- SkiLD: Unsupervised Skill Discovery Guided by Factor Interactions
This work introduces Skill Discovery from Local Dependencies (SkiLD).
SkiLD develops a novel skill-learning objective that explicitly encourages mastering skills that induce different interactions within an environment.
We evaluate SkiLD in several domains with challenging, long-horizon sparse-reward tasks, including a realistic simulated household-robot domain.
arXiv Detail & Related papers (2024-10-24T04:01:59Z)
- Diversity Through Exclusion (DTE): Niche Identification for Reinforcement Learning through Value-Decomposition
We propose a generic reinforcement learning (RL) algorithm that performs better than baseline deep Q-learning algorithms in environments with multiple variably-valued niches.
We show that agents trained this way can escape poor-but-attractive local optima to instead converge to harder-to-discover higher value strategies.
arXiv Detail & Related papers (2023-02-02T16:00:19Z)
- A Unified Algorithm Framework for Unsupervised Discovery of Skills based on Determinantal Point Process
We show that diversity and coverage in unsupervised option discovery can indeed be unified under the same mathematical framework.
Our proposed algorithm, ODPP, is evaluated extensively on challenging tasks built with MuJoCo and Atari; a toy illustration of the DPP diversity idea is sketched below.
arXiv Detail & Related papers (2022-12-01T01:40:03Z)
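Since the summary names a determinantal point process (DPP) as the unifying object, here is a minimal, hypothetical illustration of why a DPP determinant measures diversity: the determinant of a similarity kernel over skill or trajectory features is near 1 for well-separated features and near 0 for redundant ones. The RBF kernel and toy feature vectors are illustrative assumptions, not ODPP's construction.

    import numpy as np

    def dpp_diversity(features, bandwidth=1.0):
        # Determinant of an RBF kernel matrix: higher => more diverse set.
        sq = np.sum((features[:, None, :] - features[None, :, :]) ** 2, axis=-1)
        L = np.exp(-sq / (2 * bandwidth ** 2))
        return np.linalg.det(L)

    diverse = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
    clumped = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1]])
    print(dpp_diversity(diverse))   # near 1: nearly orthogonal kernel rows
    print(dpp_diversity(clumped))   # near 0: redundant, overlapping skills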
- Choreographer: Learning and Adapting Skills in Imagination
We present Choreographer, a model-based agent that exploits its world model to learn and adapt skills in imagination.
Our method decouples the exploration and skill learning processes, which lets it discover skills directly in the latent state space of the model.
Choreographer can learn skills both from offline data and by collecting data simultaneously with an exploration policy.
arXiv Detail & Related papers (2022-11-23T23:31:14Z)
- Wasserstein Unsupervised Reinforcement Learning
Unsupervised reinforcement learning aims to train agents to learn a handful of policies or skills in environments without external reward.
These pre-trained policies can accelerate learning when endowed with an external reward, and can also be used as primitive options in hierarchical reinforcement learning.
We propose a new framework, Wasserstein unsupervised reinforcement learning (WURL), in which we directly maximize the distance between the state distributions induced by different policies (see the sketch below).
arXiv Detail & Related papers (2021-10-15T08:41:51Z)
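A minimal sketch of the quantity WURL maximizes, under stated assumptions: state distributions are represented by empirical rollout samples, and a sliced 1-D estimator (random projections averaged with scipy's wasserstein_distance) stands in for whatever estimator the paper actually uses.

    import numpy as np
    from scipy.stats import wasserstein_distance

    def sliced_wasserstein(states_a, states_b, n_proj=50, seed=0):
        # Average 1-D Wasserstein distance over random unit projections.
        rng = np.random.default_rng(seed)
        d = states_a.shape[1]
        total = 0.0
        for _ in range(n_proj):
            u = rng.normal(size=d)
            u /= np.linalg.norm(u)
            total += wasserstein_distance(states_a @ u, states_b @ u)
        return total / n_proj

    # Two hypothetical policies' visited states (stand-ins for rollouts).
    rng = np.random.default_rng(1)
    pi1 = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(500, 2))
    pi2 = rng.normal(loc=[3.0, 0.0], scale=0.5, size=(500, 2))
    print(sliced_wasserstein(pi1, pi2))   # large value => well-separated skills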
- DisTop: Discovering a Topological representation to learn diverse and rewarding skills
DisTop is a new model that simultaneously learns diverse skills and focuses on improving rewarding skills.
DisTop builds a discrete topology of the environment using an unsupervised contrastive loss, a growing network and a goal-conditioned policy.
We show that DisTop achieves state-of-the-art performance in comparison with hierarchical reinforcement learning (HRL) when rewards are sparse.
arXiv Detail & Related papers (2021-06-06T10:09:05Z)
- GRIMGEP: Learning Progress for Robust Goal Sampling in Visual Deep Reinforcement Learning
We propose a framework that allows agents to autonomously identify and ignore noisy distracting regions.
Our framework can be combined with any state-of-the-art novelty-seeking goal exploration approach; a toy version of the learning-progress signal is sketched below.
arXiv Detail & Related papers (2020-08-10T19:50:06Z)
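A hypothetical sketch of the absolute-learning-progress idea behind this kind of goal sampling: regions whose measured competence is changing get sampled more, while pure-noise regions (which fluctuate but never improve on average) and mastered regions both flatten to near-zero progress. The region bookkeeping and toy outcomes are illustrative assumptions, not GRIMGEP's implementation.

    import numpy as np

    rng = np.random.default_rng(0)

    class Region:
        def __init__(self):
            self.history = []                      # recent competence scores
        def learning_progress(self, window=10):
            h = self.history[-2 * window:]
            if len(h) < 2 * window:
                return 1.0                         # optimistic for unexplored regions
            old, new = np.mean(h[:window]), np.mean(h[window:])
            return abs(new - old)                  # ~0 for both noise and mastery

    regions = [Region() for _ in range(4)]
    for step in range(200):
        lp = np.array([r.learning_progress() for r in regions])
        probs = lp / lp.sum() if lp.sum() > 0 else np.full(len(regions), 0.25)
        g = int(rng.choice(len(regions), p=probs))  # sample a goal region
        # Toy competence signal: region 0 is learnable, region 1 is pure
        # noise, regions 2-3 stay flat (stand-ins for real goal outcomes).
        outcome = {0: min(1.0, step / 100), 1: rng.uniform(), 2: 0.2, 3: 0.2}[g]
        regions[g].history.append(outcome)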
- ELSIM: End-to-end learning of reusable skills through intrinsic motivation
We present a novel reinforcement learning architecture which hierarchically learns and represents self-generated skills in an end-to-end way.
With this architecture, an agent focuses only on task-rewarded skills while keeping the learning process of skills bottom-up.
arXiv Detail & Related papers (2020-06-23T11:20:46Z)
- Explore, Discover and Learn: Unsupervised Discovery of State-Covering Skills
'Explore, Discover and Learn' (EDL) is an alternative approach to information-theoretic skill discovery.
We show that EDL offers significant advantages, such as overcoming the coverage problem, reducing the dependence of learned skills on the initial state, and allowing the user to define a prior over which behaviors should be learned.
arXiv Detail & Related papers (2020-02-10T10:49:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.