DisTop: Discovering a Topological representation to learn diverse and
rewarding skills
- URL: http://arxiv.org/abs/2106.03853v1
- Date: Sun, 6 Jun 2021 10:09:05 GMT
- Title: DisTop: Discovering a Topological representation to learn diverse and
rewarding skills
- Authors: Arthur Aubret, Laetitia Matignon and Salima Hassas
- Abstract summary: DisTop is a new model that simultaneously learns diverse skills and focuses on improving rewarding skills.
DisTop builds a discrete topology of the environment using an unsupervised contrastive loss, a growing network and a goal-conditioned policy.
We show that DisTop achieves state-of-the-art performance in comparison with hierarchical reinforcement learning (HRL) when rewards are sparse.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The optimal way for a deep reinforcement learning (DRL) agent to explore is
to learn a set of skills that achieves a uniform distribution of states.
Following this, we introduce DisTop, a new model that simultaneously learns
diverse skills and focuses on improving rewarding skills. DisTop progressively
builds a discrete topology of the environment using an unsupervised contrastive
loss, a growing network and a goal-conditioned policy. Using this topology, a
state-independent hierarchical policy can select where the agent has to keep
discovering skills in the state space. In turn, the newly visited states allow
an improved learned representation, and the learning loop continues. Our
experiments emphasize that DisTop is agnostic to the ground state
representation and that the agent can discover the topology of its environment
whether the states are high-dimensional binary data, images, or proprioceptive
inputs. We demonstrate that this paradigm is competitive on MuJoCo benchmarks
with state-of-the-art algorithms on both single-task dense rewards and diverse
skill discovery. By combining these two aspects, we show that DisTop achieves
state-of-the-art performance in comparison with hierarchical reinforcement
learning (HRL) when rewards are sparse. We believe DisTop opens new
perspectives by showing that bottom-up skill discovery combined with
representation learning can unlock the exploration challenge in DRL.
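As a rough illustration of the loop the abstract describes, the minimal Python sketch below embeds states (a fixed random projection stands in for the learned contrastive encoder), grows a set of prototype nodes that discretizes the embedding space into a topology, lets a state-independent rule pick the node where skills should keep being discovered, and takes goal-conditioned steps toward that node. All class names, thresholds, and the node-selection rule are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

# Minimal sketch of the loop described in the abstract, under strong simplifying
# assumptions: the contrastive encoder is replaced by a fixed random projection,
# and the "growing network" is a plain list of prototype nodes that grows when a
# new embedding is far from every existing node.

class GrowingTopology:
    """Discrete topology over the embedding space: one prototype node per region."""

    def __init__(self, new_node_threshold=1.0):
        self.nodes = []      # prototype embeddings
        self.value = []      # running reward estimate per node
        self.visits = []     # visit count per node
        self.threshold = new_node_threshold

    def update(self, z, reward):
        """Insert a node if z is far from all prototypes, else update the nearest one."""
        if not self.nodes:
            self.nodes.append(z.copy()); self.value.append(reward); self.visits.append(1)
            return 0
        dists = [np.linalg.norm(z - n) for n in self.nodes]
        i = int(np.argmin(dists))
        if dists[i] > self.threshold:
            self.nodes.append(z.copy()); self.value.append(reward); self.visits.append(1)
            return len(self.nodes) - 1
        # move the nearest prototype slightly toward z and track its reward
        self.nodes[i] += 0.05 * (z - self.nodes[i])
        self.value[i] += 0.1 * (reward - self.value[i])
        self.visits[i] += 1
        return i

    def select_goal_node(self, reward_weight=1.0, novelty_weight=1.0):
        """State-independent choice of where to keep discovering skills:
        prefer rewarding nodes and rarely visited (novel) nodes."""
        scores = np.array([reward_weight * v + novelty_weight / np.sqrt(c)
                           for v, c in zip(self.value, self.visits)])
        probs = np.exp(scores - scores.max()); probs /= probs.sum()
        return int(np.random.choice(len(self.nodes), p=probs))


def encode(state, proj):
    """Stand-in for the learned contrastive representation (random projection)."""
    return proj @ state


def goal_conditioned_step(state, goal_embedding, proj):
    """Stand-in for the goal-conditioned policy: a noisy step toward the goal region."""
    direction = np.linalg.pinv(proj) @ (goal_embedding - encode(state, proj))
    return state + 0.1 * direction + 0.01 * np.random.randn(*state.shape)


rng = np.random.default_rng(0)
state_dim, latent_dim = 16, 3
proj = rng.normal(size=(latent_dim, state_dim)) / np.sqrt(state_dim)
topo = GrowingTopology(new_node_threshold=0.5)
state = rng.normal(size=state_dim)

for episode in range(50):
    topo.update(encode(state, proj), reward=0.0)
    goal = topo.nodes[topo.select_goal_node()]
    for t in range(20):
        state = goal_conditioned_step(state, goal, proj)
        reward = float(state[0] > 2.0)            # toy sparse reward
        topo.update(encode(state, proj), reward)  # topology and node values keep updating

print(f"discovered {len(topo.nodes)} topology nodes")
```

In the full method the encoder, the growing network, and the goal-conditioned policy are trained jointly and progressively, which this sketch does not attempt to reproduce.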
Related papers
- Constrained Ensemble Exploration for Unsupervised Skill Discovery [43.00837365639085]
Unsupervised Reinforcement Learning (RL) provides a promising paradigm for learning useful behaviors via reward-free pre-training.
We propose a novel unsupervised RL framework via an ensemble of skills, where each skill performs partition exploration based on the state prototypes.
We find our method learns well-explored ensemble skills and achieves superior performance in various downstream tasks compared to previous methods.
arXiv Detail & Related papers (2024-05-25T03:07:56Z)
- Choreographer: Learning and Adapting Skills in Imagination [60.09911483010824]
We present Choreographer, a model-based agent that exploits its world model to learn and adapt skills in imagination.
Our method decouples the exploration and skill learning processes, being able to discover skills in the latent state space of the model.
Choreographer is able to learn skills both from offline data, and by collecting data simultaneously with an exploration policy.
arXiv Detail & Related papers (2022-11-23T23:31:14Z)
- Visual processing in context of reinforcement learning [0.0]
This thesis introduces three different representation learning algorithms that have access to different subsets of the data sources that traditional RL algorithms use.
We conclude that including unsupervised representation learning in RL problem-solving pipelines can speed up learning.
arXiv Detail & Related papers (2022-08-26T09:30:51Z)
- Temporal Abstractions-Augmented Temporally Contrastive Learning: An Alternative to the Laplacian in RL [140.12803111221206]
In reinforcement learning, the graph Laplacian has proved to be a valuable tool in the task-agnostic setting.
We propose an alternative method that is able to recover, in a non-uniform-prior setting, the expressiveness and the desired properties of the Laplacian representation.
We find that our method succeeds as an alternative to the Laplacian in the non-uniform setting and scales to challenging continuous control environments.
arXiv Detail & Related papers (2022-03-21T22:07:48Z)
- JueWu-MC: Playing Minecraft with Sample-efficient Hierarchical Reinforcement Learning [13.57305458734617]
We propose JueWu-MC, a sample-efficient hierarchical RL approach equipped with representation learning and imitation learning to deal with perception and exploration.
Specifically, our approach includes two levels of hierarchy, where the high-level controller learns a policy to control over options and the low-level workers learn to solve each sub-task.
To boost the learning of sub-tasks, we propose a combination of techniques including 1) action-aware representation learning, which captures underlying relations between action and representation, 2) discriminator-based self-imitation learning for efficient exploration, and 3) ensemble behavior cloning with consistency filtering.
arXiv Detail & Related papers (2021-12-07T09:24:49Z)
- Direct then Diffuse: Incremental Unsupervised Skill Discovery for State Covering and Goal Reaching [98.25207998996066]
We build on the mutual information framework for skill discovery and introduce UPSIDE to address the coverage-directedness trade-off (a minimal sketch of this mutual-information objective appears after this list).
We illustrate in several navigation and control environments how the skills learned by UPSIDE solve sparse-reward downstream tasks better than existing baselines.
arXiv Detail & Related papers (2021-10-27T14:22:19Z)
- Exploratory State Representation Learning [63.942632088208505]
We propose a new approach called XSRL (eXploratory State Representation Learning) to solve the problems of exploration and SRL in parallel.
On one hand, it jointly learns compact state representations and a state transition estimator which is used to remove unexploitable information from the representations.
On the other hand, it continuously trains an inverse model, and adds to the prediction error of this model a $k$-step learning progress bonus to form the objective of a discovery policy.
arXiv Detail & Related papers (2021-09-28T10:11:07Z)
- Explore and Control with Adversarial Surprise [78.41972292110967]
Reinforcement learning (RL) provides a framework for learning goal-directed policies given user-specified rewards.
We propose a new unsupervised RL technique based on an adversarial game which pits two policies against each other to compete over the amount of surprise an RL agent experiences.
We show that our method leads to the emergence of complex skills by exhibiting clear phase transitions.
arXiv Detail & Related papers (2021-07-12T17:58:40Z)
- PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning [102.36450942613091]
We propose an inverse reinforcement learning algorithm, called inverse temporal difference learning (ITD).
We show how to seamlessly integrate ITD with learning from online environment interactions, arriving at a novel algorithm for reinforcement learning with demonstrations, called $\Psi\Phi$-learning.
arXiv Detail & Related papers (2021-02-24T21:12:09Z)
- ELSIM: End-to-end learning of reusable skills through intrinsic motivation [0.0]
We present a novel reinforcement learning architecture which hierarchically learns and represents self-generated skills in an end-to-end way.
With this architecture, an agent focuses only on task-rewarded skills while keeping the learning process of skills bottom-up.
arXiv Detail & Related papers (2020-06-23T11:20:46Z)
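Several of the entries above (Constrained Ensemble Exploration, Direct then Diffuse/UPSIDE, ELSIM) build on the mutual-information view of skill discovery, where a skill z earns intrinsic reward for reaching states from which a discriminator can identify it, roughly r(s, z) ~= log q(z|s) - log p(z). The count-based toy below only illustrates that objective; the chain environment, the fixed skill "policies", and all constants are assumptions for illustration, not any cited paper's method.

```python
import numpy as np

# Count-based toy illustration of the mutual-information skill objective,
# r(s, z) ~= log q(z | s) - log p(z): each skill z is rewarded for reaching
# states from which a discriminator can identify it.

rng = np.random.default_rng(0)
n_states, n_skills, episode_len = 10, 3, 8
counts = np.ones((n_states, n_skills))   # Laplace-smoothed visit counts
skill_drift = {0: -1, 1: 0, 2: +1}        # each skill is a fixed drift on the chain

def discriminator(state):
    """q(z | s) estimated from visit counts (stand-in for a learned classifier)."""
    return counts[state] / counts[state].sum()

log_p_z = np.log(1.0 / n_skills)          # uniform skill prior

for episode in range(500):
    z = int(rng.integers(n_skills))
    state = n_states // 2
    for t in range(episode_len):
        step = skill_drift[z] if rng.random() < 0.8 else int(rng.integers(-1, 2))
        state = int(np.clip(state + step, 0, n_states - 1))
        counts[state, z] += 1             # update the discriminator
        intrinsic_reward = np.log(discriminator(state)[z]) - log_p_z
        # a real agent would feed intrinsic_reward into its policy update here

# after training, each skill should be identifiable from where it tends to end up
for z in range(n_skills):
    best_state = int(np.argmax(counts[:, z] / counts.sum(axis=1)))
    print(f"skill {z}: most discriminative state = {best_state}")
```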