Temporal Abstractions-Augmented Temporally Contrastive Learning: An
Alternative to the Laplacian in RL
- URL: http://arxiv.org/abs/2203.11369v1
- Date: Mon, 21 Mar 2022 22:07:48 GMT
- Title: Temporal Abstractions-Augmented Temporally Contrastive Learning: An
Alternative to the Laplacian in RL
- Authors: Akram Erraqabi, Marlos C. Machado, Mingde Zhao, Sainbayar Sukhbaatar,
Alessandro Lazaric, Ludovic Denoyer, Yoshua Bengio
- Abstract summary: In reinforcement learning, the graph Laplacian has proved to be a valuable tool in the task-agnostic setting.
We propose an alternative method that is able to recover, in a non-uniform-prior setting, the expressiveness and the desired properties of the Laplacian representation.
We find that our method succeeds as an alternative to the Laplacian in the non-uniform setting and scales to challenging continuous control environments.
- Score: 140.12803111221206
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In reinforcement learning, the graph Laplacian has proved to be a valuable
tool in the task-agnostic setting, with applications ranging from skill
discovery to reward shaping. Recently, learning the Laplacian representation
has been framed as the optimization of a temporally-contrastive objective to
overcome its computational limitations in large (or continuous) state spaces.
However, this approach requires uniform access to all states in the state
space, overlooking the exploration problem that emerges during the
representation learning process. In this work, we propose an alternative method
that is able to recover, in a non-uniform-prior setting, the expressiveness and
the desired properties of the Laplacian representation. We do so by combining
the representation learning with a skill-based covering policy, which provides
a better training distribution to extend and refine the representation. We also
show that a simple augmentation of the representation objective with the
learned temporal abstractions improves dynamics-awareness and helps
exploration. We find that our method succeeds as an alternative to the
Laplacian in the non-uniform setting and scales to challenging continuous
control environments. Finally, even though our method is not optimized for skill
discovery, the learned skills can successfully solve difficult continuous
navigation tasks with sparse rewards, where standard skill discovery approaches
are not as effective.
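To make the contrastive objective concrete, here is a minimal sketch of the temporally-contrastive (graph-drawing) loss commonly used to approximate the Laplacian representation, in the style of Wu et al. (2019); the encoder `phi`, the batch arguments, and the `beta` weight are illustrative assumptions, not the authors' code.

```python
import torch

def laplacian_repr_loss(phi, s, s_next, u, v, beta=1.0):
    """phi: state encoder; (s, s_next): batches of temporally adjacent
    states; (u, v): states sampled independently from the training
    distribution."""
    zs, zn = phi(s), phi(s_next)
    zu, zv = phi(u), phi(v)
    d = zs.shape[-1]
    # Attractive term: embeddings of temporally adjacent states move closer.
    attract = (zs - zn).pow(2).sum(-1).mean()
    # Repulsive term: a soft orthonormality penalty that keeps the d
    # embedding dimensions near-orthonormal under the sampling distribution,
    # so minimizers approximate the smallest Laplacian eigenvectors.
    dot = (zu * zv).sum(-1)
    repulse = (dot.pow(2) - zu.pow(2).sum(-1) - zv.pow(2).sum(-1) + d).mean()
    return attract + beta * repulse
```

The paper's point is that both expectations implicitly assume uniform access to the state space; the skill-based covering policy is meant to supply a better sampling distribution for exactly these terms.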
Related papers
- RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar to, but potentially more practical than, those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning must be near-optimal, and enables the algorithm to learn behaviors that improve over a potentially suboptimal human expert.
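A minimal sketch of the mechanism summarized above, assuming a gym-style environment and a hypothetical `expert` interface (not the RLIF release):

```python
def rollout_step(env, learner_policy, expert, obs):
    """One interaction step: the expert may take over, and the intervention
    signal itself is the only reward the learner ever sees."""
    action = learner_policy(obs)
    intervened = expert.wants_to_intervene(obs, action)  # hypothetical gate
    if intervened:
        action = expert.act(obs)                 # hypothetical expert action
    next_obs, _, done, _ = env.step(action)      # env reward is discarded
    reward = -1.0 if intervened else 0.0         # intervention = negative reward
    return next_obs, action, reward, done
```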
arXiv Detail & Related papers (2023-11-21T21:05:21Z)
- Discrete State-Action Abstraction via the Successor Representation [3.453310639983932]
Abstraction is one approach that provides the agent with an intrinsic reward for transitioning in a latent space.
Our approach is the first for automatically learning a discrete abstraction of the underlying environment.
Our proposed algorithm, Discrete State-Action Abstraction (DSAA), alternates between training these options and using them to efficiently explore more of the environment.
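A minimal tabular sketch of this pipeline as described, not the DSAA code: learn the successor representation (SR) with TD updates, then cluster its rows into discrete abstract states; options are then rewarded for transitioning between clusters.

```python
import numpy as np
from sklearn.cluster import KMeans

def td_update_sr(sr, s, s_next, alpha=0.1, gamma=0.99):
    """One TD update of the tabular successor representation: sr[s] is
    the expected discounted future occupancy of every state, from s."""
    onehot = np.eye(sr.shape[0])[s]
    sr[s] += alpha * (onehot + gamma * sr[s_next] - sr[s])

def discrete_abstraction(sr, n_abstract_states):
    """States with similar long-run occupancies share a cluster; the
    clusters serve as the discrete abstract states."""
    return KMeans(n_clusters=n_abstract_states, n_init=10).fit_predict(sr)
```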
arXiv Detail & Related papers (2022-06-07T17:37:30Z)
- Value Function Spaces: Skill-Centric State Abstractions for Long-Horizon Reasoning [120.38381203153159]
Reinforcement learning can train policies that effectively perform complex tasks.
For long-horizon tasks, the performance of these methods degrades with horizon, often necessitating reasoning over and composing lower-level skills.
We propose Value Function Spaces: a simple approach that produces such a representation by using the value functions corresponding to each lower-level skill.
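A minimal sketch of the resulting representation; `skill_value_fns` is an assumed list of per-skill value functions, not the paper's code.

```python
import torch

def vfs_embedding(obs, skill_value_fns):
    """Represent a state by the vector of its K skill values: each
    coordinate measures how achievable one lower-level skill is from here,
    giving a compact, skill-centric abstraction for long-horizon reasoning."""
    return torch.stack([v(obs) for v in skill_value_fns], dim=-1)
```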
arXiv Detail & Related papers (2021-11-04T22:46:16Z)
- Exploratory State Representation Learning [63.942632088208505]
We propose a new approach called XSRL (eXploratory State Representation Learning) to solve the problems of exploration and SRL in parallel.
On one hand, it jointly learns compact state representations and a state transition estimator which is used to remove unexploitable information from the representations.
On the other hand, it continuously trains an inverse model, and adds to the prediction error of this model a $k$-step learning progress bonus to form the objective of a discovery policy.
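A minimal sketch of how the two signals above might combine into the discovery policy's reward; the paper defines the exact bonus, so the names and the clipping here are assumptions.

```python
def discovery_reward(inverse_model_error, estimator_losses, k=5):
    """Reward = inverse-model prediction error plus a k-step
    learning-progress bonus: positive while the transition estimator's
    loss is still decreasing, i.e., while there is still something to learn."""
    if len(estimator_losses) > k:
        progress = estimator_losses[-1 - k] - estimator_losses[-1]
    else:
        progress = 0.0
    return inverse_model_error + max(progress, 0.0)
```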
arXiv Detail & Related papers (2021-09-28T10:11:07Z)
- MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven Reinforcement Learning [65.52675802289775]
We show that an uncertainty-aware classifier can solve challenging reinforcement learning problems.
We propose a novel method for computing the normalized maximum likelihood (NML) distribution.
We show that the resulting algorithm has a number of intriguing connections to both count-based exploration methods and prior algorithms for learning reward functions.
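A minimal sketch of exact conditional NML for a binary success classifier used as a reward; MURAL's contribution is a meta-learned amortization of this (the naive form below refits per query and per label), and the sklearn-style interfaces are assumptions.

```python
def cnml_success_prob(make_classifier, X, y, x_query):
    """For each candidate label, refit with the query appended and record
    the resulting likelihood; normalizing yields the CNML success probability.
    Assumes X, y are lists and both labels occur in the augmented data."""
    likes = []
    for label in (0, 1):
        clf = make_classifier()                    # fresh sklearn-style model
        clf.fit(X + [x_query], y + [label])        # refit with (x_query, label)
        likes.append(clf.predict_proba([x_query])[0][label])
    return likes[1] / (likes[0] + likes[1])        # normalized prob of success
```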
arXiv Detail & Related papers (2021-07-15T08:19:57Z)
- Bridging the Imitation Gap by Adaptive Insubordination [88.35564081175642]
We show that when the teaching agent makes decisions with access to privileged information, this information is marginalized during imitation learning.
We propose 'Adaptive Insubordination' (ADVISOR) to address this gap.
ADVISOR dynamically weights imitation and reward-based reinforcement learning losses during training, enabling on-the-fly switching between imitation and exploration.
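A schematic sketch of the per-state weighting; ADVISOR estimates the weight with an auxiliary actor, so the form below is an assumption rather than the released implementation.

```python
def advisor_loss(imitation_loss, rl_loss, expert_followability):
    """Mix the two losses per state: weight imitation heavily where the
    privileged expert's behavior is estimated to be followable, and fall
    back to reward-based RL (exploration) elsewhere."""
    w = expert_followability  # assumed in [0, 1]
    return w * imitation_loss + (1.0 - w) * rl_loss
```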
arXiv Detail & Related papers (2020-07-23T17:59:57Z)
- oIRL: Robust Adversarial Inverse Reinforcement Learning with Temporally Extended Actions [37.66289166905027]
Explicit engineering of reward functions for given environments has been a major hindrance to reinforcement learning methods.
We propose an algorithm that learns hierarchical disentangled rewards with a policy over options.
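An illustrative sketch, not the oIRL algorithm itself, of executing a policy over options where each option carries its own learned (disentangled) reward; the gym-style interfaces are assumptions.

```python
def run_option(env, obs, policy_over_options, options, option_rewards):
    """Pick an option, roll it out until it terminates, and accumulate
    that option's own learned reward along the way."""
    o = policy_over_options(obs)                 # high-level choice
    total_reward, done = 0.0, False
    while not done:
        action = options[o].act(obs)             # intra-option policy
        total_reward += option_rewards[o](obs, action)  # option-specific reward
        obs, _, env_done, _ = env.step(action)
        done = env_done or options[o].terminates(obs)
    return obs, total_reward
```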
arXiv Detail & Related papers (2020-02-20T22:21:41Z)