Bisimulation Makes Analogies in Goal-Conditioned Reinforcement Learning
- URL: http://arxiv.org/abs/2204.13060v2
- Date: Thu, 28 Apr 2022 02:51:17 GMT
- Title: Bisimulation Makes Analogies in Goal-Conditioned Reinforcement Learning
- Authors: Philippe Hansen-Estruch, Amy Zhang, Ashvin Nair, Patrick Yin, Sergey Levine
- Abstract summary: Building generalizable goal-conditioned agents from rich observations is key to reinforcement learning (RL) solving real-world problems.
We propose a new form of state abstraction called goal-conditioned bisimulation.
We learn this representation using a metric form of this abstraction, and show its ability to generalize to new goals in simulated manipulation tasks.
- Score: 71.52722621691365
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Building generalizable goal-conditioned agents from rich observations is
key to reinforcement learning (RL) solving real-world problems. Traditionally
in goal-conditioned RL, an agent is provided with the exact goal it intends to
reach. However, it is often not realistic to know the configuration of the goal
before performing a task. A more scalable framework would allow us to provide
the agent with an example of an analogous task, and have the agent then infer
what the goal should be for its current state. We propose a new form of state
abstraction called goal-conditioned bisimulation that captures functional
equivariance, allowing for the reuse of skills to achieve new goals. We learn
this representation using a metric form of this abstraction, and show its
ability to generalize to new goals in simulated manipulation tasks. Further,
we prove that this learned representation is sufficient not only for
goal-conditioned tasks, but is amenable to any downstream task described by a
state-only reward function. Videos can be found at
https://sites.google.com/view/gc-bisimulation.
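For readers unfamiliar with bisimulation metrics, the following is a minimal sketch of the standard on-policy bisimulation metric from the broader literature (not quoted from this abstract); roughly speaking, the goal-conditioned variant proposed in the paper builds on this idea by additionally conditioning the reward and transition dynamics on a goal. The discount weighting $\gamma$ and the 1-Wasserstein distance $W_1$ follow common usage and are assumptions here, not details taken from the abstract.

```latex
% Standard on-policy bisimulation metric (sketch). The paper's goal-conditioned
% abstraction presumably conditions r and P on a goal g as well as the state.
d^{\pi}(s_i, s_j) \;=\;
  \bigl| r^{\pi}(s_i) - r^{\pi}(s_j) \bigr|
  \;+\; \gamma\, W_1\!\bigl( P^{\pi}(\cdot \mid s_i),\; P^{\pi}(\cdot \mid s_j);\; d^{\pi} \bigr)
```

Intuitively, states with similar rewards and similar dynamics under this metric are mapped close together in the learned representation, which is what allows skills to be reused for analogous goals.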
Related papers
- HIQL: Offline Goal-Conditioned RL with Latent States as Actions [81.67963770528753]
We propose a hierarchical algorithm for goal-conditioned RL from offline data.
We show how this hierarchical decomposition makes our method robust to noise in the estimated value function.
Our method can solve long-horizon tasks that stymie prior methods, can scale to high-dimensional image observations, and can readily make use of action-free data.
arXiv Detail & Related papers (2023-07-22T00:17:36Z)
- Discrete Factorial Representations as an Abstraction for Goal Conditioned Reinforcement Learning [99.38163119531745]
We show that applying a discretizing bottleneck can improve performance in goal-conditioned RL setups.
We experimentally demonstrate improved expected return on out-of-distribution goals, while still allowing goals with expressive structure to be specified.
arXiv Detail & Related papers (2022-11-01T03:31:43Z)
- Goal-Conditioned Q-Learning as Knowledge Distillation [136.79415677706612]
We explore a connection between off-policy reinforcement learning in goal-conditioned settings and knowledge distillation.
We empirically show that this can improve the performance of goal-conditioned off-policy reinforcement learning when the space of goals is high-dimensional.
We also show that this technique can be adapted to allow for efficient learning in the case of multiple simultaneous sparse goals.
arXiv Detail & Related papers (2022-08-28T22:01:10Z)
- Learning user-defined sub-goals using memory editing in reinforcement learning [0.0]
In reinforcement learning (RL), the agent's aim is to reach the final goal.
I propose a methodology that uses memory editing to reach user-defined sub-goals as well as the final goal.
I expect this methodology to be useful in fields that require controlling an agent across a variety of scenarios.
arXiv Detail & Related papers (2022-05-01T05:19:51Z)
- Learning for Visual Navigation by Imagining the Success [66.99810227193196]
We propose to learn to imagine a latent representation of the successful (sub-)goal state.
ForeSIT is trained to imagine the recurrent latent representation of a future state that leads to success.
We develop an efficient learning algorithm to train ForeSIT in an on-policy manner and integrate it into our RL objective.
arXiv Detail & Related papers (2021-02-28T10:25:46Z)
- Automatic Curriculum Learning through Value Disagreement [95.19299356298876]
Continually solving new, unsolved tasks is the key to learning diverse behaviors.
In the multi-task domain, where an agent needs to reach multiple goals, the choice of training goals can largely affect sample efficiency.
We propose setting up an automatic curriculum for goals that the agent needs to solve.
We evaluate our method across 13 multi-goal robotic tasks and 5 navigation tasks, and demonstrate performance gains over current state-of-the-art methods.
arXiv Detail & Related papers (2020-06-17T03:58:25Z)