Discrete Factorial Representations as an Abstraction for Goal
Conditioned Reinforcement Learning
- URL: http://arxiv.org/abs/2211.00247v1
- Date: Tue, 1 Nov 2022 03:31:43 GMT
- Title: Discrete Factorial Representations as an Abstraction for Goal
Conditioned Reinforcement Learning
- Authors: Riashat Islam, Hongyu Zang, Anirudh Goyal, Alex Lamb, Kenji Kawaguchi,
Xin Li, Romain Laroche, Yoshua Bengio, Remi Tachet Des Combes
- Abstract summary: We show that applying a discretizing bottleneck can improve performance in goal-conditioned RL setups.
We prove a theorem lower-bounding the expected return on out-of-distribution goals, while still allowing for specifying goals with expressive combinatorial structure.
- Score: 99.38163119531745
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Goal-conditioned reinforcement learning (RL) is a promising direction for
training agents that are capable of solving multiple tasks and reaching a diverse
set of objectives. How to specify and ground these goals in
such a way that we can both reliably reach goals during training as well as
generalize to new goals during evaluation remains an open area of research.
Defining goals in the space of noisy and high-dimensional sensory inputs poses
a challenge for training goal-conditioned agents, or even for generalization to
novel goals. We propose to address this by learning factorial representations
of goals and processing the resulting representation via a discretization
bottleneck, for coarser goal specification, through an approach we call DGRL.
We show that applying a discretizing bottleneck can improve performance in
goal-conditioned RL setups, by experimentally evaluating this method on tasks
ranging from maze environments to complex robotic navigation and manipulation.
Additionally, we prove a theorem lower-bounding the expected return on
out-of-distribution goals, while still allowing for specifying goals with
expressive combinatorial structure.
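The abstract describes learning factorial goal representations processed through a discretization bottleneck (DGRL). As a rough illustration only, here is a minimal PyTorch sketch of one way such a bottleneck can be wired up, assuming a VQ-VAE-style nearest-neighbour codebook per factor with a straight-through estimator; the class name, dimensions, and loss coefficients are invented for the example and are not the authors' released implementation.

```python
import torch
import torch.nn as nn


class FactorialDiscreteGoalEncoder(nn.Module):
    """Sketch: encode a goal, split the embedding into factors, and discretize
    each factor against its own small codebook (VQ-style, straight-through)."""

    def __init__(self, goal_dim=64, n_factors=4, factor_dim=16, codes_per_factor=32):
        super().__init__()
        self.n_factors, self.factor_dim = n_factors, factor_dim
        self.encoder = nn.Sequential(
            nn.Linear(goal_dim, 128), nn.ReLU(),
            nn.Linear(128, n_factors * factor_dim),
        )
        # One codebook per factor; together the factors span a combinatorial goal space.
        self.codebooks = nn.Parameter(torch.randn(n_factors, codes_per_factor, factor_dim))

    def forward(self, goal):
        z = self.encoder(goal).view(-1, self.n_factors, self.factor_dim)       # (B, F, D)
        dists = ((z.unsqueeze(2) - self.codebooks.unsqueeze(0)) ** 2).sum(-1)  # (B, F, K)
        codes = dists.argmin(-1)                                               # (B, F)
        z_q = torch.stack(
            [self.codebooks[f][codes[:, f]] for f in range(self.n_factors)], dim=1
        )
        z_st = z + (z_q - z).detach()  # straight-through gradient back to the encoder
        # VQ-VAE-style codebook + commitment terms (an assumption of this sketch).
        vq_loss = ((z_q - z.detach()) ** 2).mean() + 0.25 * ((z - z_q.detach()) ** 2).mean()
        return z_st.flatten(1), codes, vq_loss


encoder = FactorialDiscreteGoalEncoder()
goal_repr, codes, vq_loss = encoder(torch.randn(8, 64))  # goal_repr conditions the policy
```

The per-factor discrete codes are what make goal specification coarser while still allowing a combinatorial set of goals, which is the property the abstract emphasizes.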
Related papers
- Goal-Conditioned Q-Learning as Knowledge Distillation [136.79415677706612]
We explore a connection between off-policy reinforcement learning in goal-conditioned settings and knowledge distillation.
We empirically show that this can improve the performance of goal-conditioned off-policy reinforcement learning when the space of goals is high-dimensional.
We also show that this technique can be adapted to allow for efficient learning in the case of multiple simultaneous sparse goals.
arXiv Detail & Related papers (2022-08-28T22:01:10Z)
- Bisimulation Makes Analogies in Goal-Conditioned Reinforcement Learning [71.52722621691365]
Building generalizable goal-conditioned agents from rich observations is key to enabling reinforcement learning (RL) to solve real-world problems.
We propose a new form of state abstraction called goal-conditioned bisimulation.
We learn this representation using a metric form of this abstraction, and show its ability to generalize to new goals in simulated manipulation tasks.
arXiv Detail & Related papers (2022-04-27T17:00:11Z)
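A minimal sketch of a goal-conditioned bisimulation-style metric loss in the spirit of the entry above, assuming the common recipe of matching embedding distances to reward gaps plus a discounted distance between next-state embeddings; the encoder shape, input dimensions, and function names are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Embedding of (observation, goal) pairs; the 17-dim observation and 3-dim goal are arbitrary.
encoder = nn.Sequential(nn.Linear(17 + 3, 128), nn.ReLU(), nn.Linear(128, 32))


def gc_bisim_loss(obs_i, obs_j, rew_i, rew_j, next_obs_i, next_obs_j, goal, gamma=0.99):
    """Push ||phi(s_i, g) - phi(s_j, g)|| toward the reward gap plus a discounted
    distance between next-state embeddings, all computed for the same goal g."""
    zi = encoder(torch.cat([obs_i, goal], dim=-1))
    zj = encoder(torch.cat([obs_j, goal], dim=-1))
    with torch.no_grad():
        next_gap = (encoder(torch.cat([next_obs_i, goal], dim=-1))
                    - encoder(torch.cat([next_obs_j, goal], dim=-1))).norm(dim=-1)
        target = (rew_i - rew_j).abs() + gamma * next_gap
    return F.mse_loss((zi - zj).norm(dim=-1), target)
```

States that behave alike with respect to a goal end up close in the learned space, which is the "analogy-making" property the entry refers to.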
- Goal-Aware Cross-Entropy for Multi-Target Reinforcement Learning [15.33496710690063]
We propose a goal-aware cross-entropy (GACE) loss that can be utilized in a self-supervised way.
We then devise goal-discriminative attention networks (GDAN), which use goal-relevant information to focus on the given instruction.
arXiv Detail & Related papers (2021-10-25T14:24:39Z)
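The entry only names the GACE loss; below is a hedged sketch of one plausible instantiation, in which states gathered around goal achievement are labelled with the index of their goal/instruction and a classifier is trained with ordinary cross-entropy as a self-supervised signal. All shapes and names are assumptions rather than the paper's definition.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

n_goals = 5
# Classifier from observations to goal/instruction indices (dimensions are assumptions).
goal_classifier = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, n_goals))


def goal_aware_cross_entropy(goal_states, goal_ids):
    """goal_states: observations collected when some goal was (nearly) achieved.
    goal_ids: index of the goal each observation belongs to (self-generated labels)."""
    logits = goal_classifier(goal_states)
    return F.cross_entropy(logits, goal_ids)


# Example: 32 labelled states drawn from a replay buffer.
loss = goal_aware_cross_entropy(torch.randn(32, 64), torch.randint(0, n_goals, (32,)))
```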
- C-Planning: An Automatic Curriculum for Learning Goal-Reaching Tasks [133.40619754674066]
Goal-conditioned reinforcement learning can solve tasks in a wide range of domains, including navigation and manipulation.
We propose to solve the distant goal-reaching task by using search at training time to automatically generate intermediate states.
The E-step corresponds to planning an optimal sequence of waypoints using graph search, while the M-step aims to learn a goal-conditioned policy to reach those waypoints.
arXiv Detail & Related papers (2021-10-22T22:05:31Z)
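A minimal sketch of the E-step described above, using an off-the-shelf shortest-path search over a small graph of replay-buffer states to produce waypoints; the graph construction, edge weights, and identifiers are assumptions, and the M-step (training the goal-conditioned policy on those waypoints) is only indicated by a comment.

```python
import networkx as nx

# Nodes stand for (indices of) states from the replay buffer; edges connect states the
# current policy can reliably move between, weighted by an estimated traversal cost.
graph = nx.Graph()
graph.add_weighted_edges_from([
    (0, 1, 1.0), (1, 2, 1.0), (2, 5, 2.0), (1, 3, 1.5), (3, 5, 1.0),
])


def e_step_waypoints(start, goal):
    """E-step: plan an optimal sequence of intermediate states with graph search."""
    return nx.shortest_path(graph, source=start, target=goal, weight="weight")


waypoints = e_step_waypoints(start=0, goal=5)  # -> [0, 1, 3, 5]
# M-step (not shown): update the goal-conditioned policy to reach each consecutive
# waypoint, e.g. with hindsight-relabelled TD learning.
```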
- Adversarial Intrinsic Motivation for Reinforcement Learning [60.322878138199364]
We investigate whether the Wasserstein-1 distance between a policy's state visitation distribution and a target distribution can be utilized effectively for reinforcement learning tasks.
Our approach, termed Adversarial Intrinsic Motivation (AIM), estimates this Wasserstein-1 distance through its dual objective and uses it to compute a supplemental reward function.
arXiv Detail & Related papers (2021-05-27T17:51:34Z)
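A hedged sketch of estimating the Wasserstein-1 distance through its dual, as the entry describes: a potential network is trained to separate target states from visited states under an approximate 1-Lipschitz constraint (here a gradient penalty), and its potential differences serve as a supplemental reward. The penalty form, the f(s') - f(s) shaping, and all names are assumptions, not the paper's exact construction.

```python
import torch
import torch.nn as nn

potential = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(potential.parameters(), lr=3e-4)


def dual_wasserstein_step(target_states, visited_states, penalty_coef=10.0):
    """Maximize E_target[f] - E_visited[f] while keeping f approximately 1-Lipschitz."""
    gap = potential(target_states).mean() - potential(visited_states).mean()
    # Gradient penalty on interpolated points enforces the Lipschitz constraint softly.
    eps = torch.rand(target_states.size(0), 1)
    mix = (eps * target_states + (1 - eps) * visited_states).requires_grad_(True)
    grad = torch.autograd.grad(potential(mix).sum(), mix, create_graph=True)[0]
    penalty = ((grad.norm(dim=-1) - 1.0).clamp(min=0) ** 2).mean()
    loss = -gap + penalty_coef * penalty
    opt.zero_grad()
    loss.backward()
    opt.step()


def supplemental_reward(state, next_state):
    # One shaping choice: reward progress of the potential along the transition.
    with torch.no_grad():
        return (potential(next_state) - potential(state)).squeeze(-1)


# Example update with matched batch sizes of target and visited states.
dual_wasserstein_step(torch.randn(64, 8), torch.randn(64, 8))
```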
- Hierarchical and Partially Observable Goal-driven Policy Learning with Goals Relational Graph [21.260858893505183]
We present a novel two-layer hierarchical learning approach equipped with a Goals Relational Graph (GRG).
Our GRG captures the underlying relations of all goals in the goal space through a Dirichlet-categorical process.
Our experimental results show that our approach exhibits superior generalization on both unseen environments and new goals.
arXiv Detail & Related papers (2021-03-01T23:21:46Z)
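The entry names a Dirichlet-categorical process over goal relations; the following numpy sketch illustrates that ingredient only, maintaining Dirichlet pseudo-counts over pairwise goal relations and reading out posterior-mean probabilities. The update rule and identifiers are assumptions, not the paper's graph construction.

```python
import numpy as np

n_goals = 4
# Dirichlet prior pseudo-counts: one categorical over "related" goals per source goal.
alpha = np.ones((n_goals, n_goals))


def observe_relation(src_goal, dst_goal):
    """Record that dst_goal was (sub)achieved while pursuing src_goal."""
    alpha[src_goal, dst_goal] += 1.0


def relation_probs(src_goal):
    """Posterior-mean categorical distribution over goals related to src_goal."""
    return alpha[src_goal] / alpha[src_goal].sum()


observe_relation(0, 2)
observe_relation(0, 2)
print(relation_probs(0))  # relation 0 -> 2 is now the most probable
```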
- Automatic Curriculum Learning through Value Disagreement [95.19299356298876]
Continually solving new, unsolved tasks is the key to learning diverse behaviors.
In the multi-task domain, where an agent needs to reach multiple goals, the choice of training goals can largely affect sample efficiency.
We propose setting up an automatic curriculum for goals that the agent needs to solve.
We evaluate our method across 13 multi-goal robotic tasks and 5 navigation tasks, and demonstrate performance gains over current state-of-the-art methods.
arXiv Detail & Related papers (2020-06-17T03:58:25Z)
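A minimal sketch of a value-disagreement goal curriculum in the spirit of the entry above: candidate goals are scored by the standard deviation of an ensemble of value estimates and sampled in proportion to that disagreement. The ensemble size, sampling rule, and names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)


def select_goals(candidate_goals, value_ensemble, n_select=4):
    """candidate_goals: (N, goal_dim) array; value_ensemble: callables goal -> value."""
    values = np.stack([np.array([v(g) for g in candidate_goals]) for v in value_ensemble])
    disagreement = values.std(axis=0)          # high std = frontier of the agent's competence
    probs = disagreement / disagreement.sum()
    idx = rng.choice(len(candidate_goals), size=n_select, replace=False, p=probs)
    return candidate_goals[idx]


# Toy ensemble of "value functions" that disagree on some goals.
goals = np.linspace(0.0, 1.0, 10).reshape(-1, 1)
ensemble = [lambda g, w=w: float(np.sin(w * g.sum())) for w in (1.0, 2.0, 3.0)]
print(select_goals(goals, ensemble))
```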