Understanding Hindsight Goal Relabeling Requires Rethinking Divergence Minimization
- URL: http://arxiv.org/abs/2209.13046v1
- Date: Mon, 26 Sep 2022 22:00:27 GMT
- Title: Understanding Hindsight Goal Relabeling Requires Rethinking Divergence Minimization
- Authors: Lunjun Zhang, Bradly C. Stadie
- Abstract summary: Hindsight goal relabeling has become a foundational technique for multi-goal reinforcement learning (RL).
In this work, we develop a unified objective for goal-reaching that explains such a connection.
We find that despite recent advances in goal-conditioned behaviour cloning, multi-goal Q-learning can still outperform BC-like methods.
- Score: 10.854471763126117
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Hindsight goal relabeling has become a foundational technique for multi-goal
reinforcement learning (RL). The idea is quite simple: any arbitrary trajectory
can be seen as an expert demonstration for reaching the trajectory's end state.
Intuitively, this procedure trains a goal-conditioned policy to imitate a
sub-optimal expert. However, this connection between imitation and hindsight
relabeling is not well understood. Modern imitation learning algorithms are
described in the language of divergence minimization, and yet it remains an
open problem how to recast hindsight goal relabeling into that framework. In
this work, we develop a unified objective for goal-reaching that explains such
a connection, from which we can derive goal-conditioned supervised learning
(GCSL) and the reward function in hindsight experience replay (HER) from first
principles. Experimentally, we find that despite recent advances in
goal-conditioned behaviour cloning (BC), multi-goal Q-learning can still
outperform BC-like methods; moreover, a vanilla combination of both actually
hurts model performance. Under our framework, we study when BC is expected to
help, and empirically validate our findings. Our work further bridges
goal-reaching and generative modeling, illustrating the nuances and new
pathways of extending the success of generative models to RL.
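The relabeling idea described in the abstract is simple enough to sketch. Below is a minimal, illustrative Python sketch (not the authors' implementation) of how a single trajectory yields both HER-style sparse rewards and GCSL-style behaviour-cloning targets; the function name, the "future" goal-sampling strategy, and the exact-match success test are simplifying assumptions for a toy discrete setting.

```python
import random

def relabel_trajectory(trajectory, n_relabeled_goals=4):
    """Turn one trajectory into relabeled multi-goal training transitions.

    `trajectory` is a list of (state, action, next_state) tuples.
    For each transition we sample a goal from states achieved later in the
    same trajectory (the "future" strategy), so the trajectory is treated as
    an expert demonstration for reaching those goals in hindsight.
    """
    relabeled = []
    for t, (state, action, next_state) in enumerate(trajectory):
        future_states = [s_next for (_, _, s_next) in trajectory[t:]]
        for _ in range(n_relabeled_goals):
            goal = random.choice(future_states)
            # HER-style sparse reward: 0 on success, -1 otherwise
            # (one common convention); success here is exact state match.
            reward = 0.0 if next_state == goal else -1.0
            # GCSL-style view: (state, goal) -> action is a supervised
            # behaviour-cloning example, because the action did lead
            # toward `goal` in hindsight.
            relabeled.append({
                "state": state,
                "goal": goal,
                "action": action,
                "reward": reward,            # used by multi-goal Q-learning
                "done": next_state == goal,  # relabeled goal achieved
            })
    return relabeled

# Usage: a toy 1-D chain where states are integers and actions are +/-1 steps.
toy_trajectory = [(0, +1, 1), (1, +1, 2), (2, -1, 1), (1, +1, 2)]
for example in relabel_trajectory(toy_trajectory, n_relabeled_goals=1):
    print(example)
```

Multi-goal Q-learning would consume these (state, goal, action, reward) tuples through Bellman backups, whereas GCSL would simply regress the action from (state, goal); this is the contrast between multi-goal Q-learning and BC-like methods discussed in the abstract.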
Related papers
- Diffusing States and Matching Scores: A New Framework for Imitation Learning [16.941612670582522]
Adversarial Imitation Learning is traditionally framed as a two-player zero-sum game between a learner and an adversarially chosen cost function.
In recent years, diffusion models have emerged as a non-adversarial alternative to GANs.
We show our approach outperforms GAN-style imitation learning baselines across various continuous control problems.
arXiv Detail & Related papers (2024-10-17T17:59:25Z)
- Zero-Shot Offline Imitation Learning via Optimal Transport [21.548195072895517]
Zero-shot imitation learning algorithms reproduce unseen behavior from as little as a single demonstration at test time.
Existing practical approaches view the expert demonstration as a sequence of goals, enabling imitation with a high-level goal selector, and a low-level goal-conditioned policy.
We introduce a novel method that mitigates the limitations of this decomposition by directly optimizing the occupancy-matching objective that is intrinsic to imitation learning.
arXiv Detail & Related papers (2024-10-11T12:10:51Z)
- Understanding, Predicting and Better Resolving Q-Value Divergence in Offline-RL [86.0987896274354]
We first identify a fundamental pattern, self-excitation, as the primary cause of Q-value estimation divergence in offline RL.
We then propose a novel Self-Excite Eigenvalue Measure (SEEM) metric to track the evolving properties of the Q-network during training.
For the first time, our theory can reliably predict at an early stage whether training will diverge.
arXiv Detail & Related papers (2023-10-06T17:57:44Z)
- HIQL: Offline Goal-Conditioned RL with Latent States as Actions [81.67963770528753]
We propose a hierarchical algorithm for goal-conditioned RL from offline data.
We show how this hierarchical decomposition makes our method robust to noise in the estimated value function.
Our method can solve long-horizon tasks that stymie prior methods, can scale to high-dimensional image observations, and can readily make use of action-free data.
arXiv Detail & Related papers (2023-07-22T00:17:36Z)
- Discrete Factorial Representations as an Abstraction for Goal Conditioned Reinforcement Learning [99.38163119531745]
We show that applying a discretizing bottleneck can improve performance in goal-conditioned RL setups.
We experimentally demonstrate improved expected return on out-of-distribution goals, while still allowing goals to be specified with expressive structure.
arXiv Detail & Related papers (2022-11-01T03:31:43Z)
- Bisimulation Makes Analogies in Goal-Conditioned Reinforcement Learning [71.52722621691365]
Building generalizable goal-conditioned agents from rich observations is key to applying reinforcement learning (RL) to real-world problems.
We propose a new form of state abstraction called goal-conditioned bisimulation.
We learn this representation using a metric form of this abstraction, and show its ability to generalize to new goals in simulated manipulation tasks.
arXiv Detail & Related papers (2022-04-27T17:00:11Z)
- Adversarial Intrinsic Motivation for Reinforcement Learning [60.322878138199364]
We investigate whether the Wasserstein-1 distance between a policy's state visitation distribution and a target distribution can be utilized effectively for reinforcement learning tasks.
Our approach, termed Adversarial Intrinsic Motivation (AIM), estimates this Wasserstein-1 distance through its dual objective and uses it to compute a supplemental reward function (see the illustrative sketch after this list).
arXiv Detail & Related papers (2021-05-27T17:51:34Z)
- CLAMGen: Closed-Loop Arm Motion Generation via Multi-view Vision-Based RL [4.014524824655106]
We propose a vision-based reinforcement learning (RL) approach for closed-loop trajectory generation in an arm reaching problem.
Arm trajectory generation is a fundamental robotics problem which entails finding collision-free paths to move the robot's body.
arXiv Detail & Related papers (2021-03-24T15:33:03Z)
- Understanding the Mechanics of SPIGOT: Surrogate Gradients for Latent Structure Learning [20.506232306308977]
Latent structure models are a powerful tool for modeling language data.
One challenge with end-to-end training of these models is the argmax operation, which has a null gradient.
We explore latent structure learning from the angle of pulling back the downstream learning objective.
arXiv Detail & Related papers (2020-10-05T21:56:00Z)
- Remembering for the Right Reasons: Explanations Reduce Catastrophic Forgetting [100.75479161884935]
We propose a novel training paradigm called Remembering for the Right Reasons (RRR).
RRR stores visual model explanations for each example in the buffer and ensures the model has "the right reasons" for its predictions.
We demonstrate how RRR can be easily added to any memory or regularization-based approach and results in reduced forgetting.
arXiv Detail & Related papers (2020-10-04T10:05:27Z)
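As referenced in the Adversarial Intrinsic Motivation entry above, a Wasserstein-1 distance can be estimated through its Kantorovich-Rubinstein dual and the learned potential reused as a reward signal. The sketch below is a generic illustration of that idea, not the AIM authors' implementation: the names (`Potential`, `dual_loss`, `supplemental_reward`) and the WGAN-GP-style gradient penalty used as a soft Lipschitz constraint are assumptions made for this sketch.

```python
import torch
import torch.nn as nn

class Potential(nn.Module):
    """Scalar potential f(s) over states."""
    def __init__(self, state_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, s):
        return self.net(s).squeeze(-1)

def dual_loss(f, goal_states, visited_states, penalty_coef=10.0):
    # Kantorovich-Rubinstein dual: W1 ~ E_goal[f] - E_visited[f] for 1-Lipschitz f.
    dual = f(goal_states).mean() - f(visited_states).mean()
    # Soft Lipschitz constraint via a gradient penalty on interpolated states
    # (assumes both batches have the same shape).
    eps = torch.rand(visited_states.size(0), 1)
    interp = (eps * goal_states + (1 - eps) * visited_states).requires_grad_(True)
    grad = torch.autograd.grad(f(interp).sum(), interp, create_graph=True)[0]
    penalty = ((grad.norm(dim=-1) - 1.0).clamp(min=0.0) ** 2).mean()
    return -dual + penalty_coef * penalty  # minimizing this maximizes the dual

def supplemental_reward(f, s, s_next):
    # Potential-difference reward: positive when s_next moves toward the goal set.
    with torch.no_grad():
        return f(s_next) - f(s)

# Usage with toy random data (state_dim = 4).
f = Potential(state_dim=4)
opt = torch.optim.Adam(f.parameters(), lr=1e-3)
goals, visited = torch.randn(32, 4), torch.randn(32, 4)
loss = dual_loss(f, goals, visited)
opt.zero_grad(); loss.backward(); opt.step()
print(supplemental_reward(f, visited, visited + 0.1))
```

The potential is trained to be high on goal states and low on visited states, so the difference f(s') - f(s) acts as a dense supplemental reward for moving toward the goal distribution.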