Goal-Conditioned Supervised Learning with Sub-Goal Prediction
- URL: http://arxiv.org/abs/2305.10171v1
- Date: Wed, 17 May 2023 12:54:58 GMT
- Title: Goal-Conditioned Supervised Learning with Sub-Goal Prediction
- Authors: Tom Jurgenson and Aviv Tamar
- Abstract summary: We propose Trajectory Iterative Learner (TraIL) to tackle goal-conditioned reinforcement learning.
TraIL further exploits the information in a trajectory, and uses it for learning to predict both actions and sub-goals.
For several popular problem settings, replacing real goals with predicted TraIL sub-goals allows the agent to reach a greater set of goal states.
- Score: 24.172457177786523
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, a simple yet effective algorithm -- goal-conditioned
supervised-learning (GCSL) -- was proposed to tackle goal-conditioned
reinforcement-learning. GCSL is based on the principle of hindsight learning:
by observing states visited in previously executed trajectories and treating
them as attained goals, GCSL learns the corresponding actions via supervised
learning. However, GCSL only learns a goal-conditioned policy, discarding other
information in the process. Our insight is that the same hindsight principle
can be used to learn to predict goal-conditioned sub-goals from the same
trajectory. Based on this idea, we propose Trajectory Iterative Learner
(TraIL), an extension of GCSL that further exploits the information in a
trajectory, and uses it for learning to predict both actions and sub-goals. We
investigate the settings in which TraIL can make better use of the data, and
discover that for several popular problem settings, replacing real goals in
GCSL with predicted TraIL sub-goals allows the agent to reach a greater set of
goal states using the exact same data as GCSL, thereby improving its overall
performance.
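To make the hindsight principle concrete, here is a minimal sketch, in PyTorch, of the relabeling step plus the two supervised losses: action imitation as in GCSL, and sub-goal regression in the spirit of TraIL. All names, the MSE losses (which assume continuous actions), and the choice of the trajectory midpoint as the sub-goal target are illustrative assumptions, not the paper's exact formulation.

```python
import random
import torch
import torch.nn as nn

# Hypothetical dimensions and networks; the paper's architectures may differ.
STATE_DIM, ACTION_DIM = 4, 2
policy = nn.Sequential(nn.Linear(2 * STATE_DIM, 64), nn.ReLU(),
                       nn.Linear(64, ACTION_DIM))       # pi(a | s, g)
subgoal_net = nn.Sequential(nn.Linear(2 * STATE_DIM, 64), nn.ReLU(),
                            nn.Linear(64, STATE_DIM))   # f(sg | s, g)
opt = torch.optim.Adam(
    list(policy.parameters()) + list(subgoal_net.parameters()), lr=1e-3)

def hindsight_batch(trajectories, batch_size=64):
    """Sample (state, action, sub-goal, hindsight goal) tuples.

    For each trajectory, pick indices t < m <= k: the state at k is
    relabeled as the attained goal, and the state at m (here the
    midpoint, an illustrative choice) is the sub-goal target."""
    rows = []
    for _ in range(batch_size):
        traj = random.choice(trajectories)   # list of (state, action) tensors
        t = random.randrange(len(traj) - 1)
        k = random.randrange(t + 1, len(traj))
        m = (t + k + 1) // 2
        s, a = traj[t]
        rows.append((s, a, traj[m][0], traj[k][0]))
    s, a, sg, g = (torch.stack(x) for x in zip(*rows))
    return s, a, sg, g

def train_step(trajectories):
    s, a, sg, g = hindsight_batch(trajectories)
    inp = torch.cat([s, g], dim=-1)
    action_loss = ((policy(inp) - a) ** 2).mean()        # GCSL action imitation
    subgoal_loss = ((subgoal_net(inp) - sg) ** 2).mean() # TraIL-style sub-goal target
    loss = action_loss + subgoal_loss
    opt.zero_grad(); loss.backward(); opt.step()
    return float(loss)

def act(s, g):
    sg = subgoal_net(torch.cat([s, g], dim=-1))  # predict a nearer sub-goal
    return policy(torch.cat([s, sg], dim=-1))    # condition the policy on it
```

The `act` function shows the substitution the abstract describes: at execution time the policy is conditioned on the predicted sub-goal rather than the distant real goal, so it faces nearer targets that are better covered by the training data.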
Related papers
- MGDA: Model-based Goal Data Augmentation for Offline Goal-conditioned Weighted Supervised Learning [23.422157931057498]
State-of-the-art algorithms, known as Goal-Conditioned Weighted Supervised Learning (GCWSL) methods, have been introduced to tackle challenges in offline goal-conditioned reinforcement learning (RL).
GCWSL has demonstrated outstanding performance across diverse goal-reaching tasks, providing a simple, effective, and stable solution.
However, prior research has identified a critical limitation of GCWSL: the lack of trajectory stitching capabilities.
We propose a Model-based Goal Data Augmentation (MGDA) approach, which leverages a learned dynamics model to sample more suitable augmented goals.
arXiv Detail & Related papers (2024-12-16T03:25:28Z)
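One way the model-based augmentation idea above could look in code: roll a learned dynamics model forward from dataset states and collect the imagined end states as extra relabeling goals. The interfaces (`dynamics_model`, `policy`) and the rollout-based selection rule are assumptions for illustration; the paper's criterion for choosing suitable goals is more involved.

```python
import random
import torch

def augment_goals(dataset, dynamics_model, policy, horizon=5, n_aug=1000):
    """Generate augmented goals by short imagined rollouts (illustrative sketch).

    dataset: list of trajectories, each a list of (state, action) tensors.
    dynamics_model: (state, action) -> next state, assumed pre-trained.
    policy: (state, goal) -> action, used to propose rollout actions."""
    augmented = []
    for _ in range(n_aug):
        traj = random.choice(dataset)
        s, _ = random.choice(traj)
        g = traj[-1][0]                      # the trajectory's original end state
        for _ in range(horizon):
            a = policy(torch.cat([s, g], dim=-1))
            s = dynamics_model(s, a)         # imagined next state
        augmented.append(s.detach())         # treat the imagined state as a goal
    return augmented
```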
- Fusion Self-supervised Learning for Recommendation [16.02820746003461]
We propose a Fusion Self-supervised Learning framework for recommendation.
Specifically, we use high-order information from the GCN process to create contrastive views.
To integrate self-supervised signals from various contrastive learning (CL) objectives, we propose an advanced CL objective.
arXiv Detail & Related papers (2024-07-29T04:30:38Z)
- SMORE: Score Models for Offline Goal-Conditioned Reinforcement Learning [33.125187822259186]
Offline Goal-Conditioned Reinforcement Learning (GCRL) is tasked with learning to achieve multiple goals in an environment purely from offline datasets using sparse reward functions.
We present a novel approach to GCRL under a new lens of mixture-distribution matching, leading to our discriminator-free method: SMORe.
arXiv Detail & Related papers (2023-11-03T16:19:33Z)
- HIQL: Offline Goal-Conditioned RL with Latent States as Actions [81.67963770528753]
We propose a hierarchical algorithm for goal-conditioned RL from offline data.
We show how this hierarchical decomposition makes our method robust to noise in the estimated value function.
Our method can solve long-horizon tasks that stymie prior methods, can scale to high-dimensional image observations, and can readily make use of action-free data.
arXiv Detail & Related papers (2023-07-22T00:17:36Z)
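The hierarchical decomposition in HIQL can be pictured as two policies trained from the same data: a high level that proposes a latent sub-goal several steps ahead, and a low level that acts toward it. The modules and the plain regression loss below are a minimal sketch under those assumptions; HIQL's actual objectives are extracted from a single learned value function.

```python
import torch
import torch.nn as nn

STATE_DIM, LATENT_DIM, ACTION_DIM = 8, 16, 2

encoder = nn.Linear(STATE_DIM, LATENT_DIM)              # state -> compact representation
high_policy = nn.Sequential(nn.Linear(2 * STATE_DIM, 64), nn.ReLU(),
                            nn.Linear(64, LATENT_DIM))  # (state, goal) -> latent sub-goal
low_policy = nn.Sequential(nn.Linear(STATE_DIM + LATENT_DIM, 64), nn.ReLU(),
                           nn.Linear(64, ACTION_DIM))   # (state, latent sub-goal) -> action

def act(state, goal):
    """High level proposes a latent sub-goal several steps ahead;
    low level steers toward it."""
    z = high_policy(torch.cat([state, goal], dim=-1))
    return low_policy(torch.cat([state, z], dim=-1))

def high_level_loss(s, g, s_future):
    # Regress the high level onto the encoding of a state k steps ahead.
    # HIQL additionally weights such regressions by value-function advantages.
    z_target = encoder(s_future).detach()
    z_pred = high_policy(torch.cat([s, g], dim=-1))
    return ((z_pred - z_target) ** 2).mean()
```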
- Understanding and Improving the Role of Projection Head in Self-Supervised Learning [77.59320917894043]
Self-supervised learning (SSL) aims to produce useful feature representations without access to human-labeled data annotations.
Current contrastive learning approaches append a parametrized projection head to the end of some backbone network to optimize the InfoNCE objective.
This raises a fundamental question: Why is a learnable projection head required if we are to discard it after training?
arXiv Detail & Related papers (2022-12-22T05:42:54Z)
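The pattern that question refers to is compact in code: a backbone produces representations, a small projection head maps them into the space where the InfoNCE objective is computed, and the head is discarded after pretraining. The MLP sizes below are illustrative stand-ins; concrete SSL methods (SimCLR and relatives) differ in detail.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

backbone = nn.Sequential(nn.Linear(128, 256), nn.ReLU())   # stands in for, e.g., a ResNet
proj_head = nn.Sequential(nn.Linear(256, 128), nn.ReLU(),
                          nn.Linear(128, 64))               # discarded after pretraining

def info_nce(x1, x2, temperature=0.1):
    """InfoNCE over a batch: matching rows of x1/x2 (two augmented views
    of the same inputs) are positives, all other rows are negatives."""
    z1 = F.normalize(proj_head(backbone(x1)), dim=-1)
    z2 = F.normalize(proj_head(backbone(x2)), dim=-1)
    logits = z1 @ z2.T / temperature          # cosine similarities
    labels = torch.arange(z1.shape[0])        # positives sit on the diagonal
    return F.cross_entropy(logits, labels)

# Downstream tasks then use backbone(x) directly and drop proj_head.
```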
- Discrete Factorial Representations as an Abstraction for Goal Conditioned Reinforcement Learning [99.38163119531745]
We show that applying a discretizing bottleneck can improve performance in goal-conditioned RL setups.
We prove a lower bound on the expected return on out-of-distribution goals, while still allowing goals to be specified with expressive structure.
arXiv Detail & Related papers (2022-11-01T03:31:43Z)
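A discretizing bottleneck of the kind just described is commonly built in the vector-quantization style: snap a continuous encoding to the nearest entry of a learned codebook, passing gradients through with a straight-through estimator. The sketch below shows that standard construction; whether the paper's bottleneck matches it in every detail is an assumption.

```python
import torch
import torch.nn as nn

class DiscretizingBottleneck(nn.Module):
    """Snap a continuous encoding to its nearest codebook vector."""

    def __init__(self, num_codes=64, dim=16):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)

    def forward(self, z):
        # Pairwise distances to all codes; pick the nearest one per row.
        dists = torch.cdist(z, self.codebook.weight)   # (batch, num_codes)
        idx = dists.argmin(dim=-1)
        z_q = self.codebook(idx)
        # Straight-through estimator: forward pass uses the quantized code,
        # but gradients flow back to the continuous encoding z.
        return z + (z_q - z).detach()
```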
- Rethinking Goal-conditioned Supervised Learning and Its Connection to Offline RL [49.26825108780872]
Goal-Conditioned Supervised Learning (GCSL) provides a new learning framework by iteratively relabeling and imitating self-generated experiences.
We extend GCSL into a novel offline goal-conditioned RL algorithm, Weighted GCSL (WGCSL).
We show that WGCSL can consistently outperform GCSL and existing state-of-the-art offline methods.
arXiv Detail & Related papers (2022-02-09T14:17:05Z)
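The "weighted" extension can be sketched as GCSL's imitation loss scaled by per-sample weights, for example a discount on the relabeling horizon combined with an exponentiated advantage. The weight composition below is an illustrative approximation of that idea, not the paper's exact formula.

```python
import torch
import torch.nn.functional as F

def wgcsl_loss(logits, actions, horizon_gaps, advantages, gamma=0.98):
    """Weighted goal-conditioned imitation (illustrative sketch).

    logits:       policy outputs for relabeled (state, goal) inputs
    actions:      discrete actions from the data (long tensor)
    horizon_gaps: steps between each state and its relabeled goal
    advantages:   advantage estimates from a learned value function"""
    nll = F.cross_entropy(logits, actions, reduction="none")
    # Discount far-off relabeled goals; up-weight high-advantage actions.
    weights = (gamma ** horizon_gaps.float()) * advantages.exp().clamp(max=10.0)
    return (weights * nll).mean()
```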
- C-Planning: An Automatic Curriculum for Learning Goal-Reaching Tasks [133.40619754674066]
Goal-conditioned reinforcement learning can solve tasks in a wide range of domains, including navigation and manipulation.
We propose to solve distant goal-reaching tasks by using search at training time to automatically generate intermediate states.
The E-step corresponds to planning an optimal sequence of waypoints using graph search, while the M-step learns a goal-conditioned policy to reach those waypoints.
arXiv Detail & Related papers (2021-10-22T22:05:31Z)
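A minimal sketch of that EM view: the E-step runs graph search over previously visited states to pick waypoints toward a distant goal, and the M-step trains the goal-conditioned policy on adjacent waypoint pairs, i.e. on nearby goals it can plausibly reach. The graph construction and the `policy_trainer` interface are hypothetical stand-ins, not the paper's API.

```python
import networkx as nx

def e_step(graph, start, goal):
    """Plan a sequence of intermediate waypoints with graph search.
    `graph` is assumed to connect states judged mutually reachable
    (e.g. by a learned distance estimate)."""
    return nx.shortest_path(graph, source=start, target=goal)

def m_step(policy_trainer, waypoints):
    """Train the goal-conditioned policy on nearby goals only:
    each waypoint is the goal for reaching it from its predecessor."""
    for src, dst in zip(waypoints[:-1], waypoints[1:]):
        policy_trainer.update(state=src, goal=dst)  # hypothetical interface
```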
- Aligning Pretraining for Detection via Object-Level Contrastive Learning [57.845286545603415]
Image-level contrastive representation learning has proven to be highly effective as a generic model for transfer learning.
We argue that this could be sub-optimal and thus advocate a design principle which encourages alignment between the self-supervised pretext task and the downstream task.
Our method, called Selective Object COntrastive learning (SoCo), achieves state-of-the-art results for transfer performance on COCO detection.
arXiv Detail & Related papers (2021-06-04T17:59:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.