Goal-Conditioned Supervised Learning with Sub-Goal Prediction
- URL: http://arxiv.org/abs/2305.10171v1
- Date: Wed, 17 May 2023 12:54:58 GMT
- Title: Goal-Conditioned Supervised Learning with Sub-Goal Prediction
- Authors: Tom Jurgenson and Aviv Tamar
- Abstract summary: We propose Trajectory Iterative Learner (TraIL) to tackle goal-conditioned reinforcement learning.
TraIL further exploits the information in a trajectory, and uses it for learning to predict both actions and sub-goals.
For several popular problem settings, replacing real goals with predicted TraIL sub-goals allows the agent to reach a greater set of goal states.
- Score: 24.172457177786523
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, a simple yet effective algorithm -- goal-conditioned
supervised-learning (GCSL) -- was proposed to tackle goal-conditioned
reinforcement-learning. GCSL is based on the principle of hindsight learning:
by observing states visited in previously executed trajectories and treating
them as attained goals, GCSL learns the corresponding actions via supervised
learning. However, GCSL only learns a goal-conditioned policy, discarding other
information in the process. Our insight is that the same hindsight principle
can be used to learn to predict goal-conditioned sub-goals from the same
trajectory. Based on this idea, we propose Trajectory Iterative Learner
(TraIL), an extension of GCSL that further exploits the information in a
trajectory, and uses it for learning to predict both actions and sub-goals. We
investigate the settings in which TraIL can make better use of the data, and
discover that for several popular problem settings, replacing real goals in
GCSL with predicted TraIL sub-goals allows the agent to reach a greater set of
goal states using the exact same data as GCSL, thereby improving its overall
performance.
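To make the hindsight principle concrete, here is a minimal sketch, in PyTorch, of the relabeling step plus the two supervised losses: action imitation as in GCSL, and sub-goal regression in the spirit of TraIL. All names, the MSE losses (which assume continuous actions), and the choice of the trajectory midpoint as the sub-goal target are illustrative assumptions, not the paper's exact formulation.

```python
import random
import torch
import torch.nn as nn

# Hypothetical dimensions and networks; the paper's architectures may differ.
STATE_DIM, ACTION_DIM = 4, 2
policy = nn.Sequential(nn.Linear(2 * STATE_DIM, 64), nn.ReLU(),
                       nn.Linear(64, ACTION_DIM))       # pi(a | s, g)
subgoal_net = nn.Sequential(nn.Linear(2 * STATE_DIM, 64), nn.ReLU(),
                            nn.Linear(64, STATE_DIM))   # f(sg | s, g)
opt = torch.optim.Adam(
    list(policy.parameters()) + list(subgoal_net.parameters()), lr=1e-3)

def hindsight_batch(trajectories, batch_size=64):
    """Sample (state, action, sub-goal, hindsight goal) tuples.

    For each trajectory, pick indices t < m <= k: the state at k is
    relabeled as the attained goal, and the state at m (here the
    midpoint, an illustrative choice) is the sub-goal target."""
    rows = []
    for _ in range(batch_size):
        traj = random.choice(trajectories)   # list of (state, action) tensors
        t = random.randrange(len(traj) - 1)
        k = random.randrange(t + 1, len(traj))
        m = (t + k + 1) // 2
        s, a = traj[t]
        rows.append((s, a, traj[m][0], traj[k][0]))
    s, a, sg, g = (torch.stack(x) for x in zip(*rows))
    return s, a, sg, g

def train_step(trajectories):
    s, a, sg, g = hindsight_batch(trajectories)
    inp = torch.cat([s, g], dim=-1)
    action_loss = ((policy(inp) - a) ** 2).mean()        # GCSL action imitation
    subgoal_loss = ((subgoal_net(inp) - sg) ** 2).mean() # TraIL-style sub-goal target
    loss = action_loss + subgoal_loss
    opt.zero_grad(); loss.backward(); opt.step()
    return float(loss)

def act(s, g):
    sg = subgoal_net(torch.cat([s, g], dim=-1))  # predict a nearer sub-goal
    return policy(torch.cat([s, sg], dim=-1))    # condition the policy on it
```

The `act` function shows the substitution the abstract describes: at execution time the policy is conditioned on the predicted sub-goal rather than the distant real goal, so it faces nearer targets that are better covered by the training data.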
Related papers
- MGDA: Model-based Goal Data Augmentation for Offline Goal-conditioned Weighted Supervised Learning [23.422157931057498]
State-of-the-art algorithms, known as Goal-Conditioned Weighted Supervised Learning (GCWSL) methods, have been introduced to tackle challenges in offline goal-conditioned reinforcement learning (RL).
GCWSL has demonstrated outstanding performance across diverse goal-reaching tasks, providing a simple, effective, and stable solution.
However, prior research has identified a critical limitation of GCWSL: the lack of trajectory stitching capabilities.
We propose a Model-based Goal Data Augmentation (MGDA) approach, which leverages a learned dynamics model to sample more suitable augmented goals.
arXiv Detail & Related papers (2024-12-16T03:25:28Z)
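One way the model-based augmentation idea above could look in code: roll a learned dynamics model forward from dataset states and collect the imagined end states as extra relabeling goals. The interfaces (`dynamics_model`, `policy`) and the rollout-based selection rule are assumptions for illustration; the paper's criterion for choosing suitable goals is more involved.

```python
import random
import torch

def augment_goals(dataset, dynamics_model, policy, horizon=5, n_aug=1000):
    """Generate augmented goals by short imagined rollouts (illustrative sketch).

    dataset: list of trajectories, each a list of (state, action) tensors.
    dynamics_model: (state, action) -> next state, assumed pre-trained.
    policy: (state, goal) -> action, used to propose rollout actions."""
    augmented = []
    for _ in range(n_aug):
        traj = random.choice(dataset)
        s, _ = random.choice(traj)
        g = traj[-1][0]                      # the trajectory's original end state
        for _ in range(horizon):
            a = policy(torch.cat([s, g], dim=-1))
            s = dynamics_model(s, a)         # imagined next state
        augmented.append(s.detach())         # treat the imagined state as a goal
    return augmented
```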
- Fusion Self-supervised Learning for Recommendation [16.02820746003461]
We propose a Fusion Self-supervised Learning framework for recommendation.
Specifically, we use high-order information from the GCN process to create contrastive views.
To integrate self-supervised signals from various contrastive learning (CL) objectives, we propose an advanced CL objective.
arXiv Detail & Related papers (2024-07-29T04:30:38Z)
- SMORE: Score Models for Offline Goal-Conditioned Reinforcement Learning [33.125187822259186]
Offline Goal-Conditioned Reinforcement Learning (GCRL) is tasked with learning to achieve multiple goals in an environment purely from offline datasets using sparse reward functions.
We present a novel approach to GCRL under a new lens of mixture-distribution matching, leading to our discriminator-free method: SMORe.
arXiv Detail & Related papers (2023-11-03T16:19:33Z)
- HIQL: Offline Goal-Conditioned RL with Latent States as Actions [81.67963770528753]
We propose a hierarchical algorithm for goal-conditioned RL from offline data.
We show how this hierarchical decomposition makes our method robust to noise in the estimated value function.
Our method can solve long-horizon tasks that stymie prior methods, can scale to high-dimensional image observations, and can readily make use of action-free data.
arXiv Detail & Related papers (2023-07-22T00:17:36Z)
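The hierarchical decomposition in HIQL can be pictured as two policies trained from the same data: a high level that proposes a latent sub-goal several steps ahead, and a low level that acts toward it. The modules and the plain regression loss below are a minimal sketch under those assumptions; HIQL's actual objectives are extracted from a single learned value function.

```python
import torch
import torch.nn as nn

STATE_DIM, LATENT_DIM, ACTION_DIM = 8, 16, 2

encoder = nn.Linear(STATE_DIM, LATENT_DIM)              # state -> compact representation
high_policy = nn.Sequential(nn.Linear(2 * STATE_DIM, 64), nn.ReLU(),
                            nn.Linear(64, LATENT_DIM))  # (state, goal) -> latent sub-goal
low_policy = nn.Sequential(nn.Linear(STATE_DIM + LATENT_DIM, 64), nn.ReLU(),
                           nn.Linear(64, ACTION_DIM))   # (state, latent sub-goal) -> action

def act(state, goal):
    """High level proposes a latent sub-goal several steps ahead;
    low level steers toward it."""
    z = high_policy(torch.cat([state, goal], dim=-1))
    return low_policy(torch.cat([state, z], dim=-1))

def high_level_loss(s, g, s_future):
    # Regress the high level onto the encoding of a state k steps ahead.
    # HIQL additionally weights such regressions by value-function advantages.
    z_target = encoder(s_future).detach()
    z_pred = high_policy(torch.cat([s, g], dim=-1))
    return ((z_pred - z_target) ** 2).mean()
```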
- Understanding and Improving the Role of Projection Head in Self-Supervised Learning [77.59320917894043]
Self-supervised learning (SSL) aims to produce useful feature representations without access to human-labeled data annotations.
Current contrastive learning approaches append a parametrized projection head to the end of some backbone network to optimize the InfoNCE objective.
This raises a fundamental question: Why is a learnable projection head required if we are to discard it after training?
arXiv Detail & Related papers (2022-12-22T05:42:54Z)
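The pattern that question refers to is compact in code: a backbone produces representations, a small projection head maps them into the space where the InfoNCE objective is computed, and the head is discarded after pretraining. The MLP sizes below are illustrative stand-ins; concrete SSL methods (SimCLR and relatives) differ in detail.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

backbone = nn.Sequential(nn.Linear(128, 256), nn.ReLU())   # stands in for, e.g., a ResNet
proj_head = nn.Sequential(nn.Linear(256, 128), nn.ReLU(),
                          nn.Linear(128, 64))               # discarded after pretraining

def info_nce(x1, x2, temperature=0.1):
    """InfoNCE over a batch: matching rows of x1/x2 (two augmented views
    of the same inputs) are positives, all other rows are negatives."""
    z1 = F.normalize(proj_head(backbone(x1)), dim=-1)
    z2 = F.normalize(proj_head(backbone(x2)), dim=-1)
    logits = z1 @ z2.T / temperature          # cosine similarities
    labels = torch.arange(z1.shape[0])        # positives sit on the diagonal
    return F.cross_entropy(logits, labels)

# Downstream tasks then use backbone(x) directly and drop proj_head.
```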
- Discrete Factorial Representations as an Abstraction for Goal Conditioned Reinforcement Learning [99.38163119531745]
We show that applying a discretizing bottleneck can improve performance in goal-conditioned RL setups.
We prove a lower bound on the expected return on out-of-distribution goals, while still allowing goals to be specified with expressive structure.
arXiv Detail & Related papers (2022-11-01T03:31:43Z)
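A discretizing bottleneck of the kind just described is commonly built in the vector-quantization style: snap a continuous encoding to the nearest entry of a learned codebook, passing gradients through with a straight-through estimator. The sketch below shows that standard construction; whether the paper's bottleneck matches it in every detail is an assumption.

```python
import torch
import torch.nn as nn

class DiscretizingBottleneck(nn.Module):
    """Snap a continuous encoding to its nearest codebook vector."""

    def __init__(self, num_codes=64, dim=16):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)

    def forward(self, z):
        # Pairwise distances to all codes; pick the nearest one per row.
        dists = torch.cdist(z, self.codebook.weight)   # (batch, num_codes)
        idx = dists.argmin(dim=-1)
        z_q = self.codebook(idx)
        # Straight-through estimator: forward pass uses the quantized code,
        # but gradients flow back to the continuous encoding z.
        return z + (z_q - z).detach()
```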
- Rethinking Goal-conditioned Supervised Learning and Its Connection to Offline RL [49.26825108780872]
Goal-Conditioned Supervised Learning (GCSL) provides a new learning framework by iteratively relabeling and imitating self-generated experiences.
We extend GCSL into a novel offline goal-conditioned RL algorithm, Weighted GCSL (WGCSL).
We show that WGCSL can consistently outperform GCSL and existing state-of-the-art offline methods.
arXiv Detail & Related papers (2022-02-09T14:17:05Z)
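The "weighted" extension can be sketched as GCSL's imitation loss scaled by per-sample weights, for example a discount on the relabeling horizon combined with an exponentiated advantage. The weight composition below is an illustrative approximation of that idea, not the paper's exact formula.

```python
import torch
import torch.nn.functional as F

def wgcsl_loss(logits, actions, horizon_gaps, advantages, gamma=0.98):
    """Weighted goal-conditioned imitation (illustrative sketch).

    logits:       policy outputs for relabeled (state, goal) inputs
    actions:      discrete actions from the data (long tensor)
    horizon_gaps: steps between each state and its relabeled goal
    advantages:   advantage estimates from a learned value function"""
    nll = F.cross_entropy(logits, actions, reduction="none")
    # Discount far-off relabeled goals; up-weight high-advantage actions.
    weights = (gamma ** horizon_gaps.float()) * advantages.exp().clamp(max=10.0)
    return (weights * nll).mean()
```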
- C-Planning: An Automatic Curriculum for Learning Goal-Reaching Tasks [133.40619754674066]
Goal-conditioned reinforcement learning can solve tasks in a wide range of domains, including navigation and manipulation.
We propose to solve distant goal-reaching tasks by using search at training time to automatically generate intermediate states.
The E-step corresponds to planning an optimal sequence of waypoints using graph search, while the M-step learns a goal-conditioned policy to reach those waypoints.
arXiv Detail & Related papers (2021-10-22T22:05:31Z)
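A minimal sketch of that EM view: the E-step runs graph search over previously visited states to pick waypoints toward a distant goal, and the M-step trains the goal-conditioned policy on adjacent waypoint pairs, i.e. on nearby goals it can plausibly reach. The graph construction and the `policy_trainer` interface are hypothetical stand-ins, not the paper's API.

```python
import networkx as nx

def e_step(graph, start, goal):
    """Plan a sequence of intermediate waypoints with graph search.
    `graph` is assumed to connect states judged mutually reachable
    (e.g. by a learned distance estimate)."""
    return nx.shortest_path(graph, source=start, target=goal)

def m_step(policy_trainer, waypoints):
    """Train the goal-conditioned policy on nearby goals only:
    each waypoint is the goal for reaching it from its predecessor."""
    for src, dst in zip(waypoints[:-1], waypoints[1:]):
        policy_trainer.update(state=src, goal=dst)  # hypothetical interface
```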
- Aligning Pretraining for Detection via Object-Level Contrastive Learning [57.845286545603415]
Image-level contrastive representation learning has proven to be highly effective as a generic model for transfer learning.
We argue that this could be sub-optimal and thus advocate a design principle which encourages alignment between the self-supervised pretext task and the downstream task.
Our method, called Selective Object COntrastive learning (SoCo), achieves state-of-the-art results for transfer performance on COCO detection.
arXiv Detail & Related papers (2021-06-04T17:59:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.