Rethinking Goal-conditioned Supervised Learning and Its Connection to
Offline RL
- URL: http://arxiv.org/abs/2202.04478v1
- Date: Wed, 9 Feb 2022 14:17:05 GMT
- Title: Rethinking Goal-conditioned Supervised Learning and Its Connection to
Offline RL
- Authors: Rui Yang, Yiming Lu, Wenzhe Li, Hao Sun, Meng Fang, Yali Du, Xiu Li,
Lei Han, Chongjie Zhang
- Abstract summary: Goal-Conditioned Supervised Learning (GCSL) provides a new learning framework by iteratively relabeling and imitating self-generated experiences.
We extend GCSL into a novel offline goal-conditioned RL algorithm, Weighted GCSL (WGCSL).
We show that WGCSL consistently outperforms GCSL and existing state-of-the-art offline methods.
- Score: 49.26825108780872
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Solving goal-conditioned tasks with sparse rewards using self-supervised
learning is promising because of its simplicity and stability over current
reinforcement learning (RL) algorithms. A recent work, called Goal-Conditioned
Supervised Learning (GCSL), provides a new learning framework by iteratively
relabeling and imitating self-generated experiences. In this paper, we revisit
the theoretical property of GCSL, namely that it optimizes a lower bound of the
goal-reaching objective, and extend GCSL into a novel offline goal-conditioned RL
algorithm. The proposed method is named Weighted GCSL (WGCSL), in which we
introduce an advanced compound weight consisting of three parts: (1) a discounted
weight for goal relabeling, (2) a goal-conditioned exponential advantage weight,
and (3) a best-advantage weight. Theoretically, WGCSL is proved to optimize an
equivalent lower bound of the goal-conditioned RL objective and generates
monotonically improved policies via an iterated scheme. The monotonic property
holds for any behavior policy, and therefore WGCSL can be applied to both
online and offline settings. To evaluate algorithms in the offline
goal-conditioned RL setting, we provide a benchmark including a range of point
and simulated robot domains. Experiments on the introduced benchmark
demonstrate that WGCSL can consistently outperform GCSL and existing
state-of-the-art offline methods in the fully offline goal-conditioned setting.
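As a rough illustration of the scheme described above, the sketch below combines GCSL-style hindsight relabeling with the three-part WGCSL weight, so that the surrogate objective is roughly the expectation of w * log pi(a_t | s_t, g') over relabeled data, with w the product of the three weights. This is a minimal, hypothetical PyTorch-style sketch, not the authors' reference implementation: the policy.log_prob interface, the one-step advantage estimate, the clipping constant, and the batch-quantile reading of the best-advantage term are assumptions made for illustration.

```python
import torch

def wgcsl_weights(adv, dt, gamma=0.98, adv_clip=10.0, best_frac=0.1, eps=0.05):
    """Compound WGCSL-style weight for a batch of hindsight-relabeled transitions.

    adv : (B,) advantage estimates A(s_t, a_t, g') under the relabeled goal g'
          (assumed here to be r + gamma * V(s_{t+1}, g') - V(s_t, g')).
    dt  : (B,) gap i - t between the transition at step t and the future step i
          whose achieved state was relabeled as the goal g'.
    """
    # (1) discounted relabeling weight: gamma^(i - t)
    w_discount = gamma ** dt.float()
    # (2) goal-conditioned exponential advantage weight, clipped for stability
    w_adv = torch.clamp(torch.exp(adv), max=adv_clip)
    # (3) best-advantage weight: full weight for the highest-advantage samples in
    #     the batch, a small constant weight otherwise (one plausible reading of
    #     the term; the paper defines the exact criterion)
    threshold = torch.quantile(adv, 1.0 - best_frac)
    w_best = torch.where(adv >= threshold,
                         torch.ones_like(adv),
                         torch.full_like(adv, eps))
    return w_discount * w_adv * w_best


def wgcsl_loss(policy, batch):
    """Weighted goal-conditioned imitation: maximize w * log pi(a_t | s_t, g')."""
    # hypothetical interface: log-probability of the action given state and relabeled goal
    logp = policy.log_prob(batch["obs"], batch["relabeled_goal"], batch["action"])
    w = wgcsl_weights(batch["advantage"], batch["dt"])
    return -(w.detach() * logp).mean()
```

With all three weights fixed to 1, the loss reduces to plain GCSL, i.e., behavior cloning on relabeled data; in the fully offline setting the same loss is applied to a fixed dataset of relabeled trajectories, which is where the monotonic-improvement property noted above becomes relevant.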
Related papers
- OGBench: Benchmarking Offline Goal-Conditioned RL [72.00291801676684]
Offline goal-conditioned reinforcement learning (GCRL) is a major problem in reinforcement learning.
We propose OGBench, a new, high-quality benchmark for algorithms research in offline goal-conditioned RL.
arXiv Detail & Related papers (2024-10-26T06:06:08Z)
- Q-WSL: Optimizing Goal-Conditioned RL with Weighted Supervised Learning via Dynamic Programming [22.359171999254706]
A novel class of advanced algorithms, termed Goal-Conditioned Weighted Supervised Learning (GCWSL), has recently emerged to tackle the challenges posed by sparse rewards in goal-conditioned reinforcement learning (RL).
GCWSL consistently delivers strong performance across a diverse set of goal-reaching tasks due to its simplicity, effectiveness, and stability.
However, GCWSL methods lack a crucial capability known as trajectory stitching, which is essential for learning optimal policies when faced with unseen skills during testing.
In this paper, we propose Q-learning Weighted Supervised Learning (Q-WSL), a novel framework designed to overcome the limitations of GCWSL.
arXiv Detail & Related papers (2024-10-09T08:00:12Z)
- Offline Goal-Conditioned Reinforcement Learning for Safety-Critical Tasks with Recovery Policy [4.854443247023496]
Offline goal-conditioned reinforcement learning (GCRL) aims at solving goal-reaching tasks with sparse rewards from an offline dataset.
We propose a new method called Recovery-based Supervised Learning (RbSL) to accomplish safety-critical tasks with various goals.
arXiv Detail & Related papers (2024-03-04T05:20:57Z)
- SMORE: Score Models for Offline Goal-Conditioned Reinforcement Learning [33.125187822259186]
Offline Goal-Conditioned Reinforcement Learning (GCRL) is tasked with learning to achieve multiple goals in an environment purely from offline datasets using sparse reward functions.
We present a novel approach to GCRL under a new lens of mixture-distribution matching, leading to our discriminator-free method: SMORe.
arXiv Detail & Related papers (2023-11-03T16:19:33Z)
- Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe to convert static behavior datasets into policies that can perform better than the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
arXiv Detail & Related papers (2023-10-18T06:07:10Z)
- Goal-Conditioned Supervised Learning with Sub-Goal Prediction [24.172457177786523]
We propose Trajectory Iterative Learner (TraIL) to tackle goal-conditioned reinforcement learning.
TraIL further exploits the information in a trajectory, and uses it for learning to predict both actions and sub-goals.
For several popular problem settings, replacing real goals with predicted TraIL sub-goals allows the agent to reach a greater set of goal states.
arXiv Detail & Related papers (2023-05-17T12:54:58Z)
- Optimal Goal-Reaching Reinforcement Learning via Quasimetric Learning [73.80728148866906]
Quasimetric Reinforcement Learning (QRL) is a new RL method that utilizes quasimetric models to learn optimal value functions.
On offline and online goal-reaching benchmarks, QRL also demonstrates improved sample efficiency and performance.
arXiv Detail & Related papers (2023-04-03T17:59:58Z)
- Provably Efficient Offline Goal-Conditioned Reinforcement Learning with General Function Approximation and Single-Policy Concentrability [11.786486763236104]
Goal-conditioned reinforcement learning (GCRL) refers to learning general-purpose skills that aim to reach diverse goals.
Offline GCRL only requires purely pre-collected datasets to perform training tasks.
We show that a modified offline GCRL algorithm is both provably efficient with general function approximation and single-policy concentrability.
arXiv Detail & Related papers (2023-02-07T22:04:55Z)
- LCRL: Certified Policy Synthesis via Logically-Constrained Reinforcement Learning [78.2286146954051]
LCRL implements model-free Reinforcement Learning (RL) algorithms over unknown Markov Decision Processes (MDPs).
We present case studies to demonstrate the applicability, ease of use, scalability, and performance of LCRL.
arXiv Detail & Related papers (2022-09-21T13:21:00Z)
- Metric Residual Networks for Sample Efficient Goal-conditioned Reinforcement Learning [52.59242013527014]
Goal-conditioned reinforcement learning (GCRL) has a wide range of potential real-world applications.
Sample efficiency is of utmost importance for GCRL since, by default, the agent is only rewarded when it reaches its goal.
We introduce a novel neural architecture for GCRL that achieves significantly better sample efficiency than the commonly-used monolithic network architecture.
arXiv Detail & Related papers (2022-08-17T08:04:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.