Skill-Critic: Refining Learned Skills for Hierarchical Reinforcement Learning
- URL: http://arxiv.org/abs/2306.08388v3
- Date: Fri, 12 Jul 2024 01:59:00 GMT
- Title: Skill-Critic: Refining Learned Skills for Hierarchical Reinforcement Learning
- Authors: Ce Hao, Catherine Weaver, Chen Tang, Kenta Kawamoto, Masayoshi Tomizuka, Wei Zhan
- Abstract summary: We propose the Skill-Critic algorithm to fine-tune the low-level policy in conjunction with high-level skill selection.
We validate Skill-Critic in multiple sparse-reward environments, including a new sparse-reward autonomous racing task in Gran Turismo Sport.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Hierarchical reinforcement learning (RL) can accelerate long-horizon decision-making by temporally abstracting a policy into multiple levels. Skills, i.e., sequences of primitive actions, have shown promising results in sparse-reward environments; typically, a skill latent space and policy are discovered from offline data. However, the resulting low-level policy can be unreliable due to low-coverage demonstrations or distribution shift. As a solution, we propose the Skill-Critic algorithm to fine-tune the low-level policy in conjunction with high-level skill selection. Skill-Critic optimizes both the low-level and high-level policies; these policies are initialized and regularized by the latent space learned from offline demonstrations to guide the parallel policy optimization. We validate Skill-Critic in multiple sparse-reward RL environments, including a new sparse-reward autonomous racing task in Gran Turismo Sport. The experiments show that Skill-Critic's low-level policy fine-tuning and demonstration-guided regularization are essential for good performance. Code and videos are available at our website: https://sites.google.com/view/skill-critic.
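As a rough illustration of the parallel two-level optimization described in the abstract, here is a minimal sketch assuming unit-variance Gaussian policies, so the demonstration-guided KL penalties reduce to squared distances between means; the network shapes, `kl_weight`, and all names are illustrative rather than the paper's implementation.

```python
import torch
import torch.nn as nn

S_DIM, A_DIM, Z_DIM = 10, 3, 8

high_policy = nn.Sequential(nn.Linear(S_DIM, 64), nn.ReLU(), nn.Linear(64, Z_DIM))
low_policy = nn.Sequential(nn.Linear(S_DIM + Z_DIM, 64), nn.ReLU(), nn.Linear(64, A_DIM))
critic = nn.Sequential(nn.Linear(S_DIM + A_DIM, 64), nn.ReLU(), nn.Linear(64, 1))

# Priors distilled from offline demonstrations; they initialize the policies
# and keep online fine-tuning close to the demonstrated skill space.
high_prior = nn.Sequential(nn.Linear(S_DIM, 64), nn.ReLU(), nn.Linear(64, Z_DIM))
low_prior = nn.Sequential(nn.Linear(S_DIM + Z_DIM, 64), nn.ReLU(), nn.Linear(64, A_DIM))

state = torch.randn(32, S_DIM)
z = high_policy(state)                          # high level: select a skill latent
action = low_policy(torch.cat([state, z], -1))  # low level: decode it into an action
q = critic(torch.cat([state, action], -1))      # critic trained separately (omitted)

with torch.no_grad():                           # prior outputs are fixed targets
    z_target = high_prior(state)
    a_target = low_prior(torch.cat([state, z], -1))

kl_weight = 0.1                                 # illustrative coefficient
loss = (-q.mean()
        + kl_weight * (z - z_target).pow(2).sum(-1).mean()
        + kl_weight * (action - a_target).pow(2).sum(-1).mean())
loss.backward()                                 # gradients reach both levels in parallel
```

In the full algorithm the critic updates and the skill-selection horizon are handled more carefully; the point here is only that one backward pass can update both levels while the frozen priors anchor them to the demonstrated skills.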
Related papers
- Foundation Policies with Hilbert Representations [54.44869979017766]
We propose an unsupervised framework to pre-train generalist policies from unlabeled offline data.
Our key insight is to learn a structured representation that preserves the temporal structure of the underlying environment.
Our experiments show that our unsupervised policies can solve goal-conditioned and general RL tasks in a zero-shot fashion.
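One hedged way to picture "a representation that preserves temporal structure" (a simplification for illustration, not the paper's objective): fit an embedding so that latent distance matches the step gap between states on the same unlabeled trajectory.

```python
import torch
import torch.nn as nn

phi = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 8))
opt = torch.optim.Adam(phi.parameters(), lr=3e-4)

traj = torch.randn(100, 4)                 # one unlabeled offline trajectory
i = torch.randint(0, 90, (32,))            # start indices
dt = torch.randint(1, 10, (32,))           # temporal gaps
s, g = traj[i], traj[i + dt]

dist = (phi(s) - phi(g)).norm(dim=-1)      # distance in latent space
loss = ((dist - dt.float()) ** 2).mean()   # match latent distance to step gap
opt.zero_grad()
loss.backward()
opt.step()
```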
arXiv Detail & Related papers (2024-02-23T19:09:10Z)
- Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe for converting static behavior datasets into policies that can outperform the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
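As a hedged stand-in for the paper's adaptive scheme, plain k-means over dataset actions shows the basic recipe: continuous actions become K discrete classes that any discrete-action offline RL method can consume.

```python
import numpy as np

rng = np.random.default_rng(0)
actions = rng.uniform(-1, 1, size=(5000, 3))   # continuous dataset actions
K = 16
centroids = actions[rng.choice(len(actions), K, replace=False)].copy()

for _ in range(10):                            # Lloyd iterations
    labels = ((actions[:, None] - centroids[None]) ** 2).sum(-1).argmin(axis=1)
    for k in range(K):
        if np.any(labels == k):
            centroids[k] = actions[labels == k].mean(axis=0)

# Train IQL/CQL/BRAC over the K classes given by `labels`; at execution time
# a chosen class maps back to its centroid action.
```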
arXiv Detail & Related papers (2023-10-18T06:07:10Z)
- Residual Skill Policies: Learning an Adaptable Skill-based Action Space for Reinforcement Learning for Robotics [18.546688182454236]
Skill-based reinforcement learning (RL) has emerged as a promising strategy to leverage prior knowledge for accelerated robot learning.
We propose accelerating exploration in the skill space using state-conditioned generative models.
We validate our approach across four challenging manipulation tasks, demonstrating our ability to learn across task variations.
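A minimal sketch of an adaptable, skill-based action space with a learned residual, assuming a frozen decoder pretrained on demonstrations; every name here is illustrative.

```python
import torch
import torch.nn as nn

STATE, SKILL, ACT = 10, 8, 3
skill_decoder = nn.Sequential(nn.Linear(STATE + SKILL, 64), nn.ReLU(), nn.Linear(64, ACT))
for p in skill_decoder.parameters():
    p.requires_grad_(False)                    # pretrained on demos, kept frozen

residual_policy = nn.Sequential(nn.Linear(STATE + SKILL, 64), nn.ReLU(), nn.Linear(64, ACT))

state = torch.randn(1, STATE)
z = torch.randn(1, SKILL)                      # e.g. from a state-conditioned prior
inp = torch.cat([state, z], -1)
base = skill_decoder(inp)                      # the skill's nominal action
action = base + 0.1 * torch.tanh(residual_policy(inp))  # small learned correction
```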
arXiv Detail & Related papers (2022-11-04T02:42:17Z)
- Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
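A hedged sketch of the two-policy mechanism, assuming a Gymnasium-style environment and plain callables for both policies: the guide policy controls the first `h` steps, the exploration policy finishes the episode, and a curriculum shrinks `h` as returns improve.

```python
def jsrl_rollout(env, guide_policy, explore_policy, h, max_steps=200):
    """Guide policy acts for the first h steps, then the learner takes over."""
    obs, _ = env.reset()
    total_reward = 0.0
    for t in range(max_steps):
        policy = guide_policy if t < h else explore_policy
        obs, reward, terminated, truncated, _ = env.step(policy(obs))
        total_reward += reward
        if terminated or truncated:
            break
    return total_reward

def update_horizon(h, recent_return, threshold, step=10):
    """Curriculum: hand over earlier once the learner performs well enough."""
    return max(0, h - step) if recent_return >= threshold else h
```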
arXiv Detail & Related papers (2022-04-05T17:25:22Z)
- Goal-Conditioned Reinforcement Learning with Imagined Subgoals [89.67840168694259]
We propose to incorporate imagined subgoals into policy learning to facilitate learning of complex tasks.
Imagined subgoals are predicted by a separate high-level policy, which is trained simultaneously with the policy and its critic.
We evaluate our approach on complex robotic navigation and manipulation tasks and show that it outperforms existing methods by a large margin.
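A simplified sketch of the regularization idea, using deterministic networks and a squared distance where the method uses a KL between action distributions; shapes and names are assumptions.

```python
import torch
import torch.nn as nn

S, G, A = 6, 6, 2
policy = nn.Sequential(nn.Linear(S + G, 64), nn.ReLU(), nn.Linear(64, A))
high_level = nn.Sequential(nn.Linear(S + G, 64), nn.ReLU(), nn.Linear(64, G))

state, goal = torch.randn(32, S), torch.randn(32, G)
subgoal = high_level(torch.cat([state, goal], -1))   # imagined intermediate subgoal

a_goal = policy(torch.cat([state, goal], -1))        # hard, distant target
with torch.no_grad():
    a_sub = policy(torch.cat([state, subgoal], -1))  # easier, nearby target

# Pull the goal-reaching action toward the subgoal-reaching one; the critic
# term that trades this regularizer off against return is omitted here.
reg_loss = (a_goal - a_sub).pow(2).sum(-1).mean()
```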
arXiv Detail & Related papers (2021-07-01T15:30:59Z)
- Simplifying Deep Reinforcement Learning via Self-Supervision [51.2400839966489]
Self-Supervised Reinforcement Learning (SSRL) is a simple algorithm that optimizes policies with purely supervised losses.
We show that SSRL is surprisingly competitive with contemporary algorithms, offering more stable performance and lower running time.
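One self-imitation reading of "purely supervised losses" (an assumption for illustration, not necessarily the paper's exact recipe): behavior-clone the agent's own top-return episodes.

```python
import numpy as np

def supervised_update(fit, episodes, top_frac=0.2):
    """episodes: list of (states, actions, episode_return) tuples."""
    ranked = sorted(episodes, key=lambda e: e[2], reverse=True)
    elite = ranked[: max(1, int(top_frac * len(ranked)))]
    X = np.concatenate([s for s, _, _ in elite])   # states from elite episodes
    y = np.concatenate([a for _, a, _ in elite])   # actions taken there
    fit(X, y)                                      # any supervised regressor
```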
arXiv Detail & Related papers (2021-06-10T06:29:59Z)
- Continuous Action Reinforcement Learning from a Mixture of Interpretable Experts [35.80418547105711]
We propose a policy scheme that retains a complex function approximator for its internal value predictions but constrains the policy to have a concise, hierarchical, and human-readable structure.
The main technical contribution of the paper is to address the challenges introduced by this non-differentiable state selection procedure.
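A toy sketch of the setting, with a non-differentiable argmax selecting which human-readable expert acts; the scoring weights stand in for whatever the method actually learns.

```python
import numpy as np

experts = [
    lambda s: np.clip(-1.0 * s[:2], -1.0, 1.0),  # proportional controller
    lambda s: np.array([1.0, 0.0]),              # constant "go forward"
]

rng = np.random.default_rng(0)
W = rng.normal(size=(len(experts), 4))           # stand-in for learned scores

def act(state):
    k = int(np.argmax(W @ state))                # discrete, non-differentiable pick
    return experts[k](state)

print(act(np.array([0.3, -0.2, 0.1, 0.5])))
```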
arXiv Detail & Related papers (2020-06-10T16:02:08Z)
This list is automatically generated from the titles and abstracts of the papers listed on this site.