Planning to Practice: Efficient Online Fine-Tuning by Composing Goals in Latent Space
- URL: http://arxiv.org/abs/2205.08129v2
- Date: Tue, 18 Apr 2023 07:06:50 GMT
- Title: Planning to Practice: Efficient Online Fine-Tuning by Composing Goals in Latent Space
- Authors: Kuan Fang, Patrick Yin, Ashvin Nair, Sergey Levine
- Abstract summary: General-purpose robots require diverse repertoires of behaviors to complete challenging tasks in real-world unstructured environments.
To address this issue, goal-conditioned reinforcement learning aims to acquire policies that can reach goals for a wide range of tasks on command.
We propose Planning to Practice, a method that makes it practical to train goal-conditioned policies for long-horizon tasks.
- Score: 76.46113138484947
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: General-purpose robots require diverse repertoires of behaviors to complete
challenging tasks in real-world unstructured environments. To address this
issue, goal-conditioned reinforcement learning aims to acquire policies that
can reach configurable goals for a wide range of tasks on command. However,
such goal-conditioned policies are notoriously difficult and time-consuming to
train from scratch. In this paper, we propose Planning to Practice (PTP), a
method that makes it practical to train goal-conditioned policies for
long-horizon tasks that require multiple distinct types of interactions to
solve. Our approach is based on two key ideas. First, we decompose the
goal-reaching problem hierarchically, with a high-level planner that sets
intermediate subgoals using conditional subgoal generators in the latent space
for a low-level model-free policy. Second, we propose a hybrid approach which
first pre-trains both the conditional subgoal generator and the policy on
previously collected data through offline reinforcement learning, and then
fine-tunes the policy via online exploration. This fine-tuning process is
itself facilitated by the planned subgoals, which break down the original
target task into short-horizon goal-reaching tasks that are significantly
easier to learn. We conduct experiments in both simulation and the real world,
in which the policy is pre-trained on demonstrations of short primitive
behaviors and fine-tuned for temporally extended tasks that are unseen in the
offline data. Our experimental results show that PTP can generate feasible
sequences of subgoals that enable the policy to efficiently solve the target
tasks.
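To make the hierarchical planning idea concrete, below is a minimal sketch (not the authors' released code) of how a high-level planner might compose subgoals in a latent goal space: candidate subgoal sequences are sampled from a conditional subgoal generator, scored with a goal-conditioned value estimate, and the first subgoal of the best-scoring sequence is handed to the low-level policy. The function names, latent dimensionality, and the interpolation-based generator are illustrative placeholders; in PTP the subgoal generator and critic would be learned from offline data.

```python
# Minimal sketch of PTP-style hierarchical subgoal planning in a latent space.
# All names, shapes, and the toy generator/value function are assumptions.
import numpy as np

LATENT_DIM = 8          # assumed size of the latent goal space
NUM_SUBGOALS = 3        # subgoals per candidate plan
NUM_CANDIDATES = 64     # candidate plans sampled per planning step

def subgoal_generator(z_current, z_goal, rng):
    """Placeholder conditional subgoal generator: given the current and final
    latent goals, sample one intermediate latent subgoal. In PTP this would be
    a learned conditional generative model; here we interpolate with noise."""
    alpha = rng.uniform(0.2, 0.8)
    return alpha * z_current + (1 - alpha) * z_goal + 0.05 * rng.standard_normal(LATENT_DIM)

def value_fn(z_from, z_to):
    """Placeholder goal-conditioned value estimate (higher = easier to reach).
    A critic trained with offline RL would be used instead."""
    return -np.linalg.norm(z_to - z_from)

def plan_subgoals(z_start, z_goal, rng):
    """Sample candidate subgoal sequences and keep the one whose consecutive
    segments have the highest summed value."""
    best_score, best_plan = -np.inf, None
    for _ in range(NUM_CANDIDATES):
        plan, z_prev, score = [], z_start, 0.0
        for _ in range(NUM_SUBGOALS):
            z_next = subgoal_generator(z_prev, z_goal, rng)
            score += value_fn(z_prev, z_next)
            plan.append(z_next)
            z_prev = z_next
        score += value_fn(z_prev, z_goal)
        if score > best_score:
            best_score, best_plan = score, plan
    return best_plan

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z_start = rng.standard_normal(LATENT_DIM)
    z_goal = rng.standard_normal(LATENT_DIM)
    subgoals = plan_subgoals(z_start, z_goal, rng)
    # During online fine-tuning, the low-level policy would be conditioned on
    # subgoals[0] until it is reached, then on the next subgoal, so each
    # learning problem stays short-horizon.
    print("first subgoal:", np.round(subgoals[0], 3))
```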
Related papers
- Foundation Policies with Hilbert Representations [54.44869979017766]
We propose an unsupervised framework to pre-train generalist policies from unlabeled offline data.
Our key insight is to learn a structured representation that preserves the temporal structure of the underlying environment.
Our experiments show that our unsupervised policies can solve goal-conditioned and general RL tasks in a zero-shot fashion.
arXiv Detail & Related papers (2024-02-23T19:09:10Z) - GOPlan: Goal-conditioned Offline Reinforcement Learning by Planning with Learned Models [31.628341050846768]
Goal-conditioned Offline Planning (GOPlan) is a novel model-based framework that contains two key phases.
GOPlan pretrains a prior policy capable of capturing the multi-modal action distributions within the multi-goal dataset.
The reanalysis method generates high-quality imaginary data by planning with learned models for both intra-trajectory and inter-trajectory goals.
arXiv Detail & Related papers (2023-10-30T21:19:52Z) - Comparing the Efficacy of Fine-Tuning and Meta-Learning for Few-Shot
Policy Imitation [45.312333134810665]
State-of-the-art methods to tackle few-shot imitation rely on meta-learning.
Recent work has shown that fine-tuners outperform meta-learners in few-shot image classification tasks.
We release an open source dataset called iMuJoCo consisting of 154 variants of popular OpenAI-Gym MuJoCo environments.
arXiv Detail & Related papers (2023-06-23T15:29:15Z) - Imitating Graph-Based Planning with Goal-Conditioned Policies [72.61631088613048]
We present a self-imitation scheme which distills a subgoal-conditioned policy into the target-goal-conditioned policy.
We empirically show that our method can significantly boost the sample-efficiency of the existing goal-conditioned RL methods.
arXiv Detail & Related papers (2023-03-20T14:51:10Z) - C-Planning: An Automatic Curriculum for Learning Goal-Reaching Tasks [133.40619754674066]
Goal-conditioned reinforcement learning can solve tasks in a wide range of domains, including navigation and manipulation.
We propose to solve the distant goal-reaching task by using search at training time to automatically generate intermediate states.
The E-step corresponds to planning an optimal sequence of waypoints using graph search, while the M-step learns a goal-conditioned policy to reach those waypoints.
arXiv Detail & Related papers (2021-10-22T22:05:31Z) - Goal-Conditioned Reinforcement Learning with Imagined Subgoals [89.67840168694259]
We propose to incorporate imagined subgoals into policy learning to facilitate learning of complex tasks.
Imagined subgoals are predicted by a separate high-level policy, which is trained simultaneously with the policy and its critic.
We evaluate our approach on complex robotic navigation and manipulation tasks and show that it outperforms existing methods by a large margin.
arXiv Detail & Related papers (2021-07-01T15:30:59Z) - Towards Exploiting Geometry and Time for FastOff-Distribution Adaptation
in Multi-Task RobotLearning [17.903462188570067]
We train policies for a base set of pre-training tasks, then experiment with adapting to new off-distribution tasks.
We find that combining low-complexity target policy classes, base policies as black-box priors, and simple optimization algorithms allows us to acquire new tasks outside the base task distribution.
arXiv Detail & Related papers (2021-06-24T02:13:50Z) - Automatic Curriculum Learning through Value Disagreement [95.19299356298876]
Continually solving new, unsolved tasks is the key to learning diverse behaviors.
In the multi-task domain, where an agent needs to reach multiple goals, the choice of training goals can largely affect sample efficiency.
We propose setting up an automatic curriculum for goals that the agent needs to solve.
We evaluate our method across 13 multi-goal robotic tasks and 5 navigation tasks, and demonstrate performance gains over current state-of-the-art methods.
arXiv Detail & Related papers (2020-06-17T03:58:25Z)