Goal-Conditioned Reinforcement Learning with Disentanglement-based
Reachability Planning
- URL: http://arxiv.org/abs/2307.10846v1
- Date: Thu, 20 Jul 2023 13:08:14 GMT
- Title: Goal-Conditioned Reinforcement Learning with Disentanglement-based
Reachability Planning
- Authors: Zhifeng Qian and Mingyu You and Hongjun Zhou and Xuanhui Xu and Bin He
- Abstract summary: We propose a goal-conditioned RL algorithm combined with Disentanglement-based Reachability Planning (REPlan) to solve temporally extended tasks.
Our REPlan significantly outperforms the prior state-of-the-art methods in solving temporally extended tasks.
- Score: 14.370384505230597
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Goal-Conditioned Reinforcement Learning (GCRL) can enable agents to
spontaneously set diverse goals to learn a set of skills. Despite the excellent
works proposed in various fields, reaching distant goals in temporally extended
tasks remains a challenge for GCRL. Current works tackle this problem by
leveraging planning algorithms to plan intermediate subgoals to augment GCRL.
Their methods rely on two crucial components: (i) a state representation space
in which to search for valid subgoals, and (ii) a distance function to measure the
reachability of subgoals. However, they struggle to scale to high-dimensional
state spaces due to their non-compact representations. Moreover, they cannot
collect high-quality training data through standard GC policies, which results
in an inaccurate distance function. Both issues hurt the efficiency and performance
of planning and policy learning. In this paper, we propose a goal-conditioned RL
algorithm combined with Disentanglement-based Reachability Planning (REPlan) to
solve temporally extended tasks. In REPlan, a Disentangled Representation
Module (DRM) is proposed to learn compact representations which disentangle
robot poses and object positions from high-dimensional observations in a
self-supervised manner. A simple REachability discrimination Module (REM) is
also designed to determine the temporal distance of subgoals. Moreover, REM
computes intrinsic bonuses to encourage the collection of novel states for
training. We evaluate our REPlan in three vision-based simulation tasks and one
real-world task. The experiments demonstrate that our REPlan significantly
outperforms the prior state-of-the-art methods in solving temporally extended
tasks.
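To make the pipeline above concrete, here is a minimal sketch of a reachability-based subgoal planner of the kind the abstract describes. Everything in it, the encoder, the reachability scorer, the selection rule, and all names such as DisentangledEncoder, ReachabilityModule, plan_subgoal and intrinsic_bonus, is an illustrative assumption rather than the authors' released implementation.
```python
# Minimal sketch of a reachability-based subgoal planner in the spirit of REPlan.
# All classes, names, and scoring rules are illustrative assumptions.
import numpy as np

class DisentangledEncoder:
    """Stand-in for the DRM: maps a high-dimensional observation to a compact code."""
    def __init__(self, obs_dim, latent_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(size=(latent_dim, obs_dim)) / np.sqrt(obs_dim)

    def encode(self, obs):
        return np.tanh(self.W @ obs)  # compact latent code

class ReachabilityModule:
    """Stand-in for the REM: scores how reachable z_b is from z_a (in (0, 1])."""
    def score(self, z_a, z_b):
        # Placeholder: closer latents are treated as more reachable.
        return float(np.exp(-np.linalg.norm(z_a - z_b)))

def plan_subgoal(obs, goal_obs, candidate_obs, enc, rem, min_reach=0.5):
    """Pick a candidate subgoal that is reachable from the current state and
    maximizes estimated reachability of the final goal from that subgoal."""
    z, z_goal = enc.encode(obs), enc.encode(goal_obs)
    best, best_score = None, -np.inf
    for cand in candidate_obs:
        z_c = enc.encode(cand)
        if rem.score(z, z_c) < min_reach:   # discard subgoals judged unreachable
            continue
        s = rem.score(z_c, z_goal)          # prefer subgoals temporally close to the goal
        if s > best_score:
            best, best_score = cand, s
    return best

def intrinsic_bonus(obs, visited_obs, enc, rem):
    """Novelty bonus: high when no previously visited state looks reachable-similar."""
    z = enc.encode(obs)
    if not visited_obs:
        return 1.0
    return 1.0 - max(rem.score(enc.encode(v), z) for v in visited_obs)

if __name__ == "__main__":
    enc, rem = DisentangledEncoder(obs_dim=32, latent_dim=8), ReachabilityModule()
    rng = np.random.default_rng(1)
    obs, goal = rng.normal(size=32), rng.normal(size=32)
    candidates = [rng.normal(size=32) for _ in range(16)]
    subgoal = plan_subgoal(obs, goal, candidates, enc, rem)
    print("novelty bonus:", intrinsic_bonus(obs, candidates[:4], enc, rem))
```
In the actual method, the encoder and reachability module would be trained networks (the DRM and REM); the placeholders above only mimic their interfaces.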
Related papers
- Probabilistic Subgoal Representations for Hierarchical Reinforcement Learning [16.756888009396462]
In goal-conditioned hierarchical reinforcement learning, a high-level policy specifies a subgoal for the low-level policy to reach.
Existing methods adopt a subgoal representation that provides a deterministic mapping from state space to latent subgoal space.
This paper employs a Gaussian process (GP) prior over the latent subgoal space to learn a posterior distribution over subgoal representation functions.
arXiv: 2024-06-24
- Imitating Graph-Based Planning with Goal-Conditioned Policies [72.61631088613048]
We present a self-imitation scheme which distills a subgoal-conditioned policy into the target-goal-conditioned policy.
We empirically show that our method can significantly boost the sample-efficiency of the existing goal-conditioned RL methods.
arXiv: 2023-03-20
- Discrete Factorial Representations as an Abstraction for Goal Conditioned Reinforcement Learning [99.38163119531745]
We show that applying a discretizing bottleneck can improve performance in goal-conditioned RL setups.
Experiments show improved expected return on out-of-distribution goals, while still allowing goals to be specified with expressive structure; a rough sketch of such a discretizing bottleneck follows this entry.
arXiv: 2022-11-01
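For intuition only, here is a tiny vector-quantization-style discretizing bottleneck of the general kind that entry refers to; the codebook, the nearest-code assignment, and all names are assumptions for illustration, not the paper's model.
```python
# Toy discretizing bottleneck: snap a continuous goal embedding to a discrete code.
import numpy as np

class DiscreteBottleneck:
    def __init__(self, num_codes, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.codebook = rng.normal(size=(num_codes, dim))  # learnable in a real model

    def quantize(self, z):
        """Return the nearest codebook vector and its index for embedding z."""
        dists = np.linalg.norm(self.codebook - z, axis=1)
        idx = int(np.argmin(dists))
        return self.codebook[idx], idx

if __name__ == "__main__":
    bottleneck = DiscreteBottleneck(num_codes=32, dim=8)
    z_goal = np.random.default_rng(1).normal(size=8)  # pretend continuous goal embedding
    z_q, code = bottleneck.quantize(z_goal)
    print("assigned code:", code)
```
In a full model the codebook would be learned jointly with the encoder (e.g., with a straight-through estimator); here it is fixed only to show the nearest-code assignment.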
- Planning to Practice: Efficient Online Fine-Tuning by Composing Goals in Latent Space [76.46113138484947]
General-purpose robots require diverse repertoires of behaviors to complete challenging tasks in real-world unstructured environments.
To address this issue, goal-conditioned reinforcement learning aims to acquire policies that can reach goals for a wide range of tasks on command.
We propose Planning to Practice, a method that makes it practical to train goal-conditioned policies for long-horizon tasks.
arXiv: 2022-05-17
- C-Planning: An Automatic Curriculum for Learning Goal-Reaching Tasks [133.40619754674066]
Goal-conditioned reinforcement learning can solve tasks in a wide range of domains, including navigation and manipulation.
We propose to solve the distant goal-reaching task by using search at training time to automatically generate intermediate states.
The E-step corresponds to planning an optimal sequence of waypoints using graph search, while the M-step learns a goal-conditioned policy to reach those waypoints.
arXiv: 2021-10-22
- Adversarial Intrinsic Motivation for Reinforcement Learning [60.322878138199364]
We investigate whether the Wasserstein-1 distance between a policy's state visitation distribution and a target distribution can be utilized effectively for reinforcement learning tasks.
Our approach, termed Adversarial Intrinsic Motivation (AIM), estimates this Wasserstein-1 distance through its dual objective and uses it to compute a supplemental reward function; a toy sketch of this dual-potential idea follows this entry.
arXiv: 2021-05-27
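The dual-objective idea in the AIM entry can be sketched with a simple Kantorovich-potential network: train a (roughly) 1-Lipschitz function f to separate goal-region samples from visited-state samples, then reward transitions that move uphill along f. The architecture, the weight-clipping Lipschitz constraint, and the exact reward form below are illustrative assumptions, not the paper's implementation.
```python
# Toy sketch of a Wasserstein-1 "dual potential" supplemental reward.
import torch
import torch.nn as nn

class Potential(nn.Module):
    """Approximate Kantorovich potential f; kept roughly 1-Lipschitz by weight clipping."""
    def __init__(self, state_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, s):
        return self.net(s).squeeze(-1)

def train_potential(f, target_states, visited_states, steps=200, lr=1e-3, clip=0.1):
    """Maximize E_target[f] - E_visited[f], the dual form of the W1 distance."""
    opt = torch.optim.Adam(f.parameters(), lr=lr)
    for _ in range(steps):
        loss = -(f(target_states).mean() - f(visited_states).mean())
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():  # crude Lipschitz constraint via weight clipping
            for p in f.parameters():
                p.clamp_(-clip, clip)
    return f

def supplemental_reward(f, s, s_next):
    """Reward movement 'uphill' along the potential, i.e., toward the target distribution."""
    with torch.no_grad():
        return (f(s_next) - f(s)).item()

if __name__ == "__main__":
    torch.manual_seed(0)
    goal_states = torch.randn(256, 4) + 2.0  # pretend samples from the goal region
    visited = torch.randn(256, 4)            # pretend state-visitation samples
    f = train_potential(Potential(4), goal_states, visited)
    print(supplemental_reward(f, torch.randn(1, 4), torch.randn(1, 4) + 0.5))
```
A real implementation would enforce the Lipschitz constraint more carefully (e.g., with a gradient penalty) and condition on the commanded goal; the clipping above is only the simplest stand-in.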
- Meta Reinforcement Learning with Autonomous Inference of Subtask Dependencies [57.27944046925876]
We propose and address a novel few-shot RL problem, where a task is characterized by a subtask graph.
Instead of directly learning a meta-policy, we develop a Meta-learner with Subtask Graph Inference.
Our experiment results on two grid-world domains and StarCraft II environments show that the proposed method is able to accurately infer the latent task parameter.
arXiv: 2020-01-01