PlanDQ: Hierarchical Plan Orchestration via D-Conductor and Q-Performer
- URL: http://arxiv.org/abs/2406.06793v1
- Date: Mon, 10 Jun 2024 20:59:53 GMT
- Title: PlanDQ: Hierarchical Plan Orchestration via D-Conductor and Q-Performer
- Authors: Chang Chen, Junyeob Baek, Fei Deng, Kenji Kawaguchi, Caglar Gulcehre, Sungjin Ahn
- Abstract summary: We propose a hierarchical planner designed for offline RL called PlanDQ.
PlanDQ incorporates a diffusion-based planner at the high level, named D-Conductor, which guides the low-level policy through sub-goals.
At the low level, we use a Q-learning-based approach called the Q-Performer to accomplish these sub-goals.
- Score: 47.924941959320996
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite recent advancements in offline RL, no unified algorithm has achieved superior performance across a broad range of tasks. Offline value function learning, in particular, struggles with sparse-reward, long-horizon tasks due to the difficulty of credit assignment and extrapolation errors that accumulate as the task horizon grows. On the other hand, models that perform well on long-horizon tasks are designed specifically for goal-conditioned tasks and commonly perform worse than value function learning methods in short-horizon, dense-reward scenarios. To bridge this gap, we propose a hierarchical planner for offline RL called PlanDQ. PlanDQ incorporates a diffusion-based planner at the high level, named D-Conductor, which guides the low-level policy through sub-goals. At the low level, we use a Q-learning-based approach called the Q-Performer to accomplish these sub-goals. Our experimental results suggest that PlanDQ achieves superior or competitive performance on D4RL continuous control benchmark tasks as well as on the long-horizon AntMaze, Kitchen, and Calvin tasks.
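To make the decomposition concrete, below is a minimal, self-contained sketch of the high-level/low-level orchestration the abstract describes. All class names, method signatures, and the toy environment are illustrative assumptions, not the paper's implementation: the actual D-Conductor is a trained diffusion model over sub-goal sequences, and the actual Q-Performer is a trained goal-conditioned Q-learning policy.

```python
# Hedged sketch of the D-Conductor / Q-Performer decomposition.
# All names and interfaces here are placeholders, not the authors' code.
import numpy as np


class DConductorStub:
    """High-level planner stand-in. In PlanDQ this would be a diffusion
    model that denoises a sequence of sub-goals conditioned on the state."""

    def __init__(self, state_dim: int, num_subgoals: int = 4):
        self.state_dim = state_dim
        self.num_subgoals = num_subgoals

    def plan_subgoals(self, state: np.ndarray) -> np.ndarray:
        # Placeholder: a real D-Conductor samples sub-goals by reverse
        # diffusion; here we emit a random walk away from the current state.
        noise = np.random.randn(self.num_subgoals, self.state_dim)
        return state + np.cumsum(noise, axis=0)


class QPerformerStub:
    """Low-level policy stand-in. In PlanDQ this is a goal-conditioned
    Q-learning actor that selects actions to reach the current sub-goal."""

    def __init__(self, action_dim: int):
        self.action_dim = action_dim

    def act(self, state: np.ndarray, subgoal: np.ndarray) -> np.ndarray:
        # Placeholder: a trained Q-Performer maximizes a learned
        # goal-conditioned Q(s, a, g); here we move greedily toward the goal.
        direction = subgoal - state
        return np.clip(direction[: self.action_dim], -1.0, 1.0)


def rollout(env_step, state, conductor, performer, steps_per_subgoal=10):
    """Alternate high-level planning and low-level control: the conductor
    proposes sub-goals, the performer pursues each one for a few steps."""
    subgoals = conductor.plan_subgoals(state)
    for goal in subgoals:
        for _ in range(steps_per_subgoal):
            action = performer.act(state, goal)
            state = env_step(state, action)  # environment transition
    return state


if __name__ == "__main__":
    dim = 6
    # Toy deterministic "environment": the state drifts by the (padded) action.
    env_step = lambda s, a: s + np.pad(a, (0, dim - a.shape[0]))
    final = rollout(env_step, np.zeros(dim), DConductorStub(dim), QPerformerStub(3))
    print("final state:", final)
```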
Related papers
- Generalization of Compositional Tasks with Logical Specification via Implicit Planning [14.46490764849977]
We introduce a new hierarchical RL framework that enhances the efficiency and optimality of task generalization.
At the high level, we present an implicit planner specifically designed for generalizing compositional tasks.
It learns a latent transition model and performs planning in the latent space using a graph neural network (GNN).
arXiv Detail & Related papers (2024-10-13T00:57:10Z)
- Diffusion Meets Options: Hierarchical Generative Skill Composition for Temporally-Extended Tasks [12.239868705130178]
We propose a data-driven hierarchical framework that generates and updates plans based on instructions specified in linear temporal logic (LTL).
Our method decomposes temporal tasks into a chain of options with hierarchical reinforcement learning from offline non-expert datasets.
We devise a determinantal-guided posterior sampling technique during batch generation, which improves the speed and diversity of diffusion-generated options.
arXiv Detail & Related papers (2024-10-03T11:10:37Z)
- Spatial Reasoning and Planning for Deep Embodied Agents [2.7195102129095003]
This thesis explores the development of data-driven techniques for spatial reasoning and planning tasks.
It focuses on enhancing learning efficiency, interpretability, and transferability across novel scenarios.
arXiv Detail & Related papers (2024-09-28T23:05:56Z)
- In-Context Decision Transformer: Reinforcement Learning via Hierarchical Chain-of-Thought [13.034968416139826]
We propose an In-context Decision Transformer (IDT) to achieve self-improvement in a high-level trial-and-error manner.
IDT is inspired by the efficient hierarchical structure of human decision-making.
IDT achieves state-of-the-art performance on long-horizon tasks compared with current in-context RL methods.
arXiv Detail & Related papers (2024-05-31T08:38:25Z)
- Plan-Seq-Learn: Language Model Guided RL for Solving Long Horizon Robotics Tasks [50.27313829438866]
Plan-Seq-Learn (PSL) is a modular approach that uses motion planning to bridge the gap between abstract language and learned low-level control.
PSL achieves success rates of over 85%, outperforming language-based, classical, and end-to-end approaches.
arXiv Detail & Related papers (2024-05-02T17:59:31Z)
- IQL-TD-MPC: Implicit Q-Learning for Hierarchical Model Predictive Control [8.374040635931298]
We introduce an offline model-based RL algorithm, IQL-TD-MPC, that extends the state-of-the-art Temporal Difference Learning for Model Predictive Control (TD-MPC) with Implicit Q-Learning (IQL).
More specifically, we pre-train a temporally abstract IQL-TD-MPC Manager to predict "intent embeddings", which roughly correspond to subgoals, via planning.
We empirically show that augmenting state representations with intent embeddings generated by an IQL-TD-MPC manager significantly improves off-the-shelf offline RL agents.
arXiv Detail & Related papers (2023-06-01T16:24:40Z)
- Imitating Graph-Based Planning with Goal-Conditioned Policies [72.61631088613048]
We present a self-imitation scheme which distills a subgoal-conditioned policy into the target-goal-conditioned policy.
We empirically show that our method can significantly boost the sample-efficiency of the existing goal-conditioned RL methods.
arXiv Detail & Related papers (2023-03-20T14:51:10Z)
- Efficient Learning of High Level Plans from Play [57.29562823883257]
We present Efficient Learning of High-Level Plans from Play (ELF-P), a framework for robotic learning that bridges motion planning and deep RL.
We demonstrate that ELF-P has significantly better sample efficiency than relevant baselines over multiple realistic manipulation tasks.
arXiv Detail & Related papers (2023-03-16T20:09:47Z)
- Hierarchical Imitation Learning with Vector Quantized Models [77.67190661002691]
We propose to use reinforcement learning to identify subgoals in expert trajectories.
We build a vector-quantized generative model for the identified subgoals to perform subgoal-level planning.
In experiments, the algorithm excels at solving complex, long-horizon decision-making problems, outperforming the state of the art.
arXiv Detail & Related papers (2023-01-30T15:04:39Z)
- Planning to Practice: Efficient Online Fine-Tuning by Composing Goals in Latent Space [76.46113138484947]
General-purpose robots require diverse repertoires of behaviors to complete challenging tasks in real-world unstructured environments.
To address this issue, goal-conditioned reinforcement learning aims to acquire policies that can reach goals for a wide range of tasks on command.
We propose Planning to Practice, a method that makes it practical to train goal-conditioned policies for long-horizon tasks.
arXiv Detail & Related papers (2022-05-17T06:58:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the accuracy of the information provided and is not responsible for any consequences arising from its use.