Broadly-Exploring, Local-Policy Trees for Long-Horizon Task Planning
- URL: http://arxiv.org/abs/2010.06491v1
- Date: Tue, 13 Oct 2020 15:51:24 GMT
- Authors: Brian Ichter, Pierre Sermanet, Corey Lynch
- Abstract summary: Long-horizon planning in realistic environments requires the ability to reason over sequential tasks in high-dimensional state spaces.
We present Broadly-Exploring-Local-policy Trees (BELT), a task-conditioned, model-based tree search.
BELT is demonstrated experimentally to plan long-horizon, sequential trajectories with a goal-conditioned policy and to generate robust plans.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Long-horizon planning in realistic environments requires the ability to
reason over sequential tasks in high-dimensional state spaces with complex
dynamics. Classical motion planning algorithms, such as rapidly-exploring
random trees, are capable of efficiently exploring large state spaces and
computing long-horizon, sequential plans. However, these algorithms are
generally challenged with complex, stochastic, and high-dimensional state
spaces as well as in the presence of narrow passages, which naturally emerge in
tasks that interact with the environment. Machine learning offers a promising
solution for its ability to learn general policies that can handle complex
interactions and high-dimensional observations. However, these policies are
generally limited in horizon length. Our approach, Broadly-Exploring,
Local-policy Trees (BELT), merges these two approaches to leverage the
strengths of both through a task-conditioned, model-based tree search. BELT
uses an RRT-inspired tree search to efficiently explore the state space.
Locally, the exploration is guided by a task-conditioned, learned policy
capable of performing general short-horizon tasks. This task space can be quite
general and abstract; its only requirements are to be sampleable and to
well-cover the space of useful tasks. This search is aided by a
task-conditioned model that temporally extends dynamics propagation to allow
long-horizon search and sequential reasoning over tasks. BELT is demonstrated
experimentally to be able to plan long-horizon, sequential trajectories with a
goal conditioned policy and generate plans that are robust.
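The search loop the abstract describes (sample a task, extend the tree node nearest that task with a short-horizon rollout, repeat until a node reaches the goal) can be sketched as follows. This is a hedged toy reconstruction, not the authors' code: the learned task-conditioned policy and model are replaced by a hypothetical `propagate` function on a 2-D state space, and "tasks" are simply target points sampled uniformly.

```python
import math
import random

def propagate(state, task, step=1.0):
    """Stand-in for the task-conditioned policy/model rollout: move the
    state a bounded distance toward the sampled task's target point."""
    d = math.dist(state, task)
    if d <= step:
        return task
    x, y = state
    tx, ty = task
    return (x + step * (tx - x) / d, y + step * (ty - y) / d)

def belt_search(start, goal, goal_radius=1.0, iterations=4000, seed=0):
    """RRT-inspired tree search: each iteration samples a task, extends
    the nearest tree node with a short-horizon rollout, and stops once
    a node lands within goal_radius of the goal."""
    rng = random.Random(seed)
    tree = {start: None}  # node -> parent, for plan reconstruction
    for _ in range(iterations):
        # Sample a task from a toy 2-D task space (here: a target point).
        task = (rng.uniform(-10.0, 10.0), rng.uniform(-10.0, 10.0))
        # Voronoi-style bias: grow the tree from the node nearest the task.
        nearest = min(tree, key=lambda s: math.dist(s, task))
        new_state = propagate(nearest, task)
        if new_state in tree:
            continue  # duplicate node; skip to keep parent links acyclic
        tree[new_state] = nearest
        if math.dist(new_state, goal) < goal_radius:
            plan = [new_state]  # walk parent links back to the root
            while tree[plan[-1]] is not None:
                plan.append(tree[plan[-1]])
            return plan[::-1]
    return None

plan = belt_search(start=(0.0, 0.0), goal=(8.0, 8.0))
```

In BELT itself, `propagate` would instead roll out the learned goal-conditioned policy under the task-conditioned dynamics model, so each tree edge corresponds to a short-horizon task execution rather than a straight-line steer.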
Related papers
- Provably Efficient Long-Horizon Exploration in Monte Carlo Tree Search through State Occupancy Regularization [18.25487451605638]
We derive a tree search algorithm based on policy optimization with state occupancy measure regularization, which we call Volume-MCTS.
We show that count-based exploration and sampling-based motion planning can be derived as approximate solutions to this state occupancy measure regularized objective.
We test our method on several robot navigation problems, and find that Volume-MCTS outperforms AlphaZero and displays significantly better long-horizon exploration properties.
arXiv Detail & Related papers (2024-07-07T22:58:52Z)
- Generalizable Long-Horizon Manipulations with Large Language Models [91.740084601715]
This work introduces a framework harnessing the capabilities of Large Language Models (LLMs) to generate primitive task conditions for generalizable long-horizon manipulations.
We create a challenging robotic manipulation task suite based on Pybullet for long-horizon task evaluation.
arXiv Detail & Related papers (2023-10-03T17:59:46Z)
- AI planning in the imagination: High-level planning on learned abstract search spaces [68.75684174531962]
We propose a new method, called PiZero, that gives an agent the ability to plan in an abstract search space that the agent learns during training.
We evaluate our method on multiple domains, including the traveling salesman problem, Sokoban, 2048, the facility location problem, and Pacman.
arXiv Detail & Related papers (2023-08-16T22:47:16Z)
- Efficient Learning of High Level Plans from Play [57.29562823883257]
We present Efficient Learning of High-Level Plans from Play (ELF-P), a framework for robotic learning that bridges motion planning and deep RL.
We demonstrate that ELF-P has significantly better sample efficiency than relevant baselines over multiple realistic manipulation tasks.
arXiv Detail & Related papers (2023-03-16T20:09:47Z)
- Long-HOT: A Modular Hierarchical Approach for Long-Horizon Object Transport [83.06265788137443]
We address key challenges in long-horizon embodied exploration and navigation by proposing a new object transport task and a novel modular framework for temporally extended navigation.
Our first contribution is the design of a novel Long-HOT environment focused on deep exploration and long-horizon planning.
We propose a modular hierarchical transport policy (HTP) that builds a topological graph of the scene to perform exploration with the help of weighted frontiers.
arXiv Detail & Related papers (2022-10-28T05:30:49Z)
- Planning to Practice: Efficient Online Fine-Tuning by Composing Goals in Latent Space [76.46113138484947]
General-purpose robots require diverse repertoires of behaviors to complete challenging tasks in real-world unstructured environments.
To address this issue, goal-conditioned reinforcement learning aims to acquire policies that can reach goals for a wide range of tasks on command.
We propose Planning to Practice, a method that makes it practical to train goal-conditioned policies for long-horizon tasks.
arXiv Detail & Related papers (2022-05-17T06:58:17Z)
- Overcoming Exploration: Deep Reinforcement Learning in Complex Environments from Temporal Logic Specifications [2.8904578737516764]
We present a Deep Reinforcement Learning (DRL) algorithm for a task-guided robot with unknown continuous-time dynamics deployed in a large-scale complex environment.
Our framework is shown to significantly improve performance (effectiveness, efficiency) and exploration of robots tasked with complex missions in large-scale complex environments.
arXiv Detail & Related papers (2022-01-28T16:39:08Z)
- Successor Feature Landmarks for Long-Horizon Goal-Conditioned Reinforcement Learning [54.378444600773875]
We introduce Successor Feature Landmarks (SFL), a framework for exploring large, high-dimensional environments.
SFL drives exploration by estimating state-novelty and enables high-level planning by abstracting the state-space as a non-parametric landmark-based graph.
We show in our experiments on MiniGrid and ViZDoom that SFL enables efficient exploration of large, high-dimensional state spaces.
arXiv Detail & Related papers (2021-11-18T18:36:05Z)
- LS3: Latent Space Safe Sets for Long-Horizon Visuomotor Control of Iterative Tasks [28.287631944795823]
Reinforcement learning algorithms have shown impressive success in exploring high-dimensional environments to learn complex, long-horizon tasks.
A promising strategy for safe learning in dynamically uncertain environments is requiring that the agent can robustly return to states where task success can be guaranteed.
We present Latent Space Safe Sets (LS3), which extends this strategy to iterative, long-horizon tasks with image observations.
arXiv Detail & Related papers (2021-07-10T06:46:10Z)
- Flexible and Efficient Long-Range Planning Through Curious Exploration [13.260508939271764]
We show that the Curious Sample Planner can efficiently discover temporally-extended plans for solving a wide range of physically realistic 3D tasks.
In contrast, standard planning and learning methods often fail to solve these tasks at all or do so only with a huge and highly variable number of training samples.
arXiv Detail & Related papers (2020-04-22T21:47:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.