Flexible and Efficient Long-Range Planning Through Curious Exploration
- URL: http://arxiv.org/abs/2004.10876v2
- Date: Wed, 8 Jul 2020 06:32:15 GMT
- Title: Flexible and Efficient Long-Range Planning Through Curious Exploration
- Authors: Aidan Curtis, Minjian Xin, Dilip Arumugam, Kevin Feigelis, Daniel Yamins
- Abstract summary: We show that the Curious Sample Planner can efficiently discover temporally-extended plans for solving a wide range of physically realistic 3D tasks.
In contrast, standard planning and learning methods often fail to solve these tasks at all or do so only with a huge and highly variable number of training samples.
- Score: 13.260508939271764
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Identifying algorithms that flexibly and efficiently discover
temporally-extended multi-phase plans is an essential step for the advancement
of robotics and model-based reinforcement learning. The core problem of
long-range planning is finding an efficient way to search through the tree of
possible action sequences. Existing non-learned planning solutions from the
Task and Motion Planning (TAMP) literature rely on the existence of logical
descriptions for the effects and preconditions for actions. This constraint
allows TAMP methods to efficiently reduce the tree search problem but limits
their ability to generalize to unseen and complex physical environments. In
contrast, deep reinforcement learning (DRL) methods use flexible
neural-network-based function approximators to discover policies that
generalize naturally to unseen circumstances. However, DRL methods struggle to
handle the very sparse reward landscapes inherent to long-range multi-step
planning situations. Here, we propose the Curious Sample Planner (CSP), which
fuses elements of TAMP and DRL by combining a curiosity-guided sampling
strategy with imitation learning to accelerate planning. We show that CSP can
efficiently discover interesting and complex temporally-extended plans for
solving a wide range of physically realistic 3D tasks. In contrast, standard
planning and learning methods often fail to solve these tasks at all or do so
only with a huge and highly variable number of training samples. We explore the
use of a variety of curiosity metrics with CSP and analyze the types of
solutions that CSP discovers. Finally, we show that CSP supports task transfer
so that the exploration policies learned during experience with one task can
help improve efficiency on related tasks.
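The abstract describes CSP only at a high level. As a rough, hypothetical illustration of the core idea (biasing which node of the search tree gets expanded next toward states that look novel), here is a minimal Python sketch. The names `curious_sample_plan`, `dynamics`, and `is_goal` are invented for this example, and a simple count-based novelty score stands in for CSP's learned curiosity module and imitation-learned action sampler.

```python
import math
import random
from collections import defaultdict

def curious_sample_plan(start, actions, dynamics, is_goal, max_iters=10000):
    """Curiosity-guided sampling over a search tree (toy sketch).

    Repeatedly pick a state from the tree, biased toward states whose
    region of state space has been visited rarely (a count-based stand-in
    for a learned curiosity metric), expand it with a sampled action, and
    add the resulting child state to the tree.
    """
    parent = {start: None}        # child state -> (parent state, action)
    visits = defaultdict(int)     # coarse state bin -> visit count
    frontier = [start]

    def bin_of(state):            # coarse discretization for counting
        return tuple(round(x) for x in state)

    for _ in range(max_iters):
        # Rarely visited regions get exponentially more probability mass.
        weights = [max(math.exp(-visits[bin_of(s)]), 1e-12) for s in frontier]
        state = random.choices(frontier, weights=weights)[0]

        action = random.choice(actions)
        child = dynamics(state, action)
        visits[bin_of(child)] += 1

        if child not in parent:   # new state: grow the tree
            parent[child] = (state, action)
            frontier.append(child)
            if is_goal(child):
                # Backtrack parent pointers to recover the action sequence.
                plan, node = [], child
                while parent[node] is not None:
                    node, act = parent[node]
                    plan.append(act)
                return list(reversed(plan))
    return None                   # no plan found within the budget

# Toy usage: navigate a point to (5, 5) on an integer grid.
moves = [(1, 0), (-1, 0), (0, 1), (0, -1)]
step = lambda s, a: (s[0] + a[0], s[1] + a[1])
print(curious_sample_plan((0, 0), moves, step, lambda s: s == (5, 5)))
```

The count-based weighting is the simplest curiosity signal that exhibits the behavior the paper analyzes (exploration concentrated on rarely reached states); the paper itself compares several learned curiosity metrics, which this sketch does not attempt to reproduce.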
Related papers
- Action abstractions for amortized sampling [49.384037138511246]
We propose an approach to incorporate the discovery of action abstractions, or high-level actions, into the policy optimization process.
Our approach involves iteratively extracting action subsequences commonly used across many high-reward trajectories and 'chunking' them into a single action that is added to the action space (a minimal sketch of this step appears after the list below).
arXiv Detail & Related papers (2024-10-19T19:22:50Z)
- Sample-Efficient Reinforcement Learning with Temporal Logic Objectives: Leveraging the Task Specification to Guide Exploration [13.053013407015628]
This paper addresses the problem of learning optimal control policies for systems with uncertain dynamics.
We propose an accelerated RL algorithm that can learn control policies significantly faster than competitive approaches.
arXiv Detail & Related papers (2024-10-16T00:53:41Z)
- Spatial Reasoning and Planning for Deep Embodied Agents [2.7195102129095003]
This thesis explores the development of data-driven techniques for spatial reasoning and planning tasks.
It focuses on enhancing learning efficiency, interpretability, and transferability across novel scenarios.
arXiv Detail & Related papers (2024-09-28T23:05:56Z)
- Learning Logic Specifications for Policy Guidance in POMDPs: an Inductive Logic Programming Approach [57.788675205519986]
We learn high-quality traces from POMDP executions generated by any solver.
We exploit data- and time-efficient Inductive Logic Programming (ILP) to generate interpretable belief-based policy specifications.
We show that learned policies expressed in Answer Set Programming (ASP) yield performance superior to neural networks and similar to optimal handcrafted task-specific heuristics, while requiring less computational time.
arXiv Detail & Related papers (2024-02-29T15:36:01Z)
- AI planning in the imagination: High-level planning on learned abstract search spaces [68.75684174531962]
We propose a new method, called PiZero, that gives an agent the ability to plan in an abstract search space that the agent learns during training.
We evaluate our method on multiple domains, including the traveling salesman problem, Sokoban, 2048, the facility location problem, and Pacman.
arXiv Detail & Related papers (2023-08-16T22:47:16Z)
- Multi-Robot Path Planning Combining Heuristics and Multi-Agent Reinforcement Learning [0.0]
In the movement process, robots need to avoid collisions with other moving robots while minimizing their travel distance.
Previous methods for this problem either continuously replan paths using search methods to avoid conflicts or choose appropriate collision avoidance strategies based on learning approaches.
We propose a path planning method, MAPPOHR, which combines heuristic search, empirical rules, and multi-agent reinforcement learning.
arXiv Detail & Related papers (2023-06-02T05:07:37Z)
- Efficient Learning of High Level Plans from Play [57.29562823883257]
We present Efficient Learning of High-Level Plans from Play (ELF-P), a framework for robotic learning that bridges motion planning and deep RL.
We demonstrate that ELF-P has significantly better sample efficiency than relevant baselines over multiple realistic manipulation tasks.
arXiv Detail & Related papers (2023-03-16T20:09:47Z)
- Exploration via Planning for Information about the Optimal Trajectory [67.33886176127578]
We develop a method that allows us to plan for exploration while taking the task and the current knowledge into account.
We demonstrate that our method learns strong policies with 2x fewer samples than strong exploration baselines.
arXiv Detail & Related papers (2022-10-06T20:28:55Z)
- Skill-based Meta-Reinforcement Learning [65.31995608339962]
We devise a method that enables meta-learning on long-horizon, sparse-reward tasks.
Our core idea is to leverage prior experience extracted from offline datasets during meta-learning.
arXiv Detail & Related papers (2022-04-25T17:58:19Z)
- Representation, learning, and planning algorithms for geometric task and motion planning [24.862289058632186]
We present a framework for learning to guide geometric task and motion planning (GTAMP).
GTAMP is a subclass of task and motion planning in which the goal is to move multiple objects to target regions among movable obstacles.
A standard graph search algorithm is not directly applicable, because GTAMP problems involve hybrid search spaces and expensive action feasibility checks.
arXiv Detail & Related papers (2022-03-09T09:47:01Z)
- Adaptive Informative Path Planning Using Deep Reinforcement Learning for UAV-based Active Sensing [2.6519061087638014]
We propose a new approach for informative path planning based on deep reinforcement learning (RL).
Our method combines Monte Carlo tree search with an offline-learned neural network predicting informative sensing actions.
By deploying the trained network during a mission, our method enables sample-efficient online replanning on physical platforms with limited computational resources.
arXiv Detail & Related papers (2021-09-28T09:00:55Z)
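Returning to the "Action abstractions for amortized sampling" entry above: the chunking step it describes can be illustrated with a small, hypothetical sketch (the function name and parameters below are invented for this example). It counts action subsequences across high-reward trajectories and appends the most frequent one to the action space as a single macro-action.

```python
from collections import Counter

def chunk_most_common(trajectories, action_space, min_len=2, max_len=4):
    """Extract the most frequent action subsequence across high-reward
    trajectories and add it to the action space as one macro-action."""
    counts = Counter()
    for traj in trajectories:             # traj: list of primitive actions
        for n in range(min_len, max_len + 1):
            for i in range(len(traj) - n + 1):
                counts[tuple(traj[i:i + n])] += 1
    if not counts:
        return action_space
    macro, _ = counts.most_common(1)[0]   # most frequent subsequence
    return action_space + [macro]         # macro executes its steps in order

# Toy usage: a recurring subsequence becomes a macro-action.
trajs = [["up", "up", "right", "grab"], ["up", "up", "right", "drop"]]
print(chunk_most_common(trajs, ["up", "down", "left", "right", "grab", "drop"]))
```

In the paper this extraction is done iteratively inside the policy optimization loop; the sketch shows only a single extraction pass.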