AI planning in the imagination: High-level planning on learned abstract
search spaces
- URL: http://arxiv.org/abs/2308.08693v2
- Date: Sun, 3 Dec 2023 04:08:15 GMT
- Title: AI planning in the imagination: High-level planning on learned abstract
search spaces
- Authors: Carlos Martin, Tuomas Sandholm
- Abstract summary: We propose a new method, called PiZero, that gives an agent the ability to plan in an abstract search space that the agent learns during training.
We evaluate our method on multiple domains, including the traveling salesman problem, Sokoban, 2048, the facility location problem, and Pacman.
- Score: 68.75684174531962
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Search and planning algorithms have been a cornerstone of artificial
intelligence since the field's inception. Giving reinforcement learning agents
the ability to plan during execution time has resulted in significant
performance improvements in various domains. However, in real-world
environments, the model with respect to which the agent plans has been
constrained to be grounded in the real environment itself, as opposed to a more
abstract model which allows for planning over compound actions and behaviors.
We propose a new method, called PiZero, that gives an agent the ability to plan
in an abstract search space that the agent learns during training, which is
completely decoupled from the real environment. Unlike prior approaches, this
enables the agent to perform high-level planning at arbitrary timescales and
reason in terms of compound or temporally-extended actions, which can be useful
in environments where large numbers of base-level micro-actions are needed to
perform relevant macro-actions. In addition, our method is more general than
comparable prior methods because it seamlessly handles settings with continuous
action spaces, combinatorial action spaces, and partial observability. We
evaluate our method on multiple domains, including the traveling salesman
problem, Sokoban, 2048, the facility location problem, and Pacman.
Experimentally, it outperforms comparable prior methods without assuming access
to an environment simulator at execution time.
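To make the idea concrete, here is a minimal, hypothetical sketch of planning in a learned abstract space: an encoder maps observations to latent states, and learned dynamics and value networks let the agent score candidate sequences of abstract actions without ever querying an environment simulator. The stubbed random-weight networks and all names below are illustrative assumptions, not the PiZero implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical learned components, stubbed as random linear maps for illustration.
LATENT_DIM, NUM_ABSTRACT_ACTIONS, HORIZON = 16, 4, 5
W_enc = rng.normal(size=(LATENT_DIM, 8))                       # encoder weights
W_dyn = rng.normal(size=(NUM_ABSTRACT_ACTIONS, LATENT_DIM, LATENT_DIM))
w_val = rng.normal(size=LATENT_DIM)                            # value head

def encode(obs):
    """Map a raw observation to the learned abstract state."""
    return np.tanh(W_enc @ obs)

def dynamics(z, a):
    """Predict the next abstract state after abstract (compound) action a."""
    return np.tanh(W_dyn[a] @ z)

def value(z):
    """Learned estimate of future return from abstract state z."""
    return float(w_val @ z)

def plan(obs, num_candidates=64):
    """Plan entirely in the abstract space: no environment simulator needed.
    Samples candidate sequences of abstract actions, rolls them out with the
    learned dynamics, and returns the first action of the best sequence."""
    z0 = encode(obs)
    best_action, best_score = 0, -np.inf
    for _ in range(num_candidates):
        seq = rng.integers(NUM_ABSTRACT_ACTIONS, size=HORIZON)
        z = z0
        for a in seq:
            z = dynamics(z, int(a))
        if value(z) > best_score:
            best_score, best_action = value(z), int(seq[0])
    return best_action

print(plan(rng.normal(size=8)))
```

Because the rollout happens in the latent space, each abstract step can stand in for many base-level micro-actions, which is what allows planning at arbitrary timescales.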
Related papers
- ReasonPlanner: Enhancing Autonomous Planning in Dynamic Environments with Temporal Knowledge Graphs and LLMs [0.32141666878560626]
We introduce ReasonPlanner, a novel generalist agent designed for reflective thinking, planning, and interactive reasoning.
ReasonPlanner outperforms previous state-of-the-art prompting-based methods on the ScienceWorld benchmark by a factor of more than 1.8.
It relies solely on frozen weights, requiring no gradient updates.
arXiv Detail & Related papers (2024-10-11T20:58:51Z) - Embodied Instruction Following in Unknown Environments [66.60163202450954]
We propose an embodied instruction following (EIF) method for complex tasks in unknown environments.
We build a hierarchical embodied instruction following framework consisting of a high-level task planner and a low-level exploration controller.
The task planner generates feasible step-by-step plans for accomplishing the human's goal, based on the task completion process and the known visual clues.
arXiv Detail & Related papers (2024-06-17T17:55:40Z) - Latent Exploration for Reinforcement Learning [87.42776741119653]
In Reinforcement Learning, agents learn policies by exploring and interacting with the environment.
We propose LATent TIme-Correlated Exploration (Lattice), a method to inject temporally-correlated noise into the latent state of the policy network.
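A common way to realize temporally-correlated exploration noise is an Ornstein-Uhlenbeck-style process; the sketch below injects such noise into a stand-in latent vector. It illustrates the general idea only and is not the Lattice implementation.

```python
import numpy as np

class CorrelatedLatentNoise:
    """Ornstein-Uhlenbeck-style noise: successive samples are correlated in
    time, so perturbing the policy's latent state yields smooth, temporally
    consistent exploration rather than independent per-step jitter."""

    def __init__(self, dim, theta=0.15, sigma=0.2, rng=None):
        self.theta, self.sigma = theta, sigma
        self.state = np.zeros(dim)
        self.rng = rng or np.random.default_rng()

    def sample(self):
        # Mean-reverting update: drift toward zero plus Gaussian increments.
        self.state += -self.theta * self.state \
            + self.sigma * self.rng.normal(size=self.state.shape)
        return self.state

# Hypothetical usage inside a policy forward pass:
noise = CorrelatedLatentNoise(dim=32)
latent = np.zeros(32)                    # stand-in for the policy's latent state
for _ in range(3):
    perturbed = latent + noise.sample()  # the action head would read `perturbed`
```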
arXiv Detail & Related papers (2023-05-31T17:40:43Z) - Exploration Policies for On-the-Fly Controller Synthesis: A
Reinforcement Learning Approach [0.0]
We propose a new method for obtaining exploration heuristics based on Reinforcement Learning (RL).
Our agents learn from scratch in a highly partially observable RL task and outperform existing approaches overall, in instances unseen during training.
arXiv Detail & Related papers (2022-10-07T20:28:25Z) - Obstacle Avoidance for Robotic Manipulator in Joint Space via Improved
Proximal Policy Optimization [6.067589886362815]
In this paper, we train a deep neural network via an improved Proximal Policy Optimization (PPO) algorithm to map from task space to joint space for a 6-DoF manipulator.
Since training such a task on a real robot is time-consuming and strenuous, we develop a simulation environment to train the model.
Experimental results show that, using our method, the robot can track a single target or reach multiple targets in unstructured environments.
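The learned mapping can be pictured as a small policy network from a task-space target (plus the current joint angles) to clipped joint-space commands. The random-weight network and joint limits below are hypothetical stand-ins for the paper's PPO-trained policy.

```python
import numpy as np

rng = np.random.default_rng(1)
JOINT_LIMITS = np.deg2rad(170.0) * np.ones(6)   # hypothetical symmetric limits

# Toy two-layer policy: task-space input (target xyz + current 6 joint angles)
# -> 6 joint-angle increments. PPO would train these weights in simulation.
W1 = rng.normal(size=(64, 9)) * 0.1
W2 = rng.normal(size=(6, 64)) * 0.1

def act(target_xyz, joints):
    x = np.concatenate([target_xyz, joints])
    delta = W2 @ np.tanh(W1 @ x)
    # Clip so commanded angles stay within the joint limits.
    return np.clip(joints + delta, -JOINT_LIMITS, JOINT_LIMITS)

joints = np.zeros(6)
for _ in range(10):                      # roll the policy toward one target
    joints = act(np.array([0.4, 0.1, 0.3]), joints)
```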
arXiv Detail & Related papers (2022-10-03T10:21:57Z) - Inventing Relational State and Action Abstractions for Effective and
Efficient Bilevel Planning [26.715198108255162]
We develop a novel framework for learning state and action abstractions.
We learn relational, neuro-symbolic abstractions that generalize over object identities and numbers.
We show that our learned abstractions are able to quickly solve held-out tasks of longer horizons.
arXiv Detail & Related papers (2022-03-17T22:13:09Z) - Reinforcement Learning for Location-Aware Scheduling [1.0660480034605238]
We show how various aspects of the warehouse environment affect performance and execution priority.
We propose a compact representation of the state and action space for location-aware multi-agent systems.
We also show how agents trained in certain environments maintain performance in completely unseen settings.
arXiv Detail & Related papers (2022-03-07T15:51:00Z) - POMP: Pomcp-based Online Motion Planning for active visual search in
indoor environments [89.43830036483901]
We focus on the problem of learning an optimal policy for Active Visual Search (AVS) of objects in known indoor environments with an online setup.
Our POMP method takes as input the agent's current pose and an RGB-D frame.
We validate our method on the publicly available AVD benchmark, achieving an average success rate of 0.76 with an average path length of 17.1.
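POMCP itself builds a search tree over action-observation histories; the sketch below is a much simpler flat Monte-Carlo rollout planner over a particle belief, enough to convey the online-planning loop. The `simulate` hook and the toy corridor task are assumptions for illustration.

```python
import random

def rollout_planner(belief_particles, actions, simulate, depth=10, n=100):
    """Bare-bones online Monte-Carlo planning from a particle belief, in the
    spirit of (but much simpler than) POMCP: estimate each action's value by
    random rollouts from sampled states, then act greedily."""
    best_action, best_value = None, float("-inf")
    for a in actions:
        total = 0.0
        for _ in range(n):
            s = random.choice(belief_particles)   # sample a state from the belief
            s, r = simulate(s, a)                 # one step of a known/learned model
            for _ in range(depth - 1):            # random rollout to estimate value
                s, dr = simulate(s, random.choice(actions))
                r += dr
            total += r
        if total / n > best_value:
            best_value, best_action = total / n, a
    return best_action

# Toy usage: 1-D corridor where action +1/-1 moves the agent; reward at x == 3.
step = lambda s, a: (s + a, 1.0 if s + a == 3 else 0.0)
print(rollout_planner(belief_particles=[0, 0, 1], actions=[-1, 1], simulate=step))
```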
arXiv Detail & Related papers (2020-09-17T08:23:50Z) - PackIt: A Virtual Environment for Geometric Planning [68.79816936618454]
PackIt is a virtual environment for evaluating, and potentially learning, the ability to perform geometric planning.
We construct a set of challenging packing tasks using an evolutionary algorithm.
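Generating hard tasks with an evolutionary algorithm typically amounts to mutate-and-select on a difficulty score; below is a generic sketch, with `mutate` and `difficulty` as assumed problem-specific hooks, not the PackIt task generator.

```python
import random

def evolve_tasks(init_task, mutate, difficulty, generations=50, pop=20):
    """Generic evolutionary loop for generating challenging tasks: keep the
    hardest variants (by a difficulty score, e.g. planner failure rate)
    and mutate them."""
    population = [init_task]
    for _ in range(generations):
        population += [mutate(random.choice(population)) for _ in range(pop)]
        population = sorted(population, key=difficulty, reverse=True)[:pop]
    return population[0]

# Toy usage: "tasks" are integers, and larger means harder.
print(evolve_tasks(0, mutate=lambda t: t + random.choice([-1, 1]),
                   difficulty=lambda t: t))
```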
arXiv Detail & Related papers (2020-07-21T22:51:17Z) - Learning to Move with Affordance Maps [57.198806691838364]
The ability to autonomously explore and navigate a physical space is a fundamental requirement for virtually any mobile autonomous agent.
Traditional SLAM-based approaches for exploration and navigation largely focus on leveraging scene geometry.
We show that learned affordance maps can be used to augment traditional approaches for both exploration and navigation, providing significant improvements in performance.
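One way such a fusion can work is to blend a geometric occupancy map with a learned affordance (traversability) map into a single traversal cost and plan over it; the Dijkstra sketch below is an illustrative formulation, not the paper's exact method.

```python
import heapq
import numpy as np

def plan_with_affordance(occupancy, affordance, start, goal, alpha=5.0):
    """Dijkstra over a grid whose per-cell cost blends geometry (occupancy)
    with a learned affordance map (probability a cell is safely traversable)."""
    h, w = occupancy.shape
    cost = 1.0 + alpha * (1.0 - affordance)      # low affordance -> expensive
    cost[occupancy > 0.5] = np.inf               # hard geometric obstacles
    dist = {start: 0.0}
    pq = [(0.0, start)]
    while pq:
        d, (r, c) = heapq.heappop(pq)
        if (r, c) == goal:
            return d
        if d > dist.get((r, c), np.inf):
            continue
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < h and 0 <= nc < w and np.isfinite(cost[nr, nc]):
                nd = d + cost[nr, nc]
                if nd < dist.get((nr, nc), np.inf):
                    dist[(nr, nc)] = nd
                    heapq.heappush(pq, (nd, (nr, nc)))
    return np.inf

occ = np.zeros((5, 5)); occ[2, 1:4] = 1.0        # a wall blocking the middle
aff = np.ones((5, 5)); aff[:, 4] = 0.2           # right column looks risky
print(plan_with_affordance(occ, aff, (0, 0), (4, 0)))
```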
arXiv Detail & Related papers (2020-01-08T04:05:11Z)