TransPath: Learning Heuristics For Grid-Based Pathfinding via
Transformers
- URL: http://arxiv.org/abs/2212.11730v1
- Date: Thu, 22 Dec 2022 14:26:11 GMT
- Title: TransPath: Learning Heuristics For Grid-Based Pathfinding via
Transformers
- Authors: Daniil Kirilenko, Anton Andreychuk, Aleksandr Panov, Konstantin
Yakovlev
- Abstract summary: We suggest learning the instance-dependent proxies that are supposed to notably increase the efficiency of the search.
The first proxy we suggest to learn is the correction factor, i.e. the ratio between the instance-independent cost-to-go estimate and the perfect one.
The second proxy is the path probability, which indicates how likely it is that a grid cell lies on the shortest path.
- Score: 64.88759709443819
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Heuristic search algorithms, e.g. A*, are the commonly used tools for
pathfinding on grids, i.e. graphs of regular structure that are widely employed
to represent environments in robotics, video games etc. Instance-independent
heuristics for grid graphs, e.g. Manhattan distance, do not take the obstacles
into account and, thus, the search guided by such heuristics performs poorly in
obstacle-rich environments. To mitigate this, we suggest learning
instance-dependent heuristic proxies that are supposed to notably increase the
efficiency of the search. The first heuristic proxy we suggest to learn is the
correction factor, i.e. the ratio between the instance-independent cost-to-go
estimate and the perfect one (computed offline at the training phase). Unlike
the previously studied approach of learning the absolute values of the
cost-to-go heuristic function, learning the correction factor exploits the
knowledge of the instance-independent heuristic. The second heuristic proxy is the
path probability, which indicates how likely it is that a grid cell lies on the
shortest path. This heuristic can be utilized in the Focal Search framework as
the secondary heuristic, allowing us to preserve the guarantees on the bounded
sub-optimality of the solution. We learn both suggested heuristics in a
supervised fashion with state-of-the-art neural networks containing
attention blocks (transformers). We conduct a thorough empirical evaluation on
a comprehensive dataset of planning tasks, showing that the suggested
techniques i) reduce the computational effort of A* by up to a factor of $4$
while producing solutions whose costs exceed the costs of the optimal solutions
by less than $0.3$% on average; ii) outperform the competitors, which include
conventional techniques from heuristic search, i.e. weighted A*, as well as
state-of-the-art learnable planners.
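To make the two learned proxies concrete, here is a minimal, hypothetical sketch (not the authors' code) of how a grid planner could consume them. The `path_prob` lookup stands in for the transformer's per-cell path-probability map and serves as the secondary heuristic of Focal Search; the commented `correction_factor` line marks where the first proxy would rescale the Manhattan estimate. Both names are placeholders, not the paper's actual interface.

```python
import heapq

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def neighbors(cell, grid):
    x, y = cell
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < len(grid) and 0 <= ny < len(grid[0]) and grid[nx][ny] == 0:
            yield (nx, ny)

def focal_search(grid, start, goal, path_prob, w=1.1):
    """w-suboptimal Focal Search: f uses the admissible Manhattan distance,
    while the learned per-cell path probability decides which node inside the
    FOCAL list (f <= w * f_min) is expanded next."""
    g = {start: 0.0}
    parent = {}
    open_list = [(manhattan(start, goal), start)]    # entries are (f, cell)
    closed = set()
    while open_list:
        while open_list and open_list[0][1] in closed:   # drop stale entries
            heapq.heappop(open_list)
        if not open_list:
            break
        f_min = open_list[0][0]
        focal = [e for e in open_list if e[0] <= w * f_min and e[1] not in closed]
        f_cur, cur = max(focal, key=lambda e: path_prob.get(e[1], 0.0))
        open_list.remove((f_cur, cur))
        heapq.heapify(open_list)
        if cur == goal:
            path = [cur]
            while cur in parent:
                cur = parent[cur]
                path.append(cur)
            return path[::-1]
        closed.add(cur)
        for nxt in neighbors(cur, grid):
            if nxt not in closed and g[cur] + 1 < g.get(nxt, float("inf")):
                g[nxt] = g[cur] + 1
                parent[nxt] = cur
                heapq.heappush(open_list, (g[nxt] + manhattan(nxt, goal), nxt))
    return None

# The first proxy would instead rescale the instance-independent estimate and
# drive an ordinary (greedy / weighted) A*:
#   corrected_h = lambda c: correction_factor[c] * manhattan(c, goal)

# Toy usage: a 4x4 grid with two obstacle cells and an empty placeholder map.
grid = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
path_prob = {}   # placeholder for the transformer's per-cell predictions
print(focal_search(grid, (0, 0), (3, 3), path_prob))
```

With w = 1.1 the returned path is guaranteed to be at most 10% longer than optimal regardless of how accurate the probability map is, which is the bounded sub-optimality property mentioned in the abstract.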
Related papers
- Finding Transformer Circuits with Edge Pruning [71.12127707678961]
We propose Edge Pruning as an effective and scalable solution to automated circuit discovery.
Our method finds circuits in GPT-2 that use less than half the number of edges compared to circuits found by previous methods.
Thanks to its efficiency, we scale Edge Pruning to CodeLlama-13B, a model over 100x the scale that prior methods operate on.
arXiv Detail & Related papers (2024-06-24T16:40:54Z)
- Efficient Methods for Non-stationary Online Learning [67.3300478545554]
We present efficient methods for optimizing dynamic regret and adaptive regret, which reduce the number of projections per round from $\mathcal{O}(\log T)$ to $1$.
Our technique hinges on the reduction mechanism developed in parameter-free online learning and requires non-trivial twists on non-stationary online methods.
arXiv Detail & Related papers (2023-09-16T07:30:12Z)
- On Using Admissible Bounds for Learning Forward Search Heuristics [9.749638953163391]
We focus on how to effectively utilize the information provided by admissible heuristics in learning.
We model the learned heuristic as a truncated Gaussian, where admissible heuristics are used not as training targets but as lower bounds of this distribution.
Results show that our proposed method converges faster during training and yields better heuristics.
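Based only on the summary above, here is a minimal, hypothetical sketch of what such a truncated-Gaussian training loss could look like (an assumption-laden illustration, not that paper's code): the admissible estimate truncates the support of the predicted Gaussian from below, while the optimal cost-to-go provides the likelihood target.

```python
import torch
from torch.distributions import Normal

def truncated_gaussian_nll(mu, sigma, h_opt, h_adm):
    """NLL of the optimal cost-to-go h_opt under N(mu, sigma^2) truncated to
    [h_adm, +inf): the admissible heuristic h_adm acts as a lower bound on the
    support rather than as a regression target. mu, sigma come from the model."""
    std = Normal(torch.zeros_like(mu), torch.ones_like(mu))
    log_pdf = std.log_prob((h_opt - mu) / sigma) - torch.log(sigma)
    log_mass = torch.log1p(-std.cdf((h_adm - mu) / sigma))   # log P(X >= h_adm)
    return -(log_pdf - log_mass).mean()

# Toy usage with dummy predictions (mu, sigma) and targets.
mu, sigma = torch.tensor([10.0]), torch.tensor([2.0])
print(truncated_gaussian_nll(mu, sigma, h_opt=torch.tensor([12.0]), h_adm=torch.tensor([8.0])))
```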
arXiv Detail & Related papers (2023-08-23T04:14:45Z)
- Unsupervised Learning for Solving the Travelling Salesman Problem [28.62497359169851]
We propose UTSP, an unsupervised learning framework for solving the Travelling Salesman Problem (TSP).
We train a Graph Neural Network (GNN) using a surrogate loss. The GNN outputs a heat map representing the probability for each edge to be part of the optimal path.
We then apply local search to generate our final prediction based on the heat map.
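A hypothetical sketch of that heat-map-then-local-search pipeline (the heat map below is a random placeholder standing in for the GNN output; this is not the UTSP code): greedily build a tour following the highest-probability edges, then refine it with 2-opt local search.

```python
import numpy as np

def tour_from_heatmap(heat, dist):
    """Greedy construction: from the current city, move to the unvisited city
    with the highest edge probability in the heat map, then refine with 2-opt."""
    n = len(heat)
    tour, visited = [0], {0}
    while len(tour) < n:
        cur = tour[-1]
        nxt = max((j for j in range(n) if j not in visited), key=lambda j: heat[cur][j])
        tour.append(nxt)
        visited.add(nxt)
    return two_opt(tour, dist)

def tour_length(tour, dist):
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def two_opt(tour, dist):
    """Local search: keep reversing segments while it shortens the tour."""
    improved = True
    while improved:
        improved = False
        for i in range(1, len(tour) - 1):
            for j in range(i + 1, len(tour)):
                cand = tour[:i] + tour[i:j][::-1] + tour[j:]
                if tour_length(cand, dist) < tour_length(tour, dist):
                    tour, improved = cand, True
    return tour

# Toy usage with random stand-ins for the GNN heat map and city coordinates.
rng = np.random.default_rng(0)
pts = rng.random((8, 2))
dist = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
heat = rng.random((8, 8))                       # placeholder for the GNN output
print(tour_from_heatmap(heat, dist))
```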
arXiv Detail & Related papers (2023-03-19T02:30:27Z)
- Learning Graph Search Heuristics [48.83557172525969]
We present PHIL (Path Heuristic with Imitation Learning), a novel neural architecture and a training algorithm for discovering graph search and navigation heuristics from data.
Our function learns graph embeddings useful for inferring node distances, runs in constant time independent of graph sizes, and can be easily incorporated in an algorithm such as A* at test time.
Experiments show that PHIL reduces the number of explored nodes compared to state-of-the-art methods on benchmark datasets by 58.5% on average.
arXiv Detail & Related papers (2022-12-07T22:28:00Z)
- Learning to Optimize Permutation Flow Shop Scheduling via Graph-based Imitation Learning [70.65666982566655]
Permutation flow shop scheduling (PFSS) is widely used in manufacturing systems.
We propose to train the model via expert-driven imitation learning, which accelerates convergence and makes training more stable and accurate.
Our model's network parameters are reduced to only 37% of theirs, and the solution gap of our model towards the expert solutions decreases from 6.8% to 1.3% on average.
arXiv Detail & Related papers (2022-10-31T09:46:26Z)
- MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven Reinforcement Learning [65.52675802289775]
We show that an uncertainty-aware classifier can solve challenging reinforcement learning problems.
We propose a novel method for computing the normalized maximum likelihood (NML) distribution.
We show that the resulting algorithm has a number of intriguing connections to both count-based exploration methods and prior algorithms for learning reward functions.
arXiv Detail & Related papers (2021-07-15T08:19:57Z)
- Neural Weighted A*: Learning Graph Costs and Heuristics with Differentiable Anytime A* [12.117737635879037]
Recent works related to data-driven planning aim at learning either cost functions or planner functions, but not both.
We propose Neural Weighted A*, a differentiable anytime planner able to produce improved representations of planar maps as graph costs and heuristics.
We experimentally show the validity of our claims by testing Neural Weighted A* against several baselines, introducing a novel, tile-based navigation dataset.
arXiv Detail & Related papers (2021-05-04T13:17:30Z)
- Robot Navigation in a Crowd by Integrating Deep Reinforcement Learning and Online Planning [8.211771115758381]
Navigating mobile robots along time-efficient and collision-free paths in a crowd is still an open and challenging problem.
Deep reinforcement learning is a promising solution to this problem.
We propose a graph-based deep reinforcement learning method, SG-DQN.
Our model can help the robot better understand the crowd and achieve a high success rate of more than 0.99 in the crowd navigation task.
arXiv Detail & Related papers (2021-02-26T02:17:13Z)