TransPath: Learning Heuristics For Grid-Based Pathfinding via
Transformers
- URL: http://arxiv.org/abs/2212.11730v1
- Date: Thu, 22 Dec 2022 14:26:11 GMT
- Title: TransPath: Learning Heuristics For Grid-Based Pathfinding via
Transformers
- Authors: Daniil Kirilenko, Anton Andreychuk, Aleksandr Panov, Konstantin
Yakovlev
- Abstract summary: We suggest learning the instance-dependent proxies that are supposed to notably increase the efficiency of the search.
The first proxy we suggest to learn is the correction factor, i.e. the ratio between the instance-independent cost-to-go estimate and the perfect one.
The second proxy is the path probability, which indicates how likely it is that a grid cell lies on the shortest path.
- Score: 64.88759709443819
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Heuristic search algorithms, e.g. A*, are the commonly used tools for
pathfinding on grids, i.e. graphs of regular structure that are widely employed
to represent environments in robotics, video games etc. Instance-independent
heuristics for grid graphs, e.g. Manhattan distance, do not take the obstacles
into account and, thus, the search guided by such heuristics performs poorly in
obstacle-rich environments. To mitigate this, we suggest learning
instance-dependent heuristic proxies that are supposed to notably increase the
efficiency of the search. The first heuristic proxy we suggest to learn is the
correction factor, i.e. the ratio between the instance-independent cost-to-go
estimate and the perfect one (computed offline at the training phase). Unlike
the previously studied approach of learning the absolute values of the
cost-to-go heuristic function, learning the correction factor exploits the
knowledge of the instance-independent heuristic. The second heuristic proxy is the
path probability, which indicates how likely it is that a grid cell lies on the
shortest path. This heuristic can be utilized in the Focal Search framework as
the secondary heuristic, allowing us to preserve the guarantees on the bounded
sub-optimality of the solution. We learn both suggested heuristics in a
supervised fashion with state-of-the-art neural networks containing
attention blocks (transformers). We conduct a thorough empirical evaluation on
a comprehensive dataset of planning tasks, showing that the suggested
techniques i) reduce the computational effort of A* by up to a factor of $4$
while producing solutions whose costs exceed the costs of the optimal solutions
by less than $0.3$% on average; ii) outperform the competitors, which include
conventional techniques from heuristic search, i.e. weighted A*, as well as
state-of-the-art learnable planners.
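To make the two learned proxies concrete, here is a minimal, hypothetical sketch (not the authors' code) of how a grid planner could consume them. The `path_prob` lookup stands in for the transformer's per-cell path-probability map and serves as the secondary heuristic of Focal Search; the commented `correction_factor` line marks where the first proxy would rescale the Manhattan estimate. Both names are placeholders, not the paper's actual interface.

```python
import heapq

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def neighbors(cell, grid):
    x, y = cell
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < len(grid) and 0 <= ny < len(grid[0]) and grid[nx][ny] == 0:
            yield (nx, ny)

def focal_search(grid, start, goal, path_prob, w=1.1):
    """w-suboptimal Focal Search: f uses the admissible Manhattan distance,
    while the learned per-cell path probability decides which node inside the
    FOCAL list (f <= w * f_min) is expanded next."""
    g = {start: 0.0}
    parent = {}
    open_list = [(manhattan(start, goal), start)]    # entries are (f, cell)
    closed = set()
    while open_list:
        while open_list and open_list[0][1] in closed:   # drop stale entries
            heapq.heappop(open_list)
        if not open_list:
            break
        f_min = open_list[0][0]
        focal = [e for e in open_list if e[0] <= w * f_min and e[1] not in closed]
        f_cur, cur = max(focal, key=lambda e: path_prob.get(e[1], 0.0))
        open_list.remove((f_cur, cur))
        heapq.heapify(open_list)
        if cur == goal:
            path = [cur]
            while cur in parent:
                cur = parent[cur]
                path.append(cur)
            return path[::-1]
        closed.add(cur)
        for nxt in neighbors(cur, grid):
            if nxt not in closed and g[cur] + 1 < g.get(nxt, float("inf")):
                g[nxt] = g[cur] + 1
                parent[nxt] = cur
                heapq.heappush(open_list, (g[nxt] + manhattan(nxt, goal), nxt))
    return None

# The first proxy would instead rescale the instance-independent estimate and
# drive an ordinary (greedy / weighted) A*:
#   corrected_h = lambda c: correction_factor[c] * manhattan(c, goal)

# Toy usage: a 4x4 grid with two obstacle cells and an empty placeholder map.
grid = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
path_prob = {}   # placeholder for the transformer's per-cell predictions
print(focal_search(grid, (0, 0), (3, 3), path_prob))
```

With w = 1.1 the returned path is guaranteed to be at most 10% longer than optimal regardless of how accurate the probability map is, which is the bounded sub-optimality property mentioned in the abstract.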
Related papers
- Finding Transformer Circuits with Edge Pruning [71.12127707678961]
We propose Edge Pruning as an effective and scalable solution to automated circuit discovery.
Our method finds circuits in GPT-2 that use less than half the number of edges compared to circuits found by previous methods.
Thanks to its efficiency, we scale Edge Pruning to CodeLlama-13B, a model over 100x the scale that prior methods operate on.
arXiv Detail & Related papers (2024-06-24T16:40:54Z)
- Efficient Methods for Non-stationary Online Learning [67.3300478545554]
We present efficient methods for optimizing dynamic regret and adaptive regret, which reduce the number of projections per round from $\mathcal{O}(\log T)$ to $1$.
Our technique hinges on the reduction mechanism developed in parameter-free online learning and requires non-trivial twists on non-stationary online methods.
arXiv Detail & Related papers (2023-09-16T07:30:12Z)
- On Using Admissible Bounds for Learning Forward Search Heuristics [9.749638953163391]
We focus on how to effectively utilize the information provided by admissible heuristics in learning.
We model the learned heuristic as a truncated Gaussian, where admissible heuristics are used not as training targets but as lower bounds of this distribution.
Results show that our proposed method converges faster during training and yields better heuristics.
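Based only on the summary above, here is a minimal, hypothetical sketch of what such a truncated-Gaussian training loss could look like (an assumption-laden illustration, not that paper's code): the admissible estimate truncates the support of the predicted Gaussian from below, while the optimal cost-to-go provides the likelihood target.

```python
import torch
from torch.distributions import Normal

def truncated_gaussian_nll(mu, sigma, h_opt, h_adm):
    """NLL of the optimal cost-to-go h_opt under N(mu, sigma^2) truncated to
    [h_adm, +inf): the admissible heuristic h_adm acts as a lower bound on the
    support rather than as a regression target. mu, sigma come from the model."""
    std = Normal(torch.zeros_like(mu), torch.ones_like(mu))
    log_pdf = std.log_prob((h_opt - mu) / sigma) - torch.log(sigma)
    log_mass = torch.log1p(-std.cdf((h_adm - mu) / sigma))   # log P(X >= h_adm)
    return -(log_pdf - log_mass).mean()

# Toy usage with dummy predictions (mu, sigma) and targets.
mu, sigma = torch.tensor([10.0]), torch.tensor([2.0])
print(truncated_gaussian_nll(mu, sigma, h_opt=torch.tensor([12.0]), h_adm=torch.tensor([8.0])))
```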
arXiv Detail & Related papers (2023-08-23T04:14:45Z)
- Unsupervised Learning for Solving the Travelling Salesman Problem [28.62497359169851]
We propose UTSP, an unsupervised learning framework for solving the Travelling Salesman Problem (TSP).
We train a Graph Neural Network (GNN) using a surrogate loss. The GNN outputs a heat map representing the probability for each edge to be part of the optimal path.
We then apply local search to generate our final prediction based on the heat map.
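A hypothetical sketch of that heat-map-then-local-search pipeline (the heat map below is a random placeholder standing in for the GNN output; this is not the UTSP code): greedily build a tour following the highest-probability edges, then refine it with 2-opt local search.

```python
import numpy as np

def tour_from_heatmap(heat, dist):
    """Greedy construction: from the current city, move to the unvisited city
    with the highest edge probability in the heat map, then refine with 2-opt."""
    n = len(heat)
    tour, visited = [0], {0}
    while len(tour) < n:
        cur = tour[-1]
        nxt = max((j for j in range(n) if j not in visited), key=lambda j: heat[cur][j])
        tour.append(nxt)
        visited.add(nxt)
    return two_opt(tour, dist)

def tour_length(tour, dist):
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def two_opt(tour, dist):
    """Local search: keep reversing segments while it shortens the tour."""
    improved = True
    while improved:
        improved = False
        for i in range(1, len(tour) - 1):
            for j in range(i + 1, len(tour)):
                cand = tour[:i] + tour[i:j][::-1] + tour[j:]
                if tour_length(cand, dist) < tour_length(tour, dist):
                    tour, improved = cand, True
    return tour

# Toy usage with random stand-ins for the GNN heat map and city coordinates.
rng = np.random.default_rng(0)
pts = rng.random((8, 2))
dist = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
heat = rng.random((8, 8))                       # placeholder for the GNN output
print(tour_from_heatmap(heat, dist))
```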
arXiv Detail & Related papers (2023-03-19T02:30:27Z)
- Learning Graph Search Heuristics [48.83557172525969]
We present PHIL (Path Heuristic with Imitation Learning), a novel neural architecture and a training algorithm for discovering graph search and navigation heuristics from data.
Our function learns graph embeddings useful for inferring node distances, runs in constant time independent of graph sizes, and can be easily incorporated in an algorithm such as A* at test time.
Experiments show that PHIL reduces the number of explored nodes compared to state-of-the-art methods on benchmark datasets by 58.5% on average.
arXiv Detail & Related papers (2022-12-07T22:28:00Z)
- Learning to Optimize Permutation Flow Shop Scheduling via Graph-based Imitation Learning [70.65666982566655]
Permutation flow shop scheduling (PFSS) is widely used in manufacturing systems.
We propose to train the model via expert-driven imitation learning, which accelerates convergence and makes training more stable and accurate.
Our model's network parameters are reduced to only 37% of theirs, and the solution gap of our model towards the expert solutions decreases from 6.8% to 1.3% on average.
arXiv Detail & Related papers (2022-10-31T09:46:26Z)
- MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven Reinforcement Learning [65.52675802289775]
We show that an uncertainty-aware classifier can solve challenging reinforcement learning problems.
We propose a novel method for computing the normalized maximum likelihood (NML) distribution.
We show that the resulting algorithm has a number of intriguing connections to both count-based exploration methods and prior algorithms for learning reward functions.
arXiv Detail & Related papers (2021-07-15T08:19:57Z)
- Neural Weighted A*: Learning Graph Costs and Heuristics with Differentiable Anytime A* [12.117737635879037]
Recent works related to data-driven planning aim at learning either cost functions or planner functions, but not both.
We propose Neural Weighted A*, a differentiable anytime planner able to produce improved representations of planar maps as graph costs and heuristics.
We experimentally show the validity of our claims by testing Neural Weighted A* against several baselines, introducing a novel, tile-based navigation dataset.
arXiv Detail & Related papers (2021-05-04T13:17:30Z)
- Robot Navigation in a Crowd by Integrating Deep Reinforcement Learning and Online Planning [8.211771115758381]
Navigating mobile robots along time-efficient and collision-free paths in a crowd is still an open and challenging problem.
Deep reinforcement learning is a promising solution to this problem.
We propose a graph-based deep reinforcement learning method, SG-DQN.
Our model can help the robot better understand the crowd and achieve a high success rate of more than 0.99 in the crowd navigation task.
arXiv Detail & Related papers (2021-02-26T02:17:13Z)