Policy-Guided Heuristic Search with Guarantees
- URL: http://arxiv.org/abs/2103.11505v1
- Date: Sun, 21 Mar 2021 22:30:57 GMT
- Title: Policy-Guided Heuristic Search with Guarantees
- Authors: Laurent Orseau, Levi H. S. Lelis
- Abstract summary: Policy-guided Heuristic Search (PHS) is a novel search algorithm that uses both a heuristic function and a policy.
PHS compares favorably with A*, Weighted A*, Greedy Best-First Search, LevinTS, and PUCT in terms of number of problems solved and search time.
- Score: 31.323430201941378
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The use of a policy and a heuristic function for guiding search can be quite
effective in adversarial problems, as demonstrated by AlphaGo and its
successors, which are based on the PUCT search algorithm. While PUCT can also
be used to solve single-agent deterministic problems, it lacks guarantees on
its search effort and it can be computationally inefficient in practice.
Combining the A* algorithm with a learned heuristic function tends to work
better in these domains, but A* and its variants do not use a policy. Moreover,
the purpose of using A* is to find solutions of minimum cost, while we seek
instead to minimize the search loss (e.g., the number of search steps). LevinTS
is guided by a policy and provides guarantees on the number of search steps
that relate to the quality of the policy, but it does not make use of a
heuristic function. In this work we introduce Policy-guided Heuristic Search
(PHS), a novel search algorithm that uses both a heuristic function and a
policy, and has theoretical guarantees on the search loss that relate to the
quality of both the heuristic and the policy. We show empirically on the
sliding-tile puzzle, Sokoban, and a puzzle from the commercial game 'The
Witness' that PHS enables the rapid learning of both a policy and a heuristic
function and compares favorably with A*, Weighted A*, Greedy Best-First Search,
LevinTS, and PUCT in terms of number of problems solved and search time in all
three domains tested.
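
The abstract describes PHS only in prose, so here is a minimal Python sketch of the idea, assuming the PHS* priority φ(n) = (g(n) + h(n)) / π(n) from the paper, where g is the path cost, h the heuristic, and π(n) the policy's probability of the path to n. The callables and the graph-search closed set are illustrative simplifications, not the authors' implementation:

```python
import heapq
import itertools

def phs_star(start, successors, is_goal, heuristic):
    """Best-first search in the spirit of PHS*: nodes are ordered by
    phi(n) = (g(n) + h(n)) / pi(n), where pi(n) is the policy's
    probability of the path from the root to n; lower phi expands first."""
    tie = itertools.count()                  # heap tie-breaker
    # Heap entries: (phi, tiebreak, state, g = path cost, pi = path probability).
    open_list = [(heuristic(start), next(tie), start, 0.0, 1.0)]
    expansions = 0
    seen = set()                             # graph-search shortcut; the paper analyses tree search
    while open_list:
        _, _, state, g, pi = heapq.heappop(open_list)
        if state in seen:
            continue
        seen.add(state)
        expansions += 1
        if is_goal(state):
            return state, expansions
        # successors(state) is assumed to yield (child, step_cost, action_prob),
        # where action_prob is the policy's probability of that move.
        for child, step_cost, action_prob in successors(state):
            if action_prob <= 0.0:
                continue                     # the analysis needs positive mass on every action
            g2, pi2 = g + step_cost, pi * action_prob
            phi = (g2 + heuristic(child)) / pi2
            heapq.heappush(open_list, (phi, next(tie), child, g2, pi2))
    return None, expansions
```

Dividing the cost estimate by the path probability makes nodes the policy favors cheap to expand, which is what ties the search-loss guarantee to the quality of both the heuristic and the policy.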
Related papers
- Augmented Bayesian Policy Search [14.292685001631945]
In practice, exploration is largely performed by deterministic policies.
First-order Bayesian Optimization (BO) methods offer a principled way of performing exploration using deterministic policies.
We introduce a novel mean function for the probabilistic model, which augments BO methods with the action-value function.
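A rough illustration of what a non-zero mean function buys (my own sketch, not the paper's code): if the surrogate's prior mean is a learned action-value estimate, the Gaussian process only has to model the residual between observations and that estimate.

```python
import numpy as np

def rbf(A, B, length=1.0):
    """Squared-exponential kernel between the row vectors of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length ** 2)

def gp_posterior_mean(X_train, y_train, X_query, prior_mean, kernel=rbf, noise=1e-6):
    """GP posterior mean with a non-zero prior mean function: the GP
    models only the residual between the observations and the prior."""
    K = kernel(X_train, X_train) + noise * np.eye(len(X_train))
    K_s = kernel(X_query, X_train)
    resid = y_train - prior_mean(X_train)
    return prior_mean(X_query) + K_s @ np.linalg.solve(K, resid)

# E.g., prior_mean could be a learned action-value estimate Q(x); the
# surrogate then corrects Q with observed returns.
```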
arXiv Detail & Related papers (2024-07-05T20:56:45Z)
- A Multi-Heuristic Search-based Motion Planning for Automated Parking [0.0]
In unstructured environments like parking lots or construction sites, it is challenging to achieve real-time planning.
We adopt a Multi-Heuristic Search approach that enables the use of multiple heuristic functions and exploits their individual advantages.
The Multi-Heuristic A* algorithm is benchmarked against Hybrid A*, a popular search-based algorithm.
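As a sketch of the general mechanism (not the paper's planner), a round-robin multi-heuristic best-first search looks like the following; full MHA* additionally bounds its inadmissible queues by an admissible anchor to retain suboptimality guarantees, which this sketch omits.

```python
import heapq
import itertools

def multi_heuristic_search(start, successors, is_goal, heuristics):
    """Round-robin multi-heuristic best-first search: one open list per
    heuristic, expanded in turn so each heuristic's advice gets used."""
    tie = itertools.count()
    opens = [[(h(start), next(tie), 0.0, start)] for h in heuristics]
    best_g = {start: 0.0}
    closed = set()
    for i in itertools.cycle(range(len(heuristics))):
        if all(not q for q in opens):
            return None                        # every queue exhausted
        if not opens[i]:
            continue
        _, _, g, state = heapq.heappop(opens[i])
        if state in closed:
            continue
        closed.add(state)
        if is_goal(state):
            return state
        for child, cost in successors(state):  # (child_state, step_cost) pairs
            g2 = g + cost
            if g2 < best_g.get(child, float("inf")):
                best_g[child] = g2
                for q, h in zip(opens, heuristics):
                    heapq.heappush(q, (g2 + h(child), next(tie), g2, child))
```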
arXiv Detail & Related papers (2023-07-15T17:33:06Z)
- Machine Learning for Online Algorithm Selection under Censored Feedback [71.6879432974126]
In online algorithm selection (OAS), instances of an algorithmic problem class are presented to an agent one after another, and the agent must quickly select the presumably best algorithm from a fixed set of candidates.
For decision problems such as satisfiability (SAT), quality typically refers to the algorithm's runtime.
In this work, we revisit multi-armed bandit algorithms for OAS and discuss their capability of dealing with the problem.
We adapt them to runtime-oriented losses, allowing for partially censored data while keeping space and time complexity independent of the time horizon.
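To make the censoring concrete, here is a minimal UCB-style sketch, an illustration of the setting rather than the paper's algorithm: a timed-out run only reveals that the runtime exceeds the cutoff, so arms are scored by their capped runtimes.

```python
import math

class CensoredRuntimeUCB:
    """UCB-style online algorithm selection under censored feedback:
    arms are scored by mean *capped* runtime minus an optimism bonus
    (lower expected runtime wins)."""
    def __init__(self, n_algos, cutoff):
        self.cutoff = cutoff
        self.counts = [0] * n_algos
        self.capped_sums = [0.0] * n_algos

    def select(self, t):
        for a, c in enumerate(self.counts):   # play every arm once first
            if c == 0:
                return a
        def optimistic_runtime(a):
            mean = self.capped_sums[a] / self.counts[a]
            bonus = self.cutoff * math.sqrt(2 * math.log(t) / self.counts[a])
            return mean - bonus
        return min(range(len(self.counts)), key=optimistic_runtime)

    def update(self, a, runtime):
        self.counts[a] += 1
        self.capped_sums[a] += min(runtime, self.cutoff)  # censoring at the cutoff
```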
arXiv Detail & Related papers (2021-09-13T18:10:52Z)
- Exploiting Learned Policies in Focal Search [0.49723239539321284]
We show how to integrate policy learning into a bounded-suboptimal search algorithm.
We evaluate our focal search variants over three benchmark domains using our synthetic approach, and on the 15-puzzle using a neural network learned using 1.5 million examples.
We observe that Discrepancy Focal Search, which expands the node maximizing an approximation of the probability that its corresponding path is a prefix of an optimal path, generally obtains the best results in terms of runtime and solution quality.
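A minimal sketch of the underlying focal-search mechanism, with the policy's path probability as the (hypothetical) selection criterion inside the focal list:

```python
def focal_search(start, successors, is_goal, h, w=1.5):
    """Bounded-suboptimal focal search: FOCAL holds the open nodes whose
    f = g + h is within a factor w of the best f; within FOCAL, expand the
    node whose path the learned policy deems most probable. (A sketch with
    linear scans; real implementations keep OPEN/FOCAL in priority queues.)"""
    open_list = [(h(start), 0.0, 1.0, start)]    # (f, g, path_prob, state)
    closed = set()
    while open_list:
        f_min = min(entry[0] for entry in open_list)
        focal = [e for e in open_list if e[0] <= w * f_min]
        best = max(focal, key=lambda e: e[2])    # most policy-probable path
        open_list.remove(best)
        f, g, p, state = best
        if state in closed:
            continue
        closed.add(state)
        if is_goal(state):
            return state
        # successors yields (child, step_cost, action_prob) under the policy.
        for child, cost, action_prob in successors(state):
            open_list.append((g + cost + h(child), g + cost, p * action_prob, child))
    return None
```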
arXiv Detail & Related papers (2021-04-21T13:50:40Z)
- Selection-Expansion: A Unifying Framework for Motion-Planning and Diversity Search Algorithms [69.87173070473717]
We investigate the properties of two diversity search algorithms, the Novelty Search and the Goal Exploration Process algorithms.
The relation to MP algorithms reveals that the smoothness, or lack thereof, of the mapping between the policy parameter space and the outcome space plays a key role in search efficiency.
arXiv Detail & Related papers (2021-04-10T13:52:27Z)
- Learning Sampling Policy for Faster Derivative Free Optimization [100.27518340593284]
We propose a new reinforcement-learning-based ZO algorithm (ZO-RL) that learns the sampling policy for generating the perturbations in ZO optimization, instead of using random sampling.
Our results show that ZO-RL can effectively reduce the variance of the ZO gradient by learning a sampling policy, and converges faster than existing ZO algorithms in different scenarios.
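For reference, the standard two-point zeroth-order gradient estimator that such methods build on is sketched below; per the summary, ZO-RL would have a learned policy propose the perturbation directions instead of the random baseline shown in the trailing comment.

```python
import numpy as np

def zo_gradient(f, x, directions, mu=1e-2):
    """Two-point zeroth-order gradient estimate, averaged over a set of
    perturbation directions; no derivatives of f are required."""
    g = np.zeros_like(x, dtype=float)
    for u in directions:
        g += (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u
    return g / len(directions)

# Random-sampling baseline: isotropic Gaussian directions.
# directions = [np.random.randn(*x.shape) for _ in range(10)]
```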
arXiv Detail & Related papers (2021-04-09T14:50:59Z)
- Loss Function Discovery for Object Detection via Convergence-Simulation Driven Search [101.73248560009124]
We propose an effective convergence-simulation driven evolutionary search algorithm, CSE-Autoloss, to speed up the search.
We conduct extensive evaluations of loss function search on popular detectors and validate the good generalization capability of searched losses.
Our experiments show that the best-discovered loss function combinations outperform default combinations by 1.1% and 0.8% in terms of mAP for two-stage and one-stage detectors.
arXiv Detail & Related papers (2021-02-09T08:34:52Z)
- Average-Reward Off-Policy Policy Evaluation with Function Approximation [66.67075551933438]
We consider off-policy policy evaluation with function approximation in average-reward MDPs.
Bootstrapping is necessary and, together with off-policy learning and function approximation (FA), results in the deadly triad.
We propose two novel algorithms, reproducing the celebrated success of Gradient TD algorithms in the average-reward setting.
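For orientation, a single semi-gradient step of off-policy differential TD with linear FA is sketched below; this is the textbook update that suffers from the deadly triad, not the paper's Gradient-TD-style algorithms, which add corrections to stay stable.

```python
def differential_td_step(w, r_bar, s, a, r, s_next,
                         features, pi, b, alpha=0.1, beta=0.01):
    """One semi-gradient step of off-policy differential TD with linear
    function approximation; w is the weight vector, r_bar the running
    average-reward estimate, pi/b the target/behavior policies."""
    rho = pi(a, s) / b(a, s)                   # importance sampling ratio
    x, x_next = features(s), features(s_next)  # feature vectors (numpy arrays)
    delta = r - r_bar + w @ x_next - w @ x     # differential TD error
    w = w + alpha * rho * delta * x            # value-weight update
    r_bar = r_bar + beta * rho * delta         # average-reward update
    return w, r_bar
```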
arXiv Detail & Related papers (2021-01-08T00:43:04Z)
- Searching for a Search Method: Benchmarking Search Algorithms for Generating NLP Adversarial Examples [10.993342896547691]
We study the behavior of several black-box search algorithms used for generating adversarial examples for natural language processing (NLP) tasks.
We perform a fine-grained analysis of three elements relevant to search: search algorithm, search space, and search budget.
arXiv Detail & Related papers (2020-09-09T17:04:42Z)
- Performance Analysis of Meta-heuristic Algorithms for a Quadratic Assignment Problem [6.555180412600522]
The quadratic assignment problem (QAP) is an NP-hard optimization problem.
Heuristic and meta-heuristic algorithms are prevalent solution methods for this problem.
This paper presents a comparative study applying different meta-heuristic algorithms to the QAP.
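To make the problem concrete: a QAP instance assigns n facilities to n locations so as to minimize flow-weighted distances. A minimal objective plus a toy swap-based local search (a simple stand-in for the kinds of meta-heuristics compared) might look like:

```python
import random

def qap_cost(flow, dist, perm):
    """QAP objective: facility i sits at location perm[i]; minimize the sum
    of flow[i][j] * dist[perm[i]][perm[j]] over all facility pairs."""
    n = len(perm)
    return sum(flow[i][j] * dist[perm[i]][perm[j]]
               for i in range(n) for j in range(n))

def swap_local_search(flow, dist, iters=10_000, seed=0):
    """A toy meta-heuristic baseline: random 2-swaps, keeping improvements."""
    rng = random.Random(seed)
    n = len(flow)
    perm = list(range(n))
    best = qap_cost(flow, dist, perm)
    for _ in range(iters):
        i, j = rng.sample(range(n), 2)
        perm[i], perm[j] = perm[j], perm[i]
        c = qap_cost(flow, dist, perm)
        if c < best:
            best = c
        else:
            perm[i], perm[j] = perm[j], perm[i]  # revert a non-improving swap
    return perm, best
```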
arXiv Detail & Related papers (2020-07-29T15:02:07Z)
- Best-First Beam Search [78.71330480725668]
We show that the standard implementation of beam search can be made up to 10x faster in practice.
We propose a memory-reduced variant of Best-First Beam Search, which has a similar beneficial search bias in terms of downstream performance.
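A minimal sketch of the best-first reformulation (my simplification; the paper's memory-reduced variant and the per-length expansion cap that makes it exactly match beam search are omitted):

```python
import heapq
import itertools

def best_first_beam_search(start, expand, is_complete, k=5, max_pops=100_000):
    """Beam search recast as best-first search: always pop the globally
    best-scoring partial hypothesis and stop once k complete hypotheses
    have been found, instead of advancing every beam in lock-step."""
    tie = itertools.count()
    heap = [(0.0, next(tie), start)]  # (negated log-probability, tiebreak, hypothesis)
    done = []
    pops = 0
    while heap and len(done) < k and pops < max_pops:
        neg_logp, _, hyp = heapq.heappop(heap)
        pops += 1
        if is_complete(hyp):
            done.append((-neg_logp, hyp))
            continue
        # expand(hyp) is assumed to yield (child_hypothesis, step_log_prob).
        for child, logp in expand(hyp):
            heapq.heappush(heap, (neg_logp - logp, next(tie), child))
    return done
```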
arXiv Detail & Related papers (2020-07-08T05:56:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.