Exploiting Learned Policies in Focal Search
- URL: http://arxiv.org/abs/2104.10535v1
- Date: Wed, 21 Apr 2021 13:50:40 GMT
- Title: Exploiting Learned Policies in Focal Search
- Authors: Pablo Araneda, Matias Greco, Jorge Baier
- Abstract summary: We show how to integrate policy learning into a bounded-suboptimal search algorithm.
We evaluate our focal search variants over three benchmark domains using our synthetic approach, and on the 15-puzzle using a neural network learned using 1.5 million examples.
We observe that Discrepancy Focal Search, which we show expands the node which maximizes an approximation of the probability that its corresponding path is a prefix of an optimal path, obtains, in general, the best results in terms of runtime and solution quality.
- Score: 0.49723239539321284
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent machine-learning approaches to deterministic search and
domain-independent planning employ policy learning to speed up search.
Unfortunately, when attempting to solve a search problem by successively
applying a policy, no guarantees can be given on solution quality. The problem
of how to effectively use a learned policy within a bounded-suboptimal search
algorithm remains largely an open question. In this paper, we propose
various ways in which such policies can be integrated into Focal Search,
assuming that the policy is a neural network classifier. Furthermore, we
provide mathematical foundations for some of the resulting algorithms. To
evaluate the resulting algorithms over a number of policies with varying
accuracy, we use synthetic policies which can be generated for a target
accuracy for problems where the search space can be held in memory. We evaluate
our focal search variants over three benchmark domains using our synthetic
approach, and on the 15-puzzle using a neural network learned using 1.5 million
examples. We observe that Discrepancy Focal Search, which we show
expands the node which maximizes an approximation of the probability that its
corresponding path is a prefix of an optimal path, obtains, in general, the
best results in terms of runtime and solution quality.
Related papers
- Sample-Efficient Multi-Objective Learning via Generalized Policy
Improvement Prioritization [8.836422771217084]
Multi-objective reinforcement learning (MORL) algorithms tackle sequential decision problems where agents may have different preferences.
We introduce a novel algorithm that uses Generalized Policy Improvement (GPI) to define principled, formally-derived prioritization schemes.
We empirically show that our method outperforms state-of-the-art MORL algorithms in challenging multi-objective tasks.
arXiv Detail & Related papers (2023-01-18T20:54:40Z) - Learning Proximal Operators to Discover Multiple Optima [66.98045013486794]
We present an end-to-end method to learn the proximal operator across a family of non-convex problems.
We show that for weakly-convex objectives and under mild conditions, the method converges globally.
arXiv Detail & Related papers (2022-01-28T05:53:28Z) - Machine Learning for Online Algorithm Selection under Censored Feedback [71.6879432974126]
In online algorithm selection (OAS), instances of an algorithmic problem class are presented to an agent one after another, and the agent has to quickly select a presumably best algorithm from a fixed set of candidate algorithms.
For decision problems such as satisfiability (SAT), quality typically refers to the algorithm's runtime.
In this work, we revisit multi-armed bandit algorithms for OAS and discuss their capability of dealing with the problem.
We adapt them towards runtime-oriented losses, allowing for partially censored data while keeping a space- and time-complexity independent of the time horizon.
arXiv Detail & Related papers (2021-09-13T18:10:52Z) - Local policy search with Bayesian optimization [73.0364959221845]
Reinforcement learning aims to find an optimal policy by interaction with an environment.
Policy gradients for local search are often obtained from random perturbations.
We develop an algorithm utilizing a probabilistic model of the objective function and its gradient.
arXiv Detail & Related papers (2021-06-22T16:07:02Z) - Policy-Guided Heuristic Search with Guarantees [31.323430201941378]
Policy-guided Heuristic Search (PHS) is a novel search algorithm that uses both a heuristic function and a policy.
PHS compares favorably with A*, Weighted A*, Greedy Best-First Search, LevinTS, and PUCT in terms of number of problems solved and search time.
arXiv Detail & Related papers (2021-03-21T22:30:57Z) - Average-Reward Off-Policy Policy Evaluation with Function Approximation [66.67075551933438]
We consider off-policy policy evaluation with function approximation in average-reward MDPs.
Bootstrapping is necessary and, along with off-policy learning and function approximation, results in the deadly triad.
We propose two novel algorithms, reproducing the celebrated success of Gradient TD algorithms in the average-reward setting.
arXiv Detail & Related papers (2021-01-08T00:43:04Z) - Towards Optimally Efficient Tree Search with Deep Learning [76.64632985696237]
This paper investigates the classical integer least-squares problem, which estimates integer signals from linear models.
The problem is NP-hard and often arises in diverse applications such as signal processing, bioinformatics, communications and machine learning.
We propose a general hyper-accelerated tree search (HATS) algorithm that employs a deep neural network to estimate the optimal heuristic for the underlying simplified memory-bounded A* algorithm.
arXiv Detail & Related papers (2021-01-07T08:00:02Z) - Online Model Selection for Reinforcement Learning with Function
Approximation [50.008542459050155]
We present a meta-algorithm that adapts to the optimal complexity with $\tilde{O}(L^{5/6} T^{2/3})$ regret.
We also show that the meta-algorithm automatically admits significantly improved instance-dependent regret bounds.
arXiv Detail & Related papers (2020-11-19T10:00:54Z) - Robust Reinforcement Learning using Least Squares Policy Iteration with
Provable Performance Guarantees [3.8073142980733]
This paper addresses the problem of model-free reinforcement learning for Robust Markov Decision Process (RMDP) with large state spaces.
We first propose the Robust Least Squares Policy Evaluation algorithm, which is a multi-step online model-free learning algorithm for policy evaluation.
We then propose Robust Least Squares Policy Iteration (RLSPI) algorithm for learning the optimal robust policy.
arXiv Detail & Related papers (2020-06-20T16:26:50Z) - Learning 2-opt Heuristics for the Traveling Salesman Problem via Deep
Reinforcement Learning [2.4565068569913384]
We propose to learn a local search heuristic based on 2-opt operators via deep reinforcement learning.
We show that the learned policies can improve even over random initial solutions and approach near-optimal solutions at a faster rate than previous state-of-the-art deep learning methods.
arXiv Detail & Related papers (2020-04-03T14:51:54Z) - Robust Policy Search for Robot Navigation with Stochastic Meta-Policies [5.7871177330714145]
In this work, we exploit the main ingredients of Bayesian optimization to provide robustness to different issues for policy search algorithms.
We combine several methods and show how their interaction works better than the sum of the parts.
We compare the proposed algorithm with previous results in several optimization benchmarks and robot tasks, such as pushing objects with a robot arm, or path finding with a rover.
arXiv Detail & Related papers (2020-03-02T16:30:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.