Subgoal-Guided Policy Heuristic Search with Learned Subgoals
- URL: http://arxiv.org/abs/2506.07255v2
- Date: Tue, 29 Jul 2025 23:21:56 GMT
- Title: Subgoal-Guided Policy Heuristic Search with Learned Subgoals
- Authors: Jake Tuero, Michael Buro, Levi H. S. Lelis
- Abstract summary: Policy tree search is a family of tree search algorithms that use a policy to guide the search. This paper introduces a novel method for learning subgoal-based policies for policy tree search algorithms.
- Score: 15.570621284198015
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Policy tree search is a family of tree search algorithms that use a policy to guide the search. These algorithms provide guarantees, based on the quality of the policy, on the number of expansions required to solve a given problem. While these algorithms have shown promising results, training them requires complete solution trajectories, which are obtained through a trial-and-error search process. When the training problem instances are hard, learning can be prohibitively costly, especially when starting from a randomly initialized policy, because search samples are wasted in failed attempts to solve these hard instances. This paper introduces a novel method for learning subgoal-based policies for policy tree search algorithms. The subgoals, and the policies conditioned on them, are learned from the trees the search expands while attempting to solve problems, including the search trees of failed attempts. We empirically show that our policy formulation and training method improve the sample efficiency of learning a policy and heuristic function in this online setting.
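The abstract gives no pseudocode, so the following is a minimal sketch in Python of the general idea: a best-first policy tree search whose guidance comes from a subgoal-conditioned policy. It assumes a LevinTS/PHS-style priority of (d(n) + 1) / pi(n), where pi(n) is the product of policy probabilities along the path, and a mixture policy pi(a | s) = sum_g p(g | s) * pi(a | s, g). The grid world, SUBGOALS, subgoal_prior, and policy_given_subgoal are invented placeholders for the learned components; this is illustrative only and not the authors' implementation.

```python
import heapq
import itertools
import math

# Toy deterministic environment: 2D grid, actions move the agent.
ACTIONS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}
GRID_W, GRID_H = 5, 5
START, GOAL = (0, 0), (4, 4)

# Hypothetical "learned" subgoals; in the paper's setting these would be
# extracted from previously expanded search trees, including failed attempts.
SUBGOALS = [(2, 2), (4, 2), (2, 4)]

def subgoal_prior(state):
    """Placeholder for a learned distribution p(g | s); uniform here."""
    return {g: 1.0 / len(SUBGOALS) for g in SUBGOALS}

def policy_given_subgoal(state, subgoal):
    """Placeholder for a learned policy pi(a | s, g): softmax over how much
    each action reduces the Manhattan distance to the subgoal."""
    scores = {}
    for a, (dx, dy) in ACTIONS.items():
        nx, ny = state[0] + dx, state[1] + dy
        scores[a] = -(abs(subgoal[0] - nx) + abs(subgoal[1] - ny))
    z = sum(math.exp(v) for v in scores.values())
    return {a: math.exp(v) / z for a, v in scores.items()}

def mixture_policy(state):
    """Subgoal-guided policy: pi(a | s) = sum_g p(g | s) * pi(a | s, g)."""
    probs = {a: 0.0 for a in ACTIONS}
    for g, pg in subgoal_prior(state).items():
        for a, pa in policy_given_subgoal(state, g).items():
            probs[a] += pg * pa
    return probs

def successors(state):
    for a, (dx, dy) in ACTIONS.items():
        nx, ny = state[0] + dx, state[1] + dy
        if 0 <= nx < GRID_W and 0 <= ny < GRID_H:
            yield a, (nx, ny)

def policy_tree_search(start, goal, budget=10_000):
    """Best-first search with a LevinTS-style cost (d(n) + 1) / pi(n)."""
    counter = itertools.count()  # tie-breaker so the heap never compares states
    frontier = [(1.0, next(counter), start, 0, 1.0, [])]  # (cost, _, s, depth, pi, path)
    seen = set()
    expansions = 0
    while frontier and expansions < budget:
        _, _, s, depth, pi, path = heapq.heappop(frontier)
        if s == goal:
            return path, expansions
        if s in seen:
            continue
        seen.add(s)
        expansions += 1
        action_probs = mixture_policy(s)
        for a, s2 in successors(s):
            pi2 = pi * action_probs[a]
            cost = (depth + 2) / pi2  # (d + 1) / pi for the child node
            heapq.heappush(frontier, (cost, next(counter), s2, depth + 1, pi2, path + [a]))
    return None, expansions

if __name__ == "__main__":
    plan, n = policy_tree_search(START, GOAL)
    if plan is None:
        print(f"no solution found within budget ({n} expansions)")
    else:
        print(f"solution of length {len(plan)} found after {n} expansions: {plan}")
```

Running the script prints the action sequence found and the number of expansions; in the paper's online setting the placeholder functions would be learned models trained from the trees the search expands, including those of failed attempts.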
Related papers
- LiteSearch: Efficacious Tree Search for LLM [70.29796112457662]
This study introduces a novel guided tree search algorithm with dynamic node selection and node-level exploration budget.
Experiments conducted on the GSM8K and TabMWP datasets demonstrate that our approach enjoys significantly lower computational costs compared to baseline methods.
arXiv Detail & Related papers (2024-06-29T05:14:04Z)
- Hybrid Search for Efficient Planning with Completeness Guarantees [63.02803974708516]
We propose an efficient approach to augment a subgoal search method to achieve completeness in discrete action spaces.
This solution achieves the best of both worlds: the practical efficiency of high-level search and the completeness of low-level search.
arXiv Detail & Related papers (2023-10-19T15:16:43Z)
- Learning Open Domain Multi-hop Search Using Reinforcement Learning [20.078330789576256]
We teach an automated agent to learn how to search for multi-hop paths of relations between entities in an open domain.
We implement the method in an actor-critic reinforcement learning algorithm and evaluate it on a dataset of search problems derived from a subset of English Wikipedia.
arXiv Detail & Related papers (2022-05-30T17:44:19Z)
- Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
arXiv Detail & Related papers (2022-04-05T17:25:22Z)
- ExPoSe: Combining State-Based Exploration with Gradient-Based Online Search [14.90561531943247]
Tree-based online search algorithms iteratively simulate trajectories and update Q-value information on a set of states represented by a tree structure.
In contrast, policy-gradient-based online search algorithms update the information obtained from simulated trajectories directly onto the parameters of the policy.
We show that it is possible to combine and leverage the strengths of these two methods for improved search performance.
arXiv Detail & Related papers (2022-02-03T08:39:25Z)
- C-Planning: An Automatic Curriculum for Learning Goal-Reaching Tasks [133.40619754674066]
Goal-conditioned reinforcement learning can solve tasks in a wide range of domains, including navigation and manipulation.
We propose to solve the distant goal-reaching task by using search at training time to automatically generate intermediate states.
The E-step corresponds to planning an optimal sequence of waypoints using graph search, while the M-step aims to learn a goal-conditioned policy to reach those waypoints.
arXiv Detail & Related papers (2021-10-22T22:05:31Z)
- Exploiting Learned Policies in Focal Search [0.49723239539321284]
We show how to integrate policy learning into a bounded-suboptimal search algorithm.
We evaluate our focal search variants over three benchmark domains using our synthetic approach, and on the 15-puzzle using a neural network learned using 1.5 million examples.
We observe that Discrepancy Focal Search, which we show expands the node that maximizes an approximation of the probability that its corresponding path is a prefix of an optimal path, generally obtains the best results in terms of runtime and solution quality.
arXiv Detail & Related papers (2021-04-21T13:50:40Z)
- Selection-Expansion: A Unifying Framework for Motion-Planning and Diversity Search Algorithms [69.87173070473717]
We investigate the properties of two diversity search algorithms, the Novelty Search and the Goal Exploration Process algorithms.
The relation to MP (motion-planning) algorithms reveals that the smoothness, or lack of smoothness, of the mapping between the policy parameter space and the outcome space plays a key role in search efficiency.
arXiv Detail & Related papers (2021-04-10T13:52:27Z)
- Policy-Guided Heuristic Search with Guarantees [31.323430201941378]
Policy-guided Heuristic Search (PHS) is a novel search algorithm that uses both a heuristic function and a policy.
PHS compares favorably with A*, Weighted A*, Greedy Best-First Search, LevinTS, and PUCT in terms of number of problems solved and search time.
arXiv Detail & Related papers (2021-03-21T22:30:57Z)
- Policy Gradient for Continuing Tasks in Non-stationary Markov Decision Processes [112.38662246621969]
Reinforcement learning considers the problem of finding policies that maximize an expected cumulative reward in a Markov decision process with unknown transition probabilities.
We compute unbiased gradients of the value function, which we use as ascent directions to update the policy.
A major drawback of policy gradient-type algorithms is that they are limited to episodic tasks unless stationarity assumptions are imposed.
arXiv Detail & Related papers (2020-10-16T15:15:42Z)
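The entry above describes using unbiased gradient estimates of the value function as ascent directions for the policy. As a hedged illustration of that general idea only, here is a minimal REINFORCE-style update in Python on a hypothetical two-armed bandit; ARM_MEANS, ALPHA, and the environment are invented for illustration, and the sketch does not capture the continuing, non-stationary setting the paper actually studies.

```python
import numpy as np

# Minimal REINFORCE-style sketch: a sampled return gives an unbiased estimate
# of the policy gradient, which is used as an ascent direction on the logits.
rng = np.random.default_rng(0)
theta = np.zeros(2)                   # logits of a softmax policy over 2 arms
ARM_MEANS = np.array([0.2, 0.8])      # hypothetical expected rewards per arm
ALPHA = 0.1                           # step size

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for step in range(2000):
    probs = softmax(theta)
    a = rng.choice(2, p=probs)                 # sample an action from the policy
    r = rng.normal(ARM_MEANS[a], 0.1)          # sampled reward (the "return" here)
    grad_log_pi = -probs                       # d/dtheta log pi(a) = 1{.=a} - pi(.)
    grad_log_pi[a] += 1.0
    theta += ALPHA * r * grad_log_pi           # unbiased gradient estimate as ascent step

print("learned action probabilities:", softmax(theta))
```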