ExPoSe: Combining State-Based Exploration with Gradient-Based Online Search
- URL: http://arxiv.org/abs/2202.01461v2
- Date: Fri, 4 Feb 2022 06:12:22 GMT
- Title: ExPoSe: Combining State-Based Exploration with Gradient-Based Online Search
- Authors: Dixant Mittal and Siddharth Aravindan and Wee Sun Lee
- Abstract summary: A tree-based online search algorithm iteratively simulates trajectories and updates Q-value information on a set of states represented by a tree structure.
Alternatively, policy gradient-based online search algorithms update the parameters of the policy directly with the information obtained from simulated trajectories.
We show that it is possible to combine and leverage the strengths of these two methods for improved search performance.
- Score: 14.90561531943247
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A tree-based online search algorithm iteratively simulates trajectories and
updates Q-value information on a set of states represented by a tree structure.
Alternatively, policy gradient-based online search algorithms update the
information obtained from simulated trajectories directly onto the parameters
of the policy, and have been found to be effective. While tree-based methods
limit the updates from simulations to the states that exist in the tree and do
not interpolate the information to nearby states, policy gradient search
methods do not perform explicit exploration. In this paper, we show that it is
possible to combine and leverage the strengths of these two methods for
improved search performance. We examine the key reasons behind the improvement
and propose a simple yet effective online search method, named Exploratory
Policy Gradient Search (ExPoSe), that updates both the parameters of the policy
and the search information on the states in the trajectory. We conduct
experiments on complex planning problems, including Sokoban and Hamiltonian
cycle search in sparse graphs, and show that combining exploration with policy
gradient improves online search performance.
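As a concrete illustration of this combination, the following is a minimal Python sketch of the general pattern described in the abstract: each simulated trajectory is used both to update the policy parameters (here via a REINFORCE-style gradient step) and to update per-state search statistics (here simple visit counts that feed a count-based exploration bonus). The simulator interface (`env.reset_to_root`, `env.step`), the linear-softmax policy over action features, and the specific bonus and update rules are illustrative assumptions, not the exact formulation used in the paper.

```python
import numpy as np

def softmax(x):
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

def expose_search(env, theta, features, n_simulations=100,
                  alpha=0.1, c_explore=1.0, gamma=0.99):
    """Online search from the current state of `env` (a resettable simulator).

    `theta` is a parameter vector updated in place; `features(state)` is assumed
    to return one feature row per action; states are assumed hashable so they
    can index the search-information table.
    """
    visit_counts = {}                                  # per-(state, action) search information
    for _ in range(n_simulations):
        state, done = env.reset_to_root(), False       # assumed: reset simulator to the search root
        trajectory = []                                # (state, action, reward) triples
        while not done:
            phi = features(state)
            logits = phi @ theta
            counts = np.array([visit_counts.get((state, a), 0)
                               for a in range(len(logits))])
            bonus = c_explore / np.sqrt(1.0 + counts)  # count-based exploration bonus
            probs = softmax(logits + bonus)
            action = np.random.choice(len(probs), p=probs)
            next_state, reward, done = env.step(action)  # assumed simulator interface
            trajectory.append((state, action, reward))
            visit_counts[(state, action)] = visit_counts.get((state, action), 0) + 1
            state = next_state
        # REINFORCE-style update of the policy parameters from the simulated trajectory
        ret = 0.0
        for state, action, reward in reversed(trajectory):
            ret = reward + gamma * ret
            phi = features(state)
            probs = softmax(phi @ theta)
            grad_log_pi = phi[action] - probs @ phi    # gradient of log softmax-linear policy
            theta += alpha * ret * grad_log_pi
    # act greedily with the updated policy at the root state
    root = env.reset_to_root()
    return int(np.argmax(features(root) @ theta))
```

In this sketch the exploration term only affects how simulated trajectories are sampled, while the gradient step generalises the collected information across states through the shared parameters `theta`; the abstract's claim is precisely that combining these two kinds of updates improves online search performance.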
Related papers
- No learning rates needed: Introducing SALSA -- Stable Armijo Line Search Adaptation [4.45108516823267]
We identify problems of current state-of-the-art line search methods, propose enhancements, and rigorously assess their effectiveness.
We evaluate these methods on data domains that are orders of magnitude larger and more complex than in previous work.
Our work is publicly available in a Python package, which provides a simple PyTorch implementation.
arXiv Detail & Related papers (2024-07-30T08:47:02Z)
- LiteSearch: Efficacious Tree Search for LLM [70.29796112457662]
This study introduces a novel guided tree search algorithm with dynamic node selection and node-level exploration budget.
Experiments conducted on the GSM8K and TabMWP datasets demonstrate that our approach enjoys significantly lower computational costs compared to baseline methods.
arXiv Detail & Related papers (2024-06-29T05:14:04Z)
- A Visual Active Search Framework for Geospatial Exploration [36.31732056074638]
Many problems can be viewed as forms of geospatial search aided by aerial imagery.
We model this class of problems in a visual active search (VAS) framework, which has three key inputs.
We propose a reinforcement learning approach for VAS that learns a meta-search policy from a collection of fully annotated search tasks.
arXiv Detail & Related papers (2022-11-28T21:53:05Z)
- SoftTreeMax: Policy Gradient with Tree Search [72.9513807133171]
We introduce SoftTreeMax, the first approach that integrates tree-search into policy gradient.
On Atari, SoftTreeMax demonstrates up to 5x better performance with faster run-time compared with distributed PPO.
arXiv Detail & Related papers (2022-09-28T09:55:47Z)
- RetroGraph: Retrosynthetic Planning with Graph Search [101.92603715499112]
Retrosynthetic planning aims to find a reaction pathway to synthesize a target molecule.
We propose a graph-based search policy that eliminates the redundant explorations of any intermediate molecules.
Our method can search a batch of targets together in the graph and remove the inter-target duplication present in tree-based search methods.
arXiv Detail & Related papers (2022-06-23T05:01:29Z)
- Dimensionality Reduction and Prioritized Exploration for Policy Search [29.310742141970394]
Black-box policy optimization is a class of reinforcement learning algorithms that explores and updates the policies at the parameter level.
We present a novel method to prioritize the exploration of effective parameters and cope with full covariance matrix updates.
Our algorithm learns faster than recent approaches and requires fewer samples to achieve state-of-the-art results.
arXiv Detail & Related papers (2022-03-09T15:17:09Z)
- DAAS: Differentiable Architecture and Augmentation Policy Search [107.53318939844422]
This work considers the possible coupling between neural architectures and data augmentation and proposes an effective algorithm jointly searching for them.
Our approach achieves 97.91% accuracy on CIFAR-10 and 76.6% Top-1 accuracy on ImageNet dataset, showing the outstanding performance of our search algorithm.
arXiv Detail & Related papers (2021-09-30T17:15:17Z)
- Selection-Expansion: A Unifying Framework for Motion-Planning and Diversity Search Algorithms [69.87173070473717]
We investigate the properties of two diversity search algorithms, the Novelty Search and the Goal Exploration Process algorithms.
The relation to motion-planning (MP) algorithms reveals that the smoothness, or lack of smoothness, of the mapping between the policy parameter space and the outcome space plays a key role in search efficiency.
arXiv Detail & Related papers (2021-04-10T13:52:27Z)
- Parameterizing Branch-and-Bound Search Trees to Learn Branching Policies [76.83991682238666]
Branch and Bound (B&B) is the exact tree search method typically used to solve Mixed-Integer Linear Programming problems (MILPs).
We propose a novel imitation learning framework, and introduce new input features and architectures to represent branching.
arXiv Detail & Related papers (2020-02-12T17:43:23Z)
- Population-Guided Parallel Policy Search for Reinforcement Learning [17.360163137926]
A new population-guided parallel learning scheme is proposed to enhance the performance of off-policy reinforcement learning (RL).
In the proposed scheme, multiple identical learners with their own value functions and policies share a common experience replay buffer and collaboratively search for a good policy under the guidance of the best policy's information (a toy structural sketch follows this entry).
arXiv Detail & Related papers (2020-01-09T10:13:57Z)
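Purely as a structural illustration of the population-guided scheme summarised above (and not the paper's actual algorithm, which builds on off-policy RL learners and a shared replay buffer), here is a toy Python sketch: several learners optimise their own objective in parallel while a guidance term pulls each one toward the currently best-performing learner's parameters. The `toy_return` objective, the noisy gradient, and the guidance coefficient `beta` are invented for illustration.

```python
import numpy as np

# Toy objective standing in for expected return; each learner ascends it
# using its own noisy gradient estimate.
def toy_return(theta):
    return -np.sum((theta - 1.0) ** 2)

def noisy_grad(theta, noise=0.5):
    return -2.0 * (theta - 1.0) + noise * np.random.randn(*theta.shape)

def population_guided_search(n_learners=4, dim=8, steps=200, lr=0.05, beta=0.2):
    # Each learner keeps its own policy parameters.  (The real scheme also
    # shares an experience replay buffer among the learners and uses
    # off-policy RL updates; both are omitted in this toy illustration.)
    learners = [np.random.randn(dim) for _ in range(n_learners)]
    for _ in range(steps):
        scores = [toy_return(th) for th in learners]     # periodic evaluation
        best = learners[int(np.argmax(scores))].copy()   # current best policy
        for i, th in enumerate(learners):
            # own gradient step plus a guidance term toward the best policy
            learners[i] = th + lr * (noisy_grad(th) + beta * (best - th))
    return max(learners, key=toy_return)
```

Running `population_guided_search()` returns the best learner's parameters; the guidance term is the toy counterpart of biasing each learner's search toward the best policy found by the population.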