Scale-Adaptive Balancing of Exploration and Exploitation in Classical Planning
- URL: http://arxiv.org/abs/2305.09840v3
- Date: Fri, 30 Aug 2024 15:57:01 GMT
- Title: Scale-Adaptive Balancing of Exploration and Exploitation in Classical Planning
- Authors: Stephen Wissow, Masataro Asai,
- Abstract summary: We show that a more detailed theoretical understanding of MAB literature helps improve existing planning algorithms.
We propose GreedyUCT-Normal, a MCTS/THTS algorithm with UCB1-Normal bandit for agile classical planning.
- Score: 1.6574413179773757
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Balancing exploration and exploitation has been an important problem in both game tree search and automated planning. However, while the problem has been extensively analyzed within the Multi-Armed Bandit (MAB) literature, the planning community has had limited success when attempting to apply those results. We show that a more detailed theoretical understanding of MAB literature helps improve existing planning algorithms that are based on Monte Carlo Tree Search (MCTS) / Trial Based Heuristic Tree Search (THTS). In particular, THTS uses UCB1 MAB algorithms in an ad hoc manner, as UCB1's theoretical requirement of fixed bounded support reward distributions is not satisfied within heuristic search for classical planning. The core issue lies in UCB1's lack of adaptations to the different scales of the rewards. We propose GreedyUCT-Normal, a MCTS/THTS algorithm with UCB1-Normal bandit for agile classical planning, which handles distributions with different scales by taking the reward variance into consideration, and resulted in an improved algorithmic performance (more plans found with less node expansions) that outperforms Greedy Best First Search and existing MCTS/THTS-based algorithms (GreedyUCT,GreedyUCT*).
Related papers
- LiteSearch: Efficacious Tree Search for LLM [70.29796112457662]
This study introduces a novel guided tree search algorithm with dynamic node selection and node-level exploration budget.
Experiments conducted on the GSM8K and TabMWP datasets demonstrate that our approach enjoys significantly lower computational costs compared to baseline methods.
arXiv Detail & Related papers (2024-06-29T05:14:04Z) - AlphaZeroES: Direct score maximization outperforms planning loss minimization [61.17702187957206]
Planning at execution time has been shown to dramatically improve performance for agents in both single-agent and multi-agent settings.
A family of approaches to planning at execution time are AlphaZero and its variants, which use Monte Carlo Tree Search together with a neural network that guides the search by predicting state values and action probabilities.
We show that, across multiple environments, directly maximizing the episode score outperforms minimizing the planning loss.
arXiv Detail & Related papers (2024-06-12T23:00:59Z) - Diverse, Top-k, and Top-Quality Planning Over Simulators [9.924007495979582]
This paper proposes a novel alternative approach that uses Monte Carlo Tree Search (MCTS)
We present a procedure for extracting bounded sets of plans from pre-generated search trees in best-first order, and a metric for evaluating the relative quality of paths through a search tree.
Our results show that our method can generate diverse and high-quality plan sets in domains where classical planners are not applicable.
arXiv Detail & Related papers (2023-08-25T02:55:19Z) - DeciLS-PBO: an Effective Local Search Method for Pseudo-Boolean
Optimization [10.513103815142731]
We find two ways to improve the local search algorithms in solving Pseudo-Boolean Optimization (PBO)
Our algorithm DeciLS-PBO has a promising performance compared to the state-of-the-art algorithms.
arXiv Detail & Related papers (2023-01-28T17:03:56Z) - Non-Clairvoyant Scheduling with Predictions Revisited [77.86290991564829]
In non-clairvoyant scheduling, the task is to find an online strategy for scheduling jobs with a priori unknown processing requirements.
We revisit this well-studied problem in a recently popular learning-augmented setting that integrates (untrusted) predictions in algorithm design.
We show that these predictions have desired properties, admit a natural error measure as well as algorithms with strong performance guarantees.
arXiv Detail & Related papers (2022-02-21T13:18:11Z) - Bayesian decision-making under misspecified priors with applications to
meta-learning [64.38020203019013]
Thompson sampling and other sequential decision-making algorithms are popular approaches to tackle explore/exploit trade-offs in contextual bandits.
We show that performance degrades gracefully with misspecified priors.
arXiv Detail & Related papers (2021-07-03T23:17:26Z) - Limited depth bandit-based strategy for Monte Carlo planning in
continuous action spaces [4.1208902102156015]
We propose LD-HOO, a limited depth variant of the hierarchical optimization (HOO) algorithm.
We show that our algorithm exhibits the same cumulative regret as the original HOO while being faster and more memory efficient.
We then propose a Monte Carlo tree search algorithm based on LD-HOO for optimal control problems and illustrate the resulting approach's application in several optimal control problems.
arXiv Detail & Related papers (2021-06-29T17:30:01Z) - Efficient Computation of Expectations under Spanning Tree Distributions [67.71280539312536]
We propose unified algorithms for the important cases of first-order expectations and second-order expectations in edge-factored, non-projective spanning-tree models.
Our algorithms exploit a fundamental connection between gradients and expectations, which allows us to derive efficient algorithms.
arXiv Detail & Related papers (2020-08-29T14:58:26Z) - On Effective Parallelization of Monte Carlo Tree Search [51.15940034629022]
Monte Carlo Tree Search (MCTS) is computationally expensive as it requires a substantial number of rollouts to construct the search tree.
How to design effective parallel MCTS algorithms has not been systematically studied and remains poorly understood.
We demonstrate how proposed necessary conditions can be adopted to design more effective parallel MCTS algorithms.
arXiv Detail & Related papers (2020-06-15T21:36:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.