Scalable Online Planning via Reinforcement Learning Fine-Tuning
- URL: http://arxiv.org/abs/2109.15316v1
- Date: Thu, 30 Sep 2021 17:59:11 GMT
- Title: Scalable Online Planning via Reinforcement Learning Fine-Tuning
- Authors: Arnaud Fickinger, Hengyuan Hu, Brandon Amos, Stuart Russell, Noam Brown
- Abstract summary: Tabular search methods do not scale well with the size of the search space.
We replace tabular search with online model-based fine-tuning of a policy neural network via reinforcement learning.
In particular, we use our search algorithm to achieve a new state-of-the-art result in self-play Hanabi.
- Score: 25.27878823988181
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Lookahead search has been a critical component of recent AI successes, such
as in the games of chess, go, and poker. However, the search methods used in
these games, and in many other settings, are tabular. Tabular search methods do
not scale well with the size of the search space, and this problem is
exacerbated by stochasticity and partial observability. In this work we replace
tabular search with online model-based fine-tuning of a policy neural network
via reinforcement learning, and show that this approach outperforms
state-of-the-art search algorithms in benchmark settings. In particular, we use
our search algorithm to achieve a new state-of-the-art result in self-play
Hanabi, and show the generality of our algorithm by also showing that it
outperforms tabular search in the Atari game Ms. Pacman.
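As a rough illustration of the recipe the abstract describes (fine-tuning a copy of the policy network on simulated rollouts at each decision point, instead of building a tabular search tree), the sketch below uses a plain REINFORCE update; the `env.clone()` simulator interface, the policy signature, and all hyperparameters are illustrative assumptions, not the authors' exact method.

```python
# Minimal sketch of decision-time policy fine-tuning with REINFORCE (not the paper's exact
# method). Assumes `policy` maps an observation tensor to action logits and that the
# environment exposes a clone()/observation()/step() simulator interface.
import copy
import torch

def finetune_and_act(policy, env, n_rollouts=64, horizon=20, lr=1e-4):
    """Fine-tune a throwaway copy of `policy` on simulated rollouts from the current
    state, then pick an action with the fine-tuned copy."""
    local = copy.deepcopy(policy)                     # the global policy stays untouched
    opt = torch.optim.Adam(local.parameters(), lr=lr)

    for _ in range(n_rollouts):
        sim = env.clone()                             # assumed: simulator forked at the current state
        obs = sim.observation()
        log_probs, rewards = [], []
        for _ in range(horizon):
            logits = local(torch.as_tensor(obs, dtype=torch.float32))
            dist = torch.distributions.Categorical(logits=logits)
            action = dist.sample()
            log_probs.append(dist.log_prob(action))
            obs, reward, done = sim.step(action.item())
            rewards.append(reward)
            if done:
                break
        # Plain undiscounted-return policy gradient on the simulated episode.
        loss = -sum(rewards) * torch.stack(log_probs).sum()
        opt.zero_grad()
        loss.backward()
        opt.step()

    with torch.no_grad():                             # act greedily at the real decision point
        logits = local(torch.as_tensor(env.observation(), dtype=torch.float32))
    return int(torch.argmax(logits))
```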
Related papers
- Playing Board Games with the Predict Results of Beam Search Algorithm [0.0]
This paper introduces a novel algorithm for two-player deterministic games with perfect information, which we call PROBS (Predict Results of Beam Search).
We evaluate the performance of our algorithm across a selection of board games, where it consistently demonstrates an increased winning ratio against baseline opponents.
A key result of this study is that the PROBS algorithm operates effectively, even when the beam search size is considerably smaller than the average number of turns in the game.
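The summary does not spell out the algorithm itself; as a loose, single-agent illustration of scoring moves by the outcome of a small, value-guided beam search, the sketch below assumes hypothetical `legal_moves()`/`apply()`/`is_terminal()` state methods and a `value_fn` callable, and it ignores the sign flip between players that a real two-player implementation would need.

```python
# Loose sketch: pick a move by running a shallow, value-guided beam search from each
# successor state. State methods and `value_fn` are hypothetical placeholders.
def beam_search_value(state, value_fn, beam_size=4, depth=3):
    """Estimate a state's value by expanding a small beam and scoring the survivors."""
    beam = [state]
    for _ in range(depth):
        candidates = []
        for s in beam:
            if s.is_terminal():
                candidates.append(s)          # keep terminal states as-is
            else:
                candidates.extend(s.apply(m) for m in s.legal_moves())
        if not candidates:
            break
        # Retain only the most promising states according to the value model.
        beam = sorted(candidates, key=value_fn, reverse=True)[:beam_size]
    return max(value_fn(s) for s in beam)

def choose_move(state, value_fn, beam_size=4, depth=3):
    """Pick the move whose successor looks best under the beam-search estimate."""
    return max(
        state.legal_moves(),
        key=lambda m: beam_search_value(state.apply(m), value_fn, beam_size, depth),
    )
```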
arXiv Detail & Related papers (2024-04-23T20:10:27Z)
- Relation-aware Ensemble Learning for Knowledge Graph Embedding [68.94900786314666]
We propose to learn an ensemble by leveraging existing methods in a relation-aware manner.
However, exploring these semantics with a relation-aware ensemble leads to a much larger search space than in general ensemble methods.
We propose a divide-search-combine algorithm RelEns-DSC that searches the relation-wise ensemble weights independently.
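Purely as an illustration of searching the ensemble weights relation by relation (which keeps the per-relation search spaces from multiplying together), the sketch below grid-searches convex weights over base-model scores for each relation independently; the score tables, triples, and the simple validation metric are invented placeholders, not the paper's setup.

```python
# Toy sketch of relation-wise ensemble weight search: each relation's weights are found
# independently, so the search spaces stay small. All data structures here are placeholders.
import itertools
import numpy as np

def search_relation_weights(val_triples, model_scores, relations, grid=None):
    """model_scores[m][triple] is base model m's score; returns per-relation weights."""
    grid = np.linspace(0.0, 1.0, 11) if grid is None else grid
    n_models = len(model_scores)
    best_weights = {}
    for rel in relations:
        triples = [t for t in val_triples if t[1] == rel]     # triples are (head, relation, tail)
        best_w, best_metric = None, float("-inf")
        for w in itertools.product(grid, repeat=n_models):
            if not np.isclose(sum(w), 1.0):
                continue                                      # only convex combinations
            # Stand-in validation metric: mean ensembled score of the held-out true triples.
            metric = float(np.mean([
                sum(w[m] * model_scores[m][t] for m in range(n_models)) for t in triples
            ])) if triples else float("-inf")
            if metric > best_metric:
                best_w, best_metric = w, metric
        best_weights[rel] = best_w
    return best_weights
```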
arXiv Detail & Related papers (2023-10-13T07:40:12Z)
- The Update-Equivalence Framework for Decision-Time Planning [78.44953498421854]
We introduce an alternative framework for decision-time planning that is not based on solving subgames, but rather on update equivalence.
We derive a provably sound search algorithm for fully cooperative games based on mirror descent and a search algorithm for adversarial games based on magnetic mirror descent.
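The entry names mirror descent as the basis of the cooperative-game search algorithm; below is only a minimal refresher on a single mirror-descent policy update under the KL (negative-entropy) mirror map, i.e. the multiplicative-weights update, not the paper's full search procedure.

```python
# Minimal refresher: one mirror-descent step on an action distribution under the KL mirror
# map, i.e. new_policy ∝ policy * exp(eta * q_values). Not the paper's full algorithm.
import numpy as np

def mirror_descent_step(policy, q_values, eta=0.1):
    """Solve argmax_p <p, q> - (1/eta) * KL(p || policy) in closed form."""
    logits = np.log(policy) + eta * np.asarray(q_values, dtype=float)
    logits -= logits.max()                  # numerical stability before exponentiating
    new_policy = np.exp(logits)
    return new_policy / new_policy.sum()

# Example: repeated updates concentrate mass on the highest-value action.
p = np.full(3, 1.0 / 3.0)
for _ in range(50):
    p = mirror_descent_step(p, q_values=[0.0, 0.1, 0.5])
print(p.round(3))
```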
arXiv Detail & Related papers (2023-04-25T20:28:55Z)
- CrossBeam: Learning to Search in Bottom-Up Program Synthesis [51.37514793318815]
We propose training a neural model to learn a hands-on search policy for bottom-up synthesis.
Our approach, called CrossBeam, uses the neural model to choose how to combine previously-explored programs into new programs.
We observe that CrossBeam learns to search efficiently, exploring much smaller portions of the program space compared to the state-of-the-art.
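As a toy, self-contained illustration of the bottom-up idea (grow a pool of explored programs and let a model decide which combinations to try next), the sketch below synthesizes arithmetic expressions; the random ranking stands in for CrossBeam's learned neural policy, and the operator set and interfaces are invented.

```python
# Toy sketch of neural-guided bottom-up synthesis: repeatedly combine previously explored
# programs into new ones, keeping only a small "model"-ranked beam. The random ranking is a
# stand-in for a learned policy; the arithmetic domain is an invented example.
import random

def synthesize(inputs, target, budget=200, beam=10, seed=0):
    ops = (("add", lambda a, b: a + b), ("mul", lambda a, b: a * b))
    rng = random.Random(seed)
    pool = [(str(x), x) for x in inputs]          # explored programs as (expression, value)
    seen = {v for _, v in pool}
    for _ in range(budget):
        candidates = [
            (f"{name}({ea}, {eb})", fn(va, vb))
            for name, fn in ops
            for ea, va in pool
            for eb, vb in pool
        ]
        new = [(e, v) for e, v in candidates if v not in seen]
        for expr, value in new:                   # goal check over everything newly reachable
            if value == target:
                return expr
        rng.shuffle(new)                          # stand-in for scoring with a neural model
        for expr, value in new[:beam]:            # only a small beam is added to the pool
            pool.append((expr, value))
            seen.add(value)
    return None

print(synthesize(inputs=[2, 3], target=36))       # e.g. mul(add(2, 2), mul(3, 3))
```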
arXiv Detail & Related papers (2022-03-20T04:41:05Z)
- Revisiting Game Representations: The Hidden Costs of Efficiency in Sequential Decision-making Algorithms [0.6749750044497732]
Recent advancements in algorithms for sequential decision-making under imperfect information have shown remarkable success in large games.
These algorithms traditionally formalize the games using the extensive-form game formalism.
We show that a popular workaround involves using a specialized representation based on player-specific information-state trees.
arXiv Detail & Related papers (2021-12-20T22:34:19Z)
- Exploring Complicated Search Spaces with Interleaving-Free Sampling [127.07551427957362]
In this paper, we build the search algorithm upon a complicated search space with long-distance connections.
We present a simple yet effective algorithm named IF-NAS, which uses a periodic sampling strategy to construct different sub-networks.
In the proposed search space, IF-NAS outperforms both random sampling and previous weight-sharing search algorithms by a significant margin.
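The summary only names a "periodic sampling strategy" for constructing sub-networks; the toy generator below contrasts that with independent random sampling by cycling deterministically through the candidate operations on each supernet edge. The search-space shape is an invented assumption, not the paper's.

```python
# Toy sketch: periodic (cycle-based) sub-network sampling for a weight-sharing supernet,
# so every candidate operation on every edge is revisited at a fixed period instead of at
# random. The 4-edge / 3-operation search space is an invented example.
import itertools

def periodic_subnetworks(num_edges, num_ops):
    """Yield one sub-network per training step; entry i is the operation chosen for edge i."""
    for step in itertools.count():
        yield [(step + edge) % num_ops for edge in range(num_edges)]

for step, subnet in zip(range(6), periodic_subnetworks(num_edges=4, num_ops=3)):
    print(step, subnet)   # the schedule repeats with period 3
```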
arXiv Detail & Related papers (2021-12-05T06:42:48Z)
- Temporal Induced Self-Play for Stochastic Bayesian Games [32.88124137877018]
We propose Temporal-Induced Self-Play (TISP) to find strategies with decent performance from any decision point onward.
TISP uses belief-space representation, backward induction, policy learning, and non-parametric approximation.
We prove that TISP-based algorithms can find approximate Perfect Bayesian Equilibrium in zero-sum one-sided games with finite horizon.
arXiv Detail & Related papers (2021-08-21T05:36:42Z)
- Evolving Reinforcement Learning Algorithms [186.62294652057062]
We propose a method for meta-learning reinforcement learning algorithms.
The learned algorithms are domain-agnostic and can generalize to new environments not seen during training.
We highlight two learned algorithms which obtain good generalization performance over other classical control tasks, gridworld type tasks, and Atari games.
arXiv Detail & Related papers (2021-01-08T18:55:07Z)
- Using Tabu Search Algorithm for Map Generation in the Terra Mystica Tabletop Game [60.71662712899962]
The Tabu Search (TS) metaheuristic improves simple local search algorithms by enabling the algorithm to escape local optima.
This paper investigates the performance of TS and considers the effects of the size of the Tabu list and the size of the neighbourhood for procedural content generation.
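For reference, a compact sketch of the TS loop with the two parameters this entry studies (tabu-list size and neighbourhood size); the neighbour function and objective are placeholders rather than the Terra Mystica map-generation setup.

```python
# Sketch of the Tabu Search loop, parameterized by the two knobs the entry studies:
# tabu-list size and neighbourhood size. The objective and neighbour function are placeholders.
import random
from collections import deque

def tabu_search(initial, neighbours, score, tabu_size=10, neighbourhood_size=20,
                iterations=200, seed=0):
    rng = random.Random(seed)
    current = best = initial
    best_score = score(best)
    tabu = deque(maxlen=tabu_size)              # recently visited solutions are forbidden
    for _ in range(iterations):
        # Sample a limited neighbourhood and drop tabu moves (unless they beat the best).
        candidates = [neighbours(current, rng) for _ in range(neighbourhood_size)]
        candidates = [c for c in candidates
                      if c not in tabu or score(c) > best_score]   # aspiration criterion
        if not candidates:
            continue
        current = max(candidates, key=score)    # accept the best neighbour even if worse,
        tabu.append(current)                    # which lets the search escape local optima
        if score(current) > best_score:
            best, best_score = current, score(current)
    return best

# Toy usage: maximize -(x - 7)^2 over integers via +/-1 moves.
result = tabu_search(
    initial=0,
    neighbours=lambda x, rng: x + rng.choice([-1, 1]),
    score=lambda x: -(x - 7) ** 2,
)
print(result)  # converges to 7
```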
arXiv Detail & Related papers (2020-06-04T09:15:46Z)
- Single-Agent Optimization Through Policy Iteration Using Monte-Carlo Tree Search [8.22379888383833]
The combination of Monte-Carlo Tree Search (MCTS) and deep reinforcement learning is the state of the art in two-player perfect-information games.
We describe a search algorithm that uses a variant of MCTS which we enhanced by 1) a novel action value normalization mechanism for games with potentially unbounded rewards, 2) defining a virtual loss function that enables effective search parallelization, and 3) a policy network, trained by generations of self-play, to guide the search.
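As a small illustration of the first enhancement, the snippet below shows a policy-guided (PUCT-style) selection rule with min-max normalization of action values over siblings, plus a virtual-loss counter; the node fields, constants, and toy numbers are assumptions, not the paper's exact formulas.

```python
# Sketch: selection rule of a policy-guided MCTS with min-max normalized action values,
# which keeps exploration balanced when rewards are unbounded. Fields and constants are
# illustrative assumptions.
import math

class Node:
    def __init__(self, prior):
        self.prior = prior            # probability assigned by the policy network
        self.visits = 0
        self.value_sum = 0.0
        self.virtual_loss = 0         # temporarily inflates the visit count during parallel search

    def q(self):
        n = self.visits + self.virtual_loss
        return self.value_sum / n if n else 0.0

def select_action(children, c_puct=1.5):
    """Pick the child maximizing normalized value plus a policy-weighted exploration bonus."""
    qs = [child.q() for child in children.values()]
    lo, hi = min(qs), max(qs)
    span = (hi - lo) or 1.0           # min-max normalize Q so unbounded returns stay in [0, 1]
    total = sum(c.visits + c.virtual_loss for c in children.values())

    def puct(child):
        q_norm = (child.q() - lo) / span
        bonus = c_puct * child.prior * math.sqrt(total + 1) / (1 + child.visits + child.virtual_loss)
        return q_norm + bonus

    return max(children, key=lambda a: puct(children[a]))

# Toy usage: two actions with priors from a policy network and a few backed-up values.
children = {"play": Node(prior=0.7), "discard": Node(prior=0.3)}
children["play"].visits, children["play"].value_sum = 3, 12.0
children["discard"].visits, children["discard"].value_sum = 1, 2.0
print(select_action(children))        # -> "play"
```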
arXiv Detail & Related papers (2020-05-22T18:02:36Z)