Scalable Online Planning via Reinforcement Learning Fine-Tuning
- URL: http://arxiv.org/abs/2109.15316v1
- Date: Thu, 30 Sep 2021 17:59:11 GMT
- Title: Scalable Online Planning via Reinforcement Learning Fine-Tuning
- Authors: Arnaud Fickinger, Hengyuan Hu, Brandon Amos, Stuart Russell, Noam Brown
- Abstract summary: Tabular search methods do not scale well with the size of the search space.
We replace tabular search with online model-based fine-tuning of a policy neural network via reinforcement learning.
In particular, we use our search algorithm to achieve a new state-of-the-art result in self-play Hanabi.
- Score: 25.27878823988181
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Lookahead search has been a critical component of recent AI successes, such
as in the games of chess, go, and poker. However, the search methods used in
these games, and in many other settings, are tabular. Tabular search methods do
not scale well with the size of the search space, and this problem is
exacerbated by stochasticity and partial observability. In this work we replace
tabular search with online model-based fine-tuning of a policy neural network
via reinforcement learning, and show that this approach outperforms
state-of-the-art search algorithms in benchmark settings. In particular, we use
our search algorithm to achieve a new state-of-the-art result in self-play
Hanabi, and show the generality of our algorithm by also showing that it
outperforms tabular search in the Atari game Ms. Pacman.
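As a rough illustration of the recipe the abstract describes (fine-tuning a copy of the policy network on simulated rollouts at each decision point, instead of building a tabular search tree), the sketch below uses a plain REINFORCE update; the `env.clone()` simulator interface, the policy signature, and all hyperparameters are illustrative assumptions, not the authors' exact method.

```python
# Minimal sketch of decision-time policy fine-tuning with REINFORCE (not the paper's exact
# method). Assumes `policy` maps an observation tensor to action logits and that the
# environment exposes a clone()/observation()/step() simulator interface.
import copy
import torch

def finetune_and_act(policy, env, n_rollouts=64, horizon=20, lr=1e-4):
    """Fine-tune a throwaway copy of `policy` on simulated rollouts from the current
    state, then pick an action with the fine-tuned copy."""
    local = copy.deepcopy(policy)                     # the global policy stays untouched
    opt = torch.optim.Adam(local.parameters(), lr=lr)

    for _ in range(n_rollouts):
        sim = env.clone()                             # assumed: simulator forked at the current state
        obs = sim.observation()
        log_probs, rewards = [], []
        for _ in range(horizon):
            logits = local(torch.as_tensor(obs, dtype=torch.float32))
            dist = torch.distributions.Categorical(logits=logits)
            action = dist.sample()
            log_probs.append(dist.log_prob(action))
            obs, reward, done = sim.step(action.item())
            rewards.append(reward)
            if done:
                break
        # Plain undiscounted-return policy gradient on the simulated episode.
        loss = -sum(rewards) * torch.stack(log_probs).sum()
        opt.zero_grad()
        loss.backward()
        opt.step()

    with torch.no_grad():                             # act greedily at the real decision point
        logits = local(torch.as_tensor(env.observation(), dtype=torch.float32))
    return int(torch.argmax(logits))
```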
Related papers
- Playing Board Games with the Predict Results of Beam Search Algorithm [0.0]
This paper introduces a novel algorithm for two-player deterministic games with perfect information, which we call PROBS (Predict Results of Beam Search).
We evaluate the performance of our algorithm across a selection of board games, where it consistently demonstrates an increased winning ratio against baseline opponents.
A key result of this study is that the PROBS algorithm operates effectively, even when the beam search size is considerably smaller than the average number of turns in the game.
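The summary does not spell out the algorithm itself; as a loose, single-agent illustration of scoring moves by the outcome of a small, value-guided beam search, the sketch below assumes hypothetical `legal_moves()`/`apply()`/`is_terminal()` state methods and a `value_fn` callable, and it ignores the sign flip between players that a real two-player implementation would need.

```python
# Loose sketch: pick a move by running a shallow, value-guided beam search from each
# successor state. State methods and `value_fn` are hypothetical placeholders.
def beam_search_value(state, value_fn, beam_size=4, depth=3):
    """Estimate a state's value by expanding a small beam and scoring the survivors."""
    beam = [state]
    for _ in range(depth):
        candidates = []
        for s in beam:
            if s.is_terminal():
                candidates.append(s)          # keep terminal states as-is
            else:
                candidates.extend(s.apply(m) for m in s.legal_moves())
        if not candidates:
            break
        # Retain only the most promising states according to the value model.
        beam = sorted(candidates, key=value_fn, reverse=True)[:beam_size]
    return max(value_fn(s) for s in beam)

def choose_move(state, value_fn, beam_size=4, depth=3):
    """Pick the move whose successor looks best under the beam-search estimate."""
    return max(
        state.legal_moves(),
        key=lambda m: beam_search_value(state.apply(m), value_fn, beam_size, depth),
    )
```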
arXiv Detail & Related papers (2024-04-23T20:10:27Z)
- Relation-aware Ensemble Learning for Knowledge Graph Embedding [68.94900786314666]
We propose to learn an ensemble by leveraging existing methods in a relation-aware manner.
However, exploring these semantics with a relation-aware ensemble leads to a much larger search space than in general ensemble methods.
We propose a divide-search-combine algorithm RelEns-DSC that searches the relation-wise ensemble weights independently.
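Purely as an illustration of searching the ensemble weights relation by relation (which keeps the per-relation search spaces from multiplying together), the sketch below grid-searches convex weights over base-model scores for each relation independently; the score tables, triples, and the simple validation metric are invented placeholders, not the paper's setup.

```python
# Toy sketch of relation-wise ensemble weight search: each relation's weights are found
# independently, so the search spaces stay small. All data structures here are placeholders.
import itertools
import numpy as np

def search_relation_weights(val_triples, model_scores, relations, grid=None):
    """model_scores[m][triple] is base model m's score; returns per-relation weights."""
    grid = np.linspace(0.0, 1.0, 11) if grid is None else grid
    n_models = len(model_scores)
    best_weights = {}
    for rel in relations:
        triples = [t for t in val_triples if t[1] == rel]     # triples are (head, relation, tail)
        best_w, best_metric = None, float("-inf")
        for w in itertools.product(grid, repeat=n_models):
            if not np.isclose(sum(w), 1.0):
                continue                                      # only convex combinations
            # Stand-in validation metric: mean ensembled score of the held-out true triples.
            metric = float(np.mean([
                sum(w[m] * model_scores[m][t] for m in range(n_models)) for t in triples
            ])) if triples else float("-inf")
            if metric > best_metric:
                best_w, best_metric = w, metric
        best_weights[rel] = best_w
    return best_weights
```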
arXiv Detail & Related papers (2023-10-13T07:40:12Z)
- The Update-Equivalence Framework for Decision-Time Planning [78.44953498421854]
We introduce an alternative framework for decision-time planning that is not based on solving subgames, but rather on update equivalence.
We derive a provably sound search algorithm for fully cooperative games based on mirror descent and a search algorithm for adversarial games based on magnetic mirror descent.
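The entry names mirror descent as the basis of the cooperative-game search algorithm; below is only a minimal refresher on a single mirror-descent policy update under the KL (negative-entropy) mirror map, i.e. the multiplicative-weights update, not the paper's full search procedure.

```python
# Minimal refresher: one mirror-descent step on an action distribution under the KL mirror
# map, i.e. new_policy ∝ policy * exp(eta * q_values). Not the paper's full algorithm.
import numpy as np

def mirror_descent_step(policy, q_values, eta=0.1):
    """Solve argmax_p <p, q> - (1/eta) * KL(p || policy) in closed form."""
    logits = np.log(policy) + eta * np.asarray(q_values, dtype=float)
    logits -= logits.max()                  # numerical stability before exponentiating
    new_policy = np.exp(logits)
    return new_policy / new_policy.sum()

# Example: repeated updates concentrate mass on the highest-value action.
p = np.full(3, 1.0 / 3.0)
for _ in range(50):
    p = mirror_descent_step(p, q_values=[0.0, 0.1, 0.5])
print(p.round(3))
```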
arXiv Detail & Related papers (2023-04-25T20:28:55Z)
- CrossBeam: Learning to Search in Bottom-Up Program Synthesis [51.37514793318815]
We propose training a neural model to learn a hands-on search policy for bottom-up synthesis.
Our approach, called CrossBeam, uses the neural model to choose how to combine previously-explored programs into new programs.
We observe that CrossBeam learns to search efficiently, exploring much smaller portions of the program space compared to the state-of-the-art.
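As a toy, self-contained illustration of the bottom-up idea (grow a pool of explored programs and let a model decide which combinations to try next), the sketch below synthesizes arithmetic expressions; the random ranking stands in for CrossBeam's learned neural policy, and the operator set and interfaces are invented.

```python
# Toy sketch of neural-guided bottom-up synthesis: repeatedly combine previously explored
# programs into new ones, keeping only a small "model"-ranked beam. The random ranking is a
# stand-in for a learned policy; the arithmetic domain is an invented example.
import random

def synthesize(inputs, target, budget=200, beam=10, seed=0):
    ops = (("add", lambda a, b: a + b), ("mul", lambda a, b: a * b))
    rng = random.Random(seed)
    pool = [(str(x), x) for x in inputs]          # explored programs as (expression, value)
    seen = {v for _, v in pool}
    for _ in range(budget):
        candidates = [
            (f"{name}({ea}, {eb})", fn(va, vb))
            for name, fn in ops
            for ea, va in pool
            for eb, vb in pool
        ]
        new = [(e, v) for e, v in candidates if v not in seen]
        for expr, value in new:                   # goal check over everything newly reachable
            if value == target:
                return expr
        rng.shuffle(new)                          # stand-in for scoring with a neural model
        for expr, value in new[:beam]:            # only a small beam is added to the pool
            pool.append((expr, value))
            seen.add(value)
    return None

print(synthesize(inputs=[2, 3], target=36))       # e.g. mul(add(2, 2), mul(3, 3))
```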
arXiv Detail & Related papers (2022-03-20T04:41:05Z)
- Revisiting Game Representations: The Hidden Costs of Efficiency in Sequential Decision-making Algorithms [0.6749750044497732]
Recent advancements in algorithms for sequential decision-making under imperfect information have shown remarkable success in large games.
These algorithms traditionally formalize the games using the extensive-form game formalism.
We show that a popular workaround involves using a specialized representation based on player-specific information-state trees.
arXiv Detail & Related papers (2021-12-20T22:34:19Z)
- Exploring Complicated Search Spaces with Interleaving-Free Sampling [127.07551427957362]
In this paper, we build the search algorithm upon a complicated search space with long-distance connections.
We present a simple yet effective algorithm named IF-NAS, which uses a periodic sampling strategy to construct different sub-networks.
In the proposed search space, IF-NAS outperforms both random sampling and previous weight-sharing search algorithms by a significant margin.
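The summary only names a "periodic sampling strategy" for constructing sub-networks; the toy generator below contrasts that with independent random sampling by cycling deterministically through the candidate operations on each supernet edge. The search-space shape is an invented assumption, not the paper's.

```python
# Toy sketch: periodic (cycle-based) sub-network sampling for a weight-sharing supernet,
# so every candidate operation on every edge is revisited at a fixed period instead of at
# random. The 4-edge / 3-operation search space is an invented example.
import itertools

def periodic_subnetworks(num_edges, num_ops):
    """Yield one sub-network per training step; entry i is the operation chosen for edge i."""
    for step in itertools.count():
        yield [(step + edge) % num_ops for edge in range(num_edges)]

for step, subnet in zip(range(6), periodic_subnetworks(num_edges=4, num_ops=3)):
    print(step, subnet)   # the schedule repeats with period 3
```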
arXiv Detail & Related papers (2021-12-05T06:42:48Z)
- Temporal Induced Self-Play for Stochastic Bayesian Games [32.88124137877018]
We propose Temporal-Induced Self-Play (TISP) to find strategies with decent performance from any decision point onward.
TISP uses belief-space representation, backward induction, policy learning, and non-parametric approximation.
We prove that TISP-based algorithms can find approximate Perfect Bayesian Equilibrium in zero-sum one-sided games with finite horizon.
arXiv Detail & Related papers (2021-08-21T05:36:42Z)
- Evolving Reinforcement Learning Algorithms [186.62294652057062]
We propose a method for meta-learning reinforcement learning algorithms.
The learned algorithms are domain-agnostic and can generalize to new environments not seen during training.
We highlight two learned algorithms which obtain good generalization performance over other classical control tasks, gridworld type tasks, and Atari games.
arXiv Detail & Related papers (2021-01-08T18:55:07Z)
- Using Tabu Search Algorithm for Map Generation in the Terra Mystica Tabletop Game [60.71662712899962]
The Tabu Search (TS) metaheuristic improves simple local search algorithms by enabling the algorithm to escape local optima.
This paper investigates the performance of TS and considers the effects of the size of the Tabu list and the size of the neighbourhood for procedural content generation.
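For reference, a compact sketch of the TS loop with the two parameters this entry studies (tabu-list size and neighbourhood size); the neighbour function and objective are placeholders rather than the Terra Mystica map-generation setup.

```python
# Sketch of the Tabu Search loop, parameterized by the two knobs the entry studies:
# tabu-list size and neighbourhood size. The objective and neighbour function are placeholders.
import random
from collections import deque

def tabu_search(initial, neighbours, score, tabu_size=10, neighbourhood_size=20,
                iterations=200, seed=0):
    rng = random.Random(seed)
    current = best = initial
    best_score = score(best)
    tabu = deque(maxlen=tabu_size)              # recently visited solutions are forbidden
    for _ in range(iterations):
        # Sample a limited neighbourhood and drop tabu moves (unless they beat the best).
        candidates = [neighbours(current, rng) for _ in range(neighbourhood_size)]
        candidates = [c for c in candidates
                      if c not in tabu or score(c) > best_score]   # aspiration criterion
        if not candidates:
            continue
        current = max(candidates, key=score)    # accept the best neighbour even if worse,
        tabu.append(current)                    # which lets the search escape local optima
        if score(current) > best_score:
            best, best_score = current, score(current)
    return best

# Toy usage: maximize -(x - 7)^2 over integers via +/-1 moves.
result = tabu_search(
    initial=0,
    neighbours=lambda x, rng: x + rng.choice([-1, 1]),
    score=lambda x: -(x - 7) ** 2,
)
print(result)  # converges to 7
```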
arXiv Detail & Related papers (2020-06-04T09:15:46Z)
- Single-Agent Optimization Through Policy Iteration Using Monte-Carlo Tree Search [8.22379888383833]
The combination of Monte-Carlo Tree Search (MCTS) and deep reinforcement learning is the state of the art in two-player perfect-information games.
We describe a search algorithm that uses a variant of MCTS which we enhanced by 1) a novel action value normalization mechanism for games with potentially unbounded rewards, 2) defining a virtual loss function that enables effective search parallelization, and 3) a policy network, trained by generations of self-play, to guide the search.
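As a small illustration of the first enhancement, the snippet below shows a policy-guided (PUCT-style) selection rule with min-max normalization of action values over siblings, plus a virtual-loss counter; the node fields, constants, and toy numbers are assumptions, not the paper's exact formulas.

```python
# Sketch: selection rule of a policy-guided MCTS with min-max normalized action values,
# which keeps exploration balanced when rewards are unbounded. Fields and constants are
# illustrative assumptions.
import math

class Node:
    def __init__(self, prior):
        self.prior = prior            # probability assigned by the policy network
        self.visits = 0
        self.value_sum = 0.0
        self.virtual_loss = 0         # temporarily inflates the visit count during parallel search

    def q(self):
        n = self.visits + self.virtual_loss
        return self.value_sum / n if n else 0.0

def select_action(children, c_puct=1.5):
    """Pick the child maximizing normalized value plus a policy-weighted exploration bonus."""
    qs = [child.q() for child in children.values()]
    lo, hi = min(qs), max(qs)
    span = (hi - lo) or 1.0           # min-max normalize Q so unbounded returns stay in [0, 1]
    total = sum(c.visits + c.virtual_loss for c in children.values())

    def puct(child):
        q_norm = (child.q() - lo) / span
        bonus = c_puct * child.prior * math.sqrt(total + 1) / (1 + child.visits + child.virtual_loss)
        return q_norm + bonus

    return max(children, key=lambda a: puct(children[a]))

# Toy usage: two actions with priors from a policy network and a few backed-up values.
children = {"play": Node(prior=0.7), "discard": Node(prior=0.3)}
children["play"].visits, children["play"].value_sum = 3, 12.0
children["discard"].visits, children["discard"].value_sum = 1, 2.0
print(select_action(children))        # -> "play"
```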
arXiv Detail & Related papers (2020-05-22T18:02:36Z)