Width-based Lookaheads with Learnt Base Policies and Heuristics Over the
Atari-2600 Benchmark
- URL: http://arxiv.org/abs/2106.12151v1
- Date: Wed, 23 Jun 2021 04:27:55 GMT
- Title: Width-based Lookaheads with Learnt Base Policies and Heuristics Over the
Atari-2600 Benchmark
- Authors: Stefan O'Toole, Nir Lipovetzky, Miquel Ramirez, Adrian Pearce
- Abstract summary: We show that RIW$_C$+CPV outperforms $\pi$-IW, $\pi$-IW(1)+ and $\pi$-HIW(n, 1).
We also present a taxonomy of the set of Atari-2600 games according to some of their defining characteristics.
- Score: 4.559353193715442
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose new width-based planning and learning algorithms applied over the
Atari-2600 benchmark. The algorithms presented are inspired from a careful
analysis of the design decisions made by previous width-based planners. We
benchmark our new algorithms over the Atari-2600 games and show that our best
performing algorithm, RIW$_C$+CPV, outperforms previously introduced
width-based planning and learning algorithms $\pi$-IW(1), $\pi$-IW(1)+ and
$\pi$-HIW(n, 1). Furthermore, we present a taxonomy of the set of Atari-2600
games according to some of their defining characteristics. This analysis of the
games provides further insight into the behaviour and performance of the
width-based algorithms introduced. Namely, for games with large branching
factors, and games with sparse meaningful rewards, RIW$_C$+CPV outperforms
$\pi$-IW, $\pi$-IW(1)+ and $\pi$-HIW(n, 1).
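The common mechanism behind these width-based lookaheads is novelty pruning over a set of Boolean state features, with a learnt base policy biasing the rollouts. The sketch below is a minimal illustration of that idea only, under assumed interfaces: `env`, `features`, and `base_policy` are hypothetical placeholders, and this is not the paper's RIW$_C$+CPV implementation.
```python
# Illustrative width-1 (IW(1)-style) rollout lookahead guided by a learnt
# base policy. All interfaces (env, features, base_policy) are assumptions
# for the sketch, not the paper's actual algorithm or code.
import random

def iw1_rollout_lookahead(env, root_state, base_policy, features,
                          num_rollouts=32, horizon=100, discount=0.99):
    """Estimate action returns from `root_state` with rollouts that are
    pruned once they stop making any atomic feature true for the first
    time (the width-1 novelty test)."""
    actions_at_root = env.legal_actions(root_state)
    action_returns = {a: 0.0 for a in actions_at_root}
    action_counts = {a: 0 for a in actions_at_root}
    seen = set(features(root_state))   # features already made true

    for _ in range(num_rollouts):
        state = root_state
        first_action = None
        ret, disc = 0.0, 1.0
        for _t in range(horizon):
            # Sample an action from the learnt base policy (assumed to
            # return one probability per legal action).
            actions = env.legal_actions(state)
            probs = base_policy(state, actions)
            action = random.choices(actions, weights=probs)[0]
            if first_action is None:
                first_action = action

            state, reward, done = env.step(state, action)
            ret += disc * reward
            disc *= discount
            if done:
                break

            # Width-1 novelty test: stop the rollout if the new state
            # makes no previously unseen feature true.
            novel = [f for f in features(state) if f not in seen]
            if not novel:
                break
            seen.update(novel)

        if first_action is not None:
            action_returns[first_action] += ret
            action_counts[first_action] += 1

    # Return the root action with the best average rollout return.
    return max(action_returns,
               key=lambda a: action_returns[a] / max(action_counts[a], 1))
```
Note that the novelty table here is shared across all rollouts of a single lookahead for simplicity; variants in the literature differ in how (and per which depth) novelty is tracked and in how the learnt value function is combined with rollout returns.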
Related papers
- Representation Learning for General-sum Low-rank Markov Games [63.119870889883224]
We study multi-agent general-sum Markov games with nonlinear function approximation.
We focus on low-rank Markov games whose transition matrix admits a hidden low-rank structure on top of an unknown non-linear representation.
arXiv Detail & Related papers (2022-10-30T22:58:22Z) - Recursive Reasoning in Minimax Games: A Level $k$ Gradient Play Method [0.0]
Generative adversarial networks (GANs) are notoriously challenging to train.
We propose a novel recursive reasoning algorithm: Level $k$ Gradient Play (Lv.$k$ GP).
In contrast to many existing algorithms, ours does not require sophisticated heuristics or curvature information.
We achieve an FID of 10.17 for unconditional image generation within 30 hours, allowing GAN training on common computational resources to reach state-of-the-art performance.
arXiv Detail & Related papers (2022-10-29T03:43:59Z) - Policy Optimization for Markov Games: Unified Framework and Faster
Convergence [81.3266426402464]
We show that the state-wise average policy of this algorithm converges to an approximate Nash equilibrium (NE) of the game.
We extend this algorithm to multi-player general-sum Markov Games and show a $\widetilde{\mathcal{O}}(T^{-1/2})$ convergence rate to Coarse Correlated Equilibria (CCE).
arXiv Detail & Related papers (2022-06-06T14:23:13Z) - Efficient $\Phi$-Regret Minimization in Extensive-Form Games via Online
Mirror Descent [49.93548413166884]
$\Phi$-Hedge is a generic algorithm capable of learning a large class of equilibria for Normal-Form Games (NFGs).
We show that $\Phi$-Hedge can be directly used to learn Nash Equilibria (zero-sum settings), Normal-Form Coarse Correlated Equilibria (NFCCE), and Extensive-Form Correlated Equilibria (EFCE) in EFGs.
We prove that, in those settings, the $\Phi$-Hedge algorithms are equivalent to standard Online Mirror Descent (OMD) algorithms.
arXiv Detail & Related papers (2022-05-30T17:58:06Z) - Near-Optimal Learning of Extensive-Form Games with Imperfect Information [54.55092907312749]
We present the first line of algorithms that require only $\widetilde{\mathcal{O}}((XA+YB)/\varepsilon^2)$ episodes of play to find an $\varepsilon$-approximate Nash equilibrium in two-player zero-sum games.
This improves upon the best known sample complexity of $\widetilde{\mathcal{O}}((X^2A+Y^2B)/\varepsilon^2)$ by a factor of $\widetilde{\mathcal{O}}(\max\{X, Y\})$.
arXiv Detail & Related papers (2022-02-03T18:18:28Z) - Hierarchical Width-Based Planning and Learning [8.776765645845012]
Width-based search methods have demonstrated state-of-the-art performance in a wide range of testbeds.
We present a hierarchical algorithm that plans at two levels of abstraction.
We show how, in combination with a learned policy and a learned value function, the proposed hierarchical IW can outperform current flat IW-based planners in Atari games with sparse rewards.
arXiv Detail & Related papers (2021-01-15T15:37:46Z) - Impact of Representation Learning in Linear Bandits [83.17684841392754]
We study how representation learning can improve the efficiency of bandit problems.
We present a new algorithm which achieves $\widetilde{O}(T\sqrt{kN} + \sqrt{dkNT})$ regret, where $N$ is the number of rounds we play for each bandit.
arXiv Detail & Related papers (2020-10-13T16:35:30Z) - A Sharp Analysis of Model-based Reinforcement Learning with Self-Play [49.88233710867315]
We present a sharp analysis of model-based self-play algorithms for multi-agent Markov games.
We design an algorithm -- Optimistic Nash Value Iteration (Nash-VI) for two-player zero-sum Markov games.
We further adapt our analysis to designing a provably efficient task-agnostic algorithm for zero-sum Markov games.
arXiv Detail & Related papers (2020-10-04T15:27:39Z) - Convergence of Deep Fictitious Play for Stochastic Differential Games [6.875312133832078]
The recently proposed machine learning algorithm, deep fictitious play, provides a novel and efficient tool for finding Markovian Nash equilibria of large $N$ asymmetric differential games.
By incorporating the idea of fictitious play, the algorithm decouples the game into $N$ sub-optimization problems.
We show that the strategy based on DFP forms an $\epsilon$-Nash equilibrium.
arXiv Detail & Related papers (2020-08-12T18:27:13Z) - Offline Grid-Based Coverage path planning for guards in games [0.0]
We introduce a novel algorithm for covering a 2D polygonal area, possibly with holes.
Experimental analysis over several scenarios, ranging from simple layouts to more complex maps used in actual games, shows good performance.
arXiv Detail & Related papers (2020-01-15T18:28:27Z)