Combining Deep Reinforcement Learning and Search for
Imperfect-Information Games
- URL: http://arxiv.org/abs/2007.13544v2
- Date: Sun, 29 Nov 2020 03:18:13 GMT
- Title: Combining Deep Reinforcement Learning and Search for
Imperfect-Information Games
- Authors: Noam Brown, Anton Bakhtin, Adam Lerer, Qucheng Gong
- Abstract summary: We present ReBeL, a framework for self-play reinforcement learning and search that provably converges to a Nash equilibrium in two-player zero-sum games.
We also show that ReBeL achieves superhuman performance in heads-up no-limit Texas hold'em poker while using far less domain knowledge than any prior poker AI.
- Score: 30.520629802135574
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The combination of deep reinforcement learning and search at both training
and test time is a powerful paradigm that has led to a number of successes in
single-agent settings and perfect-information games, best exemplified by
AlphaZero. However, prior algorithms of this form cannot cope with
imperfect-information games. This paper presents ReBeL, a general framework for
self-play reinforcement learning and search that provably converges to a Nash
equilibrium in any two-player zero-sum game. In the simpler setting of
perfect-information games, ReBeL reduces to an algorithm similar to AlphaZero.
Results in two different imperfect-information games show ReBeL converges to an
approximate Nash equilibrium. We also show ReBeL achieves superhuman
performance in heads-up no-limit Texas hold'em poker, while using far less
domain knowledge than any prior poker AI.
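As a rough intuition for the framework, the sketch below shrinks the self-play-plus-search loop to a single public belief state: regret matching stands in for the CFR-based search, a 2x2 matrix game stands in for a poker subgame, and a lookup table stands in for the value network. Everything here (names, game, solver budget) is an illustrative assumption, not the paper's code.

```python
# Illustrative sketch only: regret matching stands in for CFR search, a 2x2
# matrix game stands in for a subgame, and a dict stands in for the value
# network. Names ("root_pbs", solve_subgame) are invented for this toy.
import numpy as np

def solve_subgame(payoff, iters=2000):
    """Approximate equilibrium of a zero-sum matrix game via regret matching."""
    m, n = payoff.shape
    regret1, regret2 = np.zeros(m), np.zeros(n)
    avg1, avg2 = np.zeros(m), np.zeros(n)
    for _ in range(iters):
        p1 = np.maximum(regret1, 0.0)
        p1 = p1 / p1.sum() if p1.sum() > 0 else np.full(m, 1.0 / m)
        p2 = np.maximum(regret2, 0.0)
        p2 = p2 / p2.sum() if p2.sum() > 0 else np.full(n, 1.0 / n)
        avg1 += p1
        avg2 += p2
        u1 = payoff @ p2            # row player's per-action utilities
        u2 = -(p1 @ payoff)         # column player's per-action utilities
        regret1 += u1 - p1 @ u1
        regret2 += u2 - p2 @ u2
    p1, p2 = avg1 / avg1.sum(), avg2 / avg2.sum()
    return p1, p2, float(p1 @ payoff @ p2)

value_net = {}                                   # lookup table as a stand-in
payoff = np.array([[1.0, -1.0], [-1.0, 1.0]])    # matching pennies "subgame"
p1, p2, value = solve_subgame(payoff)            # "search" at the belief state
value_net["root_pbs"] = value                    # value-net training target
print(p1, p2, value)                             # ~[0.5 0.5] [0.5 0.5] 0.0
```

In ReBeL proper, the subgame's leaves are evaluated by the learned value network and the solved policy generates further self-play; the solve-then-train handoff is the part this toy tries to show.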
Related papers
- Neural Population Learning beyond Symmetric Zero-sum Games [52.20454809055356]
We introduce NeuPL-JPSRO, a neural population learning algorithm that benefits from transfer learning of skills and converges to a Coarse Correlated Equilibrium (CCE) of the game.
Our work shows that equilibrium-convergent population learning can be implemented at scale and in generality.
arXiv Detail & Related papers (2024-01-10T12:56:24Z)
- Accelerate Multi-Agent Reinforcement Learning in Zero-Sum Games with Subgame Curriculum Learning [65.36326734799587]
We present a novel subgame curriculum learning framework for zero-sum games.
It adopts an adaptive initial state distribution by resetting agents to some previously visited states.
We derive a subgame selection metric that approximates the squared distance to NE values.
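A hedged sketch of that reset idea, with an invented buffer API and using the summary's squared-distance-to-NE-values metric as the sampling weight; the real paper's estimator and bookkeeping will differ:

```python
# Illustrative sketch: sample episode start states from previously visited
# states, weighted by an estimate of the squared distance to the NE value.
import random

class SubgameCurriculum:
    def __init__(self):
        self.buffer = []                     # (state, weight) pairs

    def visit(self, state, v_current, v_ne_estimate):
        # weight ~ squared gap between current value and estimated NE value
        self.buffer.append((state, (v_current - v_ne_estimate) ** 2 + 1e-6))

    def sample_start(self, default_state):
        if not self.buffer:
            return default_state
        states, weights = zip(*self.buffer)
        return random.choices(states, weights=weights, k=1)[0]

curriculum = SubgameCurriculum()
curriculum.visit("s_far", v_current=0.9, v_ne_estimate=0.0)    # often resampled
curriculum.visit("s_near", v_current=0.05, v_ne_estimate=0.0)  # rarely resampled
print(curriculum.sample_start("s_init"))
```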
arXiv Detail & Related papers (2023-10-07T13:09:37Z)
- Online Learning and Solving Infinite Games with an ERM Oracle [20.1330044382824]
We propose an algorithm for the online binary classification setting that relies solely on ERM oracle calls.
We show that it has finite regret in the realizable setting and sublinearly growing regret in the agnostic setting.
Our algorithms apply to both binary-valued and real-valued games and can be viewed as providing justification for the wide use of double oracle and multiple oracle algorithms in the practice of solving large games.
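Since the entry invokes double oracle methods, here is a self-contained sketch of that classic loop for a zero-sum matrix game: solve the restricted game, query each player's best-response oracle, and grow the action sets until neither oracle finds an improvement. The inner solver and the rock-paper-scissors example are our choices, not the paper's.

```python
# Classic double-oracle loop (sketch): restricted game -> solve -> best
# responses -> expand, until both best responses are already included.
import numpy as np

def solve_matrix_game(P, iters=2000):
    """Approximate NE of a zero-sum matrix game via regret matching."""
    m, n = P.shape
    r1, r2, s1, s2 = np.zeros(m), np.zeros(n), np.zeros(m), np.zeros(n)
    for _ in range(iters):
        p1 = np.maximum(r1, 0.0)
        p1 = p1 / p1.sum() if p1.sum() > 0 else np.full(m, 1.0 / m)
        p2 = np.maximum(r2, 0.0)
        p2 = p2 / p2.sum() if p2.sum() > 0 else np.full(n, 1.0 / n)
        s1 += p1; s2 += p2
        r1 += P @ p2 - p1 @ P @ p2           # row player's regrets
        r2 += -(p1 @ P) + p1 @ P @ p2        # column player's regrets
    return s1 / s1.sum(), s2 / s2.sum()

def double_oracle(P):
    rows, cols = [0], [0]                    # start with one action per player
    while True:
        p1, p2 = solve_matrix_game(P[np.ix_(rows, cols)])
        full1 = np.zeros(P.shape[0]); full1[rows] = p1
        full2 = np.zeros(P.shape[1]); full2[cols] = p2
        br_row = int(np.argmax(P @ full2))   # row best-response oracle
        br_col = int(np.argmin(full1 @ P))   # column best-response oracle
        if br_row in rows and br_col in cols:
            return rows, cols, p1, p2
        if br_row not in rows: rows.append(br_row)
        if br_col not in cols: cols.append(br_col)

rps = np.array([[0.0, -1.0, 1.0], [1.0, 0.0, -1.0], [-1.0, 1.0, 0.0]])
print(double_oracle(rps))                    # ends with all actions, ~uniform play
```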
arXiv Detail & Related papers (2023-07-04T12:51:21Z)
- Hardness of Independent Learning and Sparse Equilibrium Computation in Markov Games [70.19141208203227]
We consider the problem of decentralized multi-agent reinforcement learning in Markov games.
We show that no algorithm attains no-regret in general-sum games when executed independently by all players.
We show that our lower bounds hold even for the seemingly easier setting in which all agents are controlled by a centralized algorithm.
arXiv Detail & Related papers (2023-03-22T03:28:12Z)
- Near-Optimal Learning of Extensive-Form Games with Imperfect Information [54.55092907312749]
We present the first line of algorithms that require only $\widetilde{\mathcal{O}}((XA+YB)/\varepsilon^2)$ episodes of play to find an $\varepsilon$-approximate Nash equilibrium in two-player zero-sum games.
This improves upon the best known sample complexity of $\widetilde{\mathcal{O}}((X^2A+Y^2B)/\varepsilon^2)$ by a factor of $\widetilde{\mathcal{O}}(\max\{X,Y\})$.
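As a sanity check on the claimed factor (our arithmetic, not a quote from the paper): $X^2A \le \max\{X,Y\}\cdot XA$ and $Y^2B \le \max\{X,Y\}\cdot YB$, so the ratio of the two bounds satisfies $(X^2A+Y^2B)/(XA+YB) \le \max\{X,Y\}$.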
arXiv Detail & Related papers (2022-02-03T18:18:28Z)
- No-Regret Learning in Time-Varying Zero-Sum Games [99.86860277006318]
Learning from repeated play in a fixed zero-sum game is a classic problem in game theory and online learning.
We develop a single parameter-free algorithm that simultaneously enjoys favorable guarantees under three performance measures.
Our algorithm is based on a two-layer structure with a meta-algorithm learning over a group of black-box base-learners satisfying a certain property.
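A minimal sketch of that two-layer structure, assuming Hedge (exponential weights) as the meta-algorithm and treating the base-learners as black boxes that each propose a mixed strategy per round; the pool, learning rate, and losses are illustrative:

```python
# Two-layer sketch: a Hedge meta-learner mixes the strategies proposed by K
# black-box base-learners each round; base-learner losses drive the weights.
import numpy as np

def hedge_over_base_learners(proposals, loss_vectors, eta=0.1):
    """proposals: (T, K, A) base strategies; loss_vectors: (T, A) losses."""
    T, K, A = proposals.shape
    log_w = np.zeros(K)                          # meta log-weights over bases
    played = []
    for t in range(T):
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        played.append(w @ proposals[t])          # meta-mixture strategy
        base_losses = proposals[t] @ loss_vectors[t]   # each base's loss
        log_w -= eta * base_losses               # exponential-weights update
    return np.array(played)

rng = np.random.default_rng(0)
T, K, A = 50, 3, 2
proposals = rng.dirichlet(np.ones(A), size=(T, K))    # black-box base outputs
losses = rng.uniform(0.0, 1.0, size=(T, A))           # time-varying losses
print(hedge_over_base_learners(proposals, losses)[-1])
```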
arXiv Detail & Related papers (2022-01-30T06:10:04Z)
- Towards convergence to Nash equilibria in two-team zero-sum games [17.4461045395989]
Two-team zero-sum games are defined as multi-player games where players are split into two competing sets of agents.
We focus on the solution concept of Nash equilibria (NE).
We show that computing NE for this class of games is $\textit{hard}$ for the complexity class $\mathrm{CLS}$.
arXiv Detail & Related papers (2021-11-07T21:15:35Z)
- Model-Free Learning for Two-Player Zero-Sum Partially Observable Markov Games with Perfect Recall [34.73929457960903]
We study the problem of learning a Nash equilibrium (NE) in an imperfect information game (IIG) through self-play.
We provide the Implicit Exploration Online Mirror Descent (IXOMD) algorithm.
IXOMD is a model-free algorithm with a high-probability bound on the convergence rate to the NE of order $1/\sqrt{T}$, where $T$ is the number of played games.
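For flavor, a single-agent toy of the two named ingredients: online mirror descent with the negative-entropy mirror map (exponential weights) plus an implicit-exploration (IX) loss estimate. This is a bandit stand-in, not IXOMD's self-play procedure in the IIG; the losses and parameters are invented:

```python
# Bandit-sized toy of entropic OMD with implicit-exploration (IX) estimates:
# loss_hat = loss / (p[a] + gamma) is biased slightly low, which is what
# yields high-probability (not just expected) regret guarantees.
import numpy as np

def omd_with_ix(expected_losses, T=5000, eta=0.05, gamma=0.05, seed=0):
    rng = np.random.default_rng(seed)
    n = len(expected_losses)
    log_w = np.zeros(n)
    for t in range(T):
        p = np.exp(log_w - log_w.max())
        p /= p.sum()
        a = rng.choice(n, p=p)
        loss = expected_losses[a] + rng.uniform(-0.1, 0.1)  # noisy feedback
        loss_hat = loss / (p[a] + gamma)        # IX importance estimate
        log_w[a] -= eta * loss_hat              # entropic mirror-descent step
    return p

print(omd_with_ix([0.2, 0.5, 0.8]))             # mass concentrates on action 0
```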
arXiv Detail & Related papers (2021-06-11T09:51:29Z)
- DREAM: Deep Regret minimization with Advantage baselines and Model-free learning [24.273841968933475]
We introduce DREAM, a deep reinforcement learning algorithm that finds optimal strategies in imperfect-information games with multiple agents.
Our primary innovation is an effective algorithm that, in contrast to other regret-based deep learning algorithms, does not require access to a perfect simulator of the game to achieve good performance.
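A heavily simplified, single-state toy of the baseline idea, under our own assumptions rather than DREAM's actual estimator: when action values must be estimated from sampled play instead of a perfect simulator, subtracting a learned baseline reduces the variance of the importance-weighted advantage that feeds the regret update.

```python
# Toy sketch (not DREAM itself): sampled regret matching in a one-state game,
# with a running baseline subtracted to cut the variance of the importance-
# weighted advantage estimates.
import numpy as np

rng = np.random.default_rng(1)
q_true = np.array([0.1, 0.5, 0.9])       # unknown true action values
n = len(q_true)
baseline = 0.0                            # learned value baseline
regret = np.zeros(n)
for t in range(5000):
    probs = np.maximum(regret, 0.0)
    probs = probs / probs.sum() if probs.sum() > 0 else np.full(n, 1.0 / n)
    a = rng.choice(n, p=probs)
    reward = q_true[a] + rng.normal(0.0, 1.0)       # noisy sampled return
    advantage = (reward - baseline) / probs[a]      # importance-weighted
    baseline += 0.01 * (reward - baseline)          # running value estimate
    sampled = np.zeros(n)
    sampled[a] = advantage
    regret += sampled - probs @ sampled             # sampled regret update
print(probs)                              # mass should concentrate on action 2
```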
arXiv Detail & Related papers (2020-06-18T10:30:27Z)
- From Poincaré Recurrence to Convergence in Imperfect Information Games: Finding Equilibrium via Regularization [49.368421783733815]
We show how adapting the reward can give strong convergence guarantees in monotone games.
We also show how this reward adaptation technique can be leveraged to build algorithms that converge exactly to the Nash equilibrium.
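A minimal sketch of the reward-adaptation idea in a matrix game, under illustrative assumptions (entropy regularization implemented as a softmax response, fictitious-play averaging, and a hand-picked annealing schedule; the paper's actual dynamics differ):

```python
# Sketch: entropy-regularize the payoffs so best responses become softmax
# responses, average play fictitious-play style, and anneal the temperature
# so the rest point approaches the (unregularized) Nash equilibrium.
import numpy as np

def softmax(u, tau):
    z = np.exp((u - u.max()) / tau)
    return z / z.sum()

payoff = np.array([[1.0, -1.0], [-1.0, 1.0]])    # matching pennies, NE = 1/2-1/2
p1, p2 = np.array([0.9, 0.1]), np.array([0.2, 0.8])
tau = 1.0                                        # regularization temperature
for t in range(1, 5001):
    br1 = softmax(payoff @ p2, tau)              # smooth best response
    br2 = softmax(-(p1 @ payoff), tau)
    p1 += (br1 - p1) / t                         # fictitious-play averaging
    p2 += (br2 - p2) / t
    tau = max(0.05, tau * 0.999)                 # anneal the regularization
print(p1, p2)                                    # both drift toward [0.5, 0.5]
```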
arXiv Detail & Related papers (2020-02-19T21:36:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.