Online Double Oracle
- URL: http://arxiv.org/abs/2103.07780v2
- Date: Tue, 16 Mar 2021 14:34:47 GMT
- Title: Online Double Oracle
- Authors: Le Cong Dinh, Yaodong Yang, Zheng Tian, Nicolas Perez Nieves, Oliver
Slumbers, David Henry Mguni, Haitham Bou Ammar, Jun Wang
- Abstract summary: This paper proposes new learning algorithms in two-player zero-sum games where the number of pure strategies is huge or infinite.
Our method achieves a regret bound of $\mathcal{O}(\sqrt{T k \log(k)})$ in the self-play setting, where $k$ is NOT the size of the game.
- Score: 20.291016382324777
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Solving strategic games whose action space is prohibitively large is a
critical yet under-explored topic in economics, computer science and artificial
intelligence. This paper proposes new learning algorithms in two-player
zero-sum games where the number of pure strategies is huge or even infinite.
Specifically, we combine no-regret analysis from online learning with double
oracle methods from game theory. Our method -- \emph{Online Double Oracle
(ODO)} -- achieves the regret bound of $\mathcal{O}(\sqrt{T k \log(k)})$ in
the self-play setting, where $k$ is NOT the size of the game, but rather the size of
\emph{effective strategy set} that is linearly dependent on the support size of
the Nash equilibrium. On tens of different real-world games, including Leduc
Poker that contains $3^{936}$ pure strategies, our methods outperform no-regret
algorithms and double oracle methods by a large margin, both in convergence
rate to a Nash equilibrium and in average payoff against a strategic adversary.
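As a rough illustration of the approach the abstract describes (a no-regret learner restricted to a growing effective strategy set, expanded by a best-response oracle), here is a minimal self-play sketch for a finite zero-sum matrix game. The payoff matrix `A`, the multiplicative-weights learner, the learning rate, and the fixed expansion window are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def online_double_oracle(A, iterations=10_000, window=50, eta=0.1):
    """Illustrative self-play sketch: no-regret learning over growing
    'effective' strategy sets, expanded by best-response oracles.

    A is the row player's payoff matrix of a finite zero-sum game (the
    column player receives -A). The multiplicative-weights learner, the
    learning rate, and the fixed expansion window are assumptions made
    for this sketch, not the paper's exact schedule.
    """
    n_rows, n_cols = A.shape
    row_set, col_set = [0], [0]            # effective strategy sets (indices)
    row_w, col_w = np.ones(1), np.ones(1)  # multiplicative weights over those sets
    row_avg = np.zeros(n_rows)             # cumulative average strategies
    col_avg = np.zeros(n_cols)

    for t in range(1, iterations + 1):
        row_p = row_w / row_w.sum()
        col_p = col_w / col_w.sum()
        row_avg[row_set] += row_p
        col_avg[col_set] += col_p

        # No-regret (multiplicative weights) step on the restricted game.
        sub = A[np.ix_(row_set, col_set)]
        row_w *= np.exp(eta * (sub @ col_p))     # row player maximises
        col_w *= np.exp(eta * (-(row_p @ sub)))  # column player minimises
        row_w /= row_w.sum()                     # keep weights numerically stable
        col_w /= col_w.sum()

        # Double-oracle step: occasionally add a best response to the
        # opponent's empirical average strategy.
        if t % window == 0:
            br_row = int(np.argmax(A @ (col_avg / t)))
            br_col = int(np.argmin((row_avg / t) @ A))
            if br_row not in row_set:
                row_set.append(br_row)
                row_w = np.append(row_w, row_w.mean())
            if br_col not in col_set:
                col_set.append(br_col)
                col_w = np.append(col_w, col_w.mean())

    return row_avg / iterations, col_avg / iterations
```

In this sketch `row_set` and `col_set` play the role of the effective strategy sets: the no-regret update only ever runs over them, which is why the bound quoted above depends on their size $k$ rather than on the number of pure strategies in the full game.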
Related papers
- Fast Last-Iterate Convergence of Learning in Games Requires Forgetful Algorithms [71.73971094342349]
Self-play via online learning is one of the premier ways to solve large-scale two-player zero-sum games.
We show that OMWU offers several advantages including logarithmic dependence on the size of the payoff matrix.
We prove that a broad class of algorithms that do not forget the past quickly all suffer the same issue.
arXiv Detail & Related papers (2024-06-15T13:26:17Z) - Is Learning in Games Good for the Learners? [14.781100349601587]
We consider tradeoffs between reward and regret in repeated gameplay between two agents.
We show that any such equilibrium is reachable by a pair of algorithms which maintain their regret guarantees against arbitrary opponents.
We also consider the question of learning reward-optimal strategies via repeated play with a no-regret agent when the game is initially unknown.
arXiv Detail & Related papers (2023-05-31T02:10:27Z) - Near-Optimal No-Regret Learning for General Convex Games [121.50979258049135]
We show that near-optimal regret can be obtained for general convex and compact strategy sets.
Our dynamics are based on an instantiation of optimistic follow-the-regularized-leader over an appropriately lifted space.
Even in those special cases where prior results apply, our algorithm improves over the state-of-the-art regret.
arXiv Detail & Related papers (2022-06-17T12:58:58Z) - Near-Optimal Learning of Extensive-Form Games with Imperfect Information [54.55092907312749]
We present the first line of algorithms that require only $\widetilde{\mathcal{O}}((XA+YB)/\varepsilon^2)$ episodes of play to find an $\varepsilon$-approximate Nash equilibrium in two-player zero-sum games.
This improves upon the best known sample complexity of $\widetilde{\mathcal{O}}((X^2A+Y^2B)/\varepsilon^2)$ by a factor of $\widetilde{\mathcal{O}}(\max\{X, Y\})$.
arXiv Detail & Related papers (2022-02-03T18:18:28Z) - Anytime Optimal PSRO for Two-Player Zero-Sum Games [17.821479538423155]
Policy Space Response Oracles (PSRO) is a reinforcement learning algorithm for games that can handle continuous actions.
The proposed AODO is a double oracle algorithm for two-player zero-sum games that converges to a Nash equilibrium.
We show that our methods achieve far lower exploitability than DO and PSRO and never increase exploitability (a minimal sketch of the exploitability metric appears after this list).
arXiv Detail & Related papers (2022-01-19T16:34:11Z) - Online Markov Decision Processes with Non-oblivious Strategic Adversary [35.25621843666453]
We study a novel setting in Online Markov Decision Processes (OMDPs) where the loss function is chosen by a non-oblivious strategic adversary.
We first demonstrate that MDP-Expert, an existing algorithm that works well with oblivious adversaries, can still apply and achieve a policy regret bound of $\mathcal{O}(\sqrt{T\log(L)}+\tau^2\sqrt{T\log(|A|)})$, where $L$ is the size of the adversary's pure strategy set and $|A|$ denotes the size of the agent's action space.
arXiv Detail & Related papers (2021-10-07T16:32:37Z) - Near-Optimal Reinforcement Learning with Self-Play [50.29853537456737]
We focus on self-play algorithms which learn the optimal policy by playing against themselves without any direct supervision.
We propose an optimistic variant of the \emph{Nash Q-learning} algorithm with sample complexity $\tilde{\mathcal{O}}(SAB)$, and a new \emph{Nash V-learning} algorithm with sample complexity $\tilde{\mathcal{O}}(S(A+B))$.
arXiv Detail & Related papers (2020-06-22T05:00:13Z) - Efficient exploration of zero-sum stochastic games [83.28949556413717]
We investigate the increasingly important and common game-solving setting where we do not have an explicit description of the game but only oracle access to it through gameplay.
During a limited-duration learning phase, the algorithm can control the actions of both players in order to try to learn the game and how to play it well.
Our motivation is to quickly learn strategies that have low exploitability in situations where evaluating the payoffs of a queried strategy profile is costly.
arXiv Detail & Related papers (2020-02-24T20:30:38Z) - Learning Zero-Sum Simultaneous-Move Markov Games Using Function
Approximation and Correlated Equilibrium [116.56359444619441]
We develop provably efficient reinforcement learning algorithms for two-player zero-sum finite-horizon Markov games.
In the offline setting, we control both players and aim to find the Nash Equilibrium by minimizing the duality gap.
In the online setting, we control a single player playing against an arbitrary opponent and aim to minimize the regret.
arXiv Detail & Related papers (2020-02-17T17:04:16Z)
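Several of the entries above (e.g., the Anytime Optimal PSRO summary) compare methods by exploitability, i.e., how much each player could gain by deviating to a best response. The sketch below, with illustrative names, computes this quantity for a mixed-strategy profile in a finite zero-sum matrix game; the larger games studied in these papers require approximate best responses instead of the exact maximization used here.

```python
import numpy as np

def exploitability(A, x, y):
    """Sum of both players' best-response gains against the profile (x, y).

    A is the row player's payoff matrix in a zero-sum game; x and y are
    mixed strategies. The value is zero exactly at a Nash equilibrium;
    some papers report half of this quantity instead.
    """
    row_best = np.max(A @ y)   # best payoff a deviating row player can get against y
    col_best = np.min(x @ A)   # lowest row payoff the column player can force against x
    return float(row_best - col_best)

# Example: matching pennies. The uniform profile is a Nash equilibrium,
# so its exploitability is zero; the pure profile below is fully exploitable.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
print(exploitability(A, np.array([0.5, 0.5]), np.array([0.5, 0.5])))  # 0.0
print(exploitability(A, np.array([1.0, 0.0]), np.array([1.0, 0.0])))  # 2.0
```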
This list is automatically generated from the titles and abstracts of the papers in this site.