A Unified Approach to Reinforcement Learning, Quantal Response
Equilibria, and Two-Player Zero-Sum Games
- URL: http://arxiv.org/abs/2206.05825v4
- Date: Tue, 11 Apr 2023 17:50:16 GMT
- Title: A Unified Approach to Reinforcement Learning, Quantal Response
Equilibria, and Two-Player Zero-Sum Games
- Authors: Samuel Sokota, Ryan D'Orazio, J. Zico Kolter, Nicolas Loizou, Marc
Lanctot, Ioannis Mitliagkas, Noam Brown, Christian Kroer
- Abstract summary: This work studies an algorithm, which we call magnetic mirror descent, that is inspired by mirror descent and the non-Euclidean proximal gradient algorithm.
Our contribution is demonstrating the virtues of magnetic mirror descent as both an equilibrium solver and as an approach to reinforcement learning in two-player zero-sum games.
- Score: 104.3339905200105
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work studies an algorithm, which we call magnetic mirror descent, that
is inspired by mirror descent and the non-Euclidean proximal gradient
algorithm. Our contribution is demonstrating the virtues of magnetic mirror
descent as both an equilibrium solver and as an approach to reinforcement
learning in two-player zero-sum games. These virtues include: 1) Being the
first quantal response equilibria solver to achieve linear convergence for
extensive-form games with first order feedback; 2) Being the first standard
reinforcement learning algorithm to achieve empirically competitive results
with CFR in tabular settings; 3) Achieving favorable performance in 3x3 Dark
Hex and Phantom Tic-Tac-Toe as a self-play deep reinforcement learning
algorithm.
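
To give a concrete sense of the update, below is a minimal NumPy sketch of tabular magnetic mirror descent on the probability simplex with KL regularization, run in self-play on a small matrix game. The step size, regularization temperature, magnet, and game are illustrative placeholders rather than settings from the paper.

```python
import numpy as np

def mmd_step(pi, grad, magnet, eta=0.1, alpha=0.05):
    """One magnetic mirror descent step on the simplex (KL regularizer).

    Minimizes  eta*<grad, p> + eta*alpha*KL(p, magnet) + KL(p, pi)  over the simplex,
    whose closed form is
        p(a) ∝ [ pi(a) * magnet(a)**(eta*alpha) * exp(-eta*grad(a)) ]**(1/(1+eta*alpha)).
    """
    logits = (np.log(pi) + eta * alpha * np.log(magnet) - eta * grad) / (1.0 + eta * alpha)
    logits -= logits.max()              # subtract max for numerical stability
    p = np.exp(logits)
    return p / p.sum()

# Rock-paper-scissors payoffs for the row player, who minimizes x^T A y.
A = np.array([[ 0.0, -1.0,  1.0],
              [ 1.0,  0.0, -1.0],
              [-1.0,  1.0,  0.0]])

x = np.array([0.6, 0.3, 0.1])           # row player's strategy (non-uniform start)
y = np.array([0.2, 0.5, 0.3])           # column player's strategy
magnet = np.ones(3) / 3                 # uniform magnet: iterates are pulled toward uniform play

for _ in range(2000):
    gx = A @ y                          # row player's loss gradient
    gy = -A.T @ x                       # column player's loss gradient (it maximizes x^T A y)
    x, y = mmd_step(x, gx, magnet), mmd_step(y, gy, magnet)

print(x, y)                             # for RPS with a uniform magnet, both approach uniform play
```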
Related papers
- Last-Iterate Convergence of Payoff-Based Independent Learning in Zero-Sum Stochastic Games [31.554420227087043]
We develop learning dynamics that are payoff-based, convergent, rational, and symmetric between the two players.
In the matrix game setting, the results imply a complexity of $O(\epsilon^{-1})$ to find the Nash distribution.
In the stochastic game setting, the results also imply a complexity of $O(\epsilon^{-8})$ to find a Nash equilibrium.
arXiv Detail & Related papers (2024-09-02T20:07:25Z)
- Provably Efficient Information-Directed Sampling Algorithms for Multi-Agent Reinforcement Learning [50.92957910121088]
This work designs and analyzes a novel set of algorithms for multi-agent reinforcement learning (MARL) based on the principle of information-directed sampling (IDS).
For episodic two-player zero-sum MGs, we present three sample-efficient algorithms for learning Nash equilibrium.
We extend Reg-MAIDS to multi-player general-sum MGs and prove that it can learn either the Nash equilibrium or coarse correlated equilibrium in a sample-efficient manner.
arXiv Detail & Related papers (2024-04-30T06:48:56Z)
- Payoff-based learning with matrix multiplicative weights in quantum games [35.111876815522116]
We study the problem of learning in quantum games - and other classes of semidefinite games - with scalar, payoff-based feedback.
We introduce a suite of minimal-information matrix multiplicative weights (3MW) methods tailored to different information frameworks.
We show that the 3MW method with deterministic payoff feedback retains the convergence rate of the vanilla, full-information MMW algorithm in quantum min-max games.
arXiv Detail & Related papers (2023-11-04T14:56:17Z)
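
For context on the update that the 3MW methods above start from, here is a hedged sketch of a vanilla full-information matrix multiplicative weights (MMW) step on a toy semidefinite min-max problem over density matrices; the coupling matrix, step size, and dimensions are placeholders, and the payoff-based (bandit) gradient estimation that distinguishes 3MW is not shown.

```python
import numpy as np
from scipy.linalg import expm

def mmw_step(cum_grad, grad, eta=0.1):
    """Full-information matrix multiplicative weights update over density matrices.

    Returns X_{t+1} = exp(-eta * G_t) / Tr(exp(-eta * G_t)), where G_t is the running
    sum of (symmetric) loss gradients, together with the updated running sum.
    """
    cum_grad = cum_grad + grad
    E = expm(-eta * cum_grad)
    return E / np.trace(E), cum_grad

# Toy semidefinite min-max problem: min_X max_Y Tr(X M Y M^T) over 2x2 density matrices.
rng = np.random.default_rng(0)
d = 2
M = rng.standard_normal((d, d))         # placeholder coupling matrix

X, Y = np.eye(d) / d, np.eye(d) / d
cum_x, cum_y = np.zeros((d, d)), np.zeros((d, d))

for _ in range(500):
    gx = M @ Y @ M.T                    # gradient of the payoff w.r.t. X (X minimizes)
    gy = -(M.T @ X @ M)                 # negated gradient w.r.t. Y (Y maximizes)
    X, cum_x = mmw_step(cum_x, gx)
    Y, cum_y = mmw_step(cum_y, gy)

print(np.trace(X @ M @ Y @ M.T))        # payoff value at the final iterates
```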
- Finding mixed-strategy equilibria of continuous-action games without gradients using randomized policy networks [83.28949556413717]
We study the problem of computing an approximate Nash equilibrium of a continuous-action game without access to gradients.
We model players' strategies using artificial neural networks.
This paper is the first to solve general continuous-action games with unrestricted mixed strategies and without any gradient information.
arXiv Detail & Related papers (2022-11-29T05:16:41Z)
- Learning Two-Player Mixture Markov Games: Kernel Function Approximation and Correlated Equilibrium [157.0902680672422]
We consider learning Nash equilibria in two-player zero-sum Markov Games with nonlinear function approximation.
We propose a novel online learning algorithm to find a Nash equilibrium by minimizing the duality gap.
arXiv Detail & Related papers (2022-08-10T14:21:54Z)
- Alternating Mirror Descent for Constrained Min-Max Games [44.46086335474311]
We study two-player bilinear zero-sum games with constrained strategy spaces.
We analyze the alternating mirror descent algorithm, in which the players take turns updating their strategies, each following the mirror descent algorithm for constrained optimization.
arXiv Detail & Related papers (2022-06-08T20:48:16Z)
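
To illustrate the alternation, here is a hedged sketch using entropic mirror descent (multiplicative weights) on the simplex for a bilinear zero-sum game: the second player updates against the first player's freshly updated strategy within the same round. The game matrix and step size are placeholders, and this is a generic illustration rather than the paper's exact setup.

```python
import numpy as np

def md_step(p, grad, eta=0.1):
    """Entropic mirror descent (multiplicative weights) step on the simplex."""
    q = p * np.exp(-eta * grad)
    return q / q.sum()

# Bilinear zero-sum game: x minimizes x^T A y, y maximizes it.
A = np.array([[ 0.0, -1.0,  1.0],
              [ 1.0,  0.0, -1.0],
              [-1.0,  1.0,  0.0]])

x = np.ones(3) / 3
y = np.ones(3) / 3

for _ in range(1000):
    x = md_step(x, A @ y)               # player 1 updates first...
    y = md_step(y, -A.T @ x)            # ...player 2 then responds to the *updated* x
```

The only difference from a simultaneous-update version is that the second step uses the already-updated x, which is the "taking turns" structure the paper analyzes.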
- No-Regret Learning in Time-Varying Zero-Sum Games [99.86860277006318]
Learning from repeated play in a fixed zero-sum game is a classic problem in game theory and online learning.
We develop a single parameter-free algorithm that simultaneously enjoys favorable guarantees under three performance measures.
Our algorithm is based on a two-layer structure with a meta-algorithm learning over a group of black-box base-learners satisfying a certain property.
arXiv Detail & Related papers (2022-01-30T06:10:04Z)
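
The two-layer structure described above can be pictured as a Hedge-style meta-algorithm that keeps one weight per black-box base-learner and plays the weighted mixture of their recommended strategies; the sketch below is a generic illustration under that reading, with the step size and loss model as placeholders, not the paper's construction.

```python
import numpy as np

def meta_play(meta_w, base_strategies):
    """Play the meta-algorithm's mixture of the base-learners' strategies.

    meta_w: shape (K,) weights over K base-learners; base_strategies: shape (K, n_actions).
    """
    return meta_w @ base_strategies

def meta_update(meta_w, base_losses, eta=0.1):
    """Hedge update: exponentially down-weight base-learners with higher observed loss."""
    w = meta_w * np.exp(-eta * base_losses)
    return w / w.sum()

# Example round: three base-learners recommend strategies over two actions.
meta_w = np.ones(3) / 3
base_strategies = np.array([[0.9, 0.1],
                            [0.5, 0.5],
                            [0.1, 0.9]])
loss_vector = np.array([1.0, 0.0])              # loss of each action this round
play = meta_play(meta_w, base_strategies)       # mixture actually played
base_losses = base_strategies @ loss_vector     # expected loss of each base-learner
meta_w = meta_update(meta_w, base_losses)
```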
- Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium [116.56359444619441]
We develop provably efficient reinforcement learning algorithms for two-player zero-sum finite-horizon Markov games.
In the offline setting, we control both players and aim to find the Nash Equilibrium by minimizing the duality gap.
In the online setting, we control a single player playing against an arbitrary opponent and aim to minimize the regret.
arXiv Detail & Related papers (2020-02-17T17:04:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.