Fictitious play in zero-sum stochastic games
- URL: http://arxiv.org/abs/2010.04223v6
- Date: Thu, 2 Jun 2022 09:03:25 GMT
- Title: Fictitious play in zero-sum stochastic games
- Authors: Muhammed O. Sayin, Francesca Parise and Asuman Ozdaglar
- Abstract summary: We present a novel variant of fictitious play dynamics combining classical fictitious play with Q-learning for stochastic games.
We analyze its convergence properties in two-player zero-sum games.
- Score: 1.9143447222638694
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a novel variant of fictitious play dynamics combining classical
fictitious play with Q-learning for stochastic games and analyze its
convergence properties in two-player zero-sum stochastic games. Our dynamics
involves players forming beliefs on the opponent strategy and their own
continuation payoff (Q-function), and playing a greedy best response by using
the estimated continuation payoffs. Players update their beliefs from
observations of opponent actions. A key property of the learning dynamics is
that update of the beliefs on Q-functions occurs at a slower timescale than
update of the beliefs on strategies. We show that, in both the model-based and
model-free cases (the latter without knowledge of player payoff functions and
state transition probabilities), the beliefs on strategies converge to a
stationary mixed Nash equilibrium of the zero-sum stochastic game.
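The dynamics described above lend themselves to a compact implementation. Below is a minimal, illustrative sketch of the model-free variant for one player of a tabular two-player zero-sum stochastic game; the `env` interface, the step-size schedules, and all identifiers are our own assumptions for illustration, not the paper's notation or pseudocode.

```python
import numpy as np

def best_response(Q_s, belief_s):
    """Greedy best response at a state: maximize the expected estimated
    continuation payoff against the believed opponent mixed strategy.
    Q_s has shape (n_own_actions, n_opp_actions)."""
    return int(np.argmax(Q_s @ belief_s))

def fictitious_play_q(env, n_states, n_a, n_b, gamma=0.9, T=100_000):
    # Hypothetical env API: reset() -> state; step(a) -> (next_state,
    # reward, opponent_action). Payoffs/transitions are never modeled.
    Q = np.zeros((n_states, n_a, n_b))            # belief on own Q-function
    belief = np.full((n_states, n_b), 1.0 / n_b)  # belief on opponent strategy
    visits = np.zeros(n_states)
    s = env.reset()
    for _ in range(T):
        visits[s] += 1
        k = visits[s]
        alpha = 1.0 / k                       # fast step size (strategy beliefs)
        beta = 1.0 / (k * np.log(k + 1) + 1)  # slow step size (Q-function); beta/alpha -> 0
        a = best_response(Q[s], belief[s])
        s_next, r, b = env.step(a)            # observe reward and opponent action
        # Fast timescale: fictitious-play update of the opponent-strategy belief.
        e_b = np.zeros(n_b)
        e_b[b] = 1.0
        belief[s] += alpha * (e_b - belief[s])
        # Greedy continuation value at the next state under current beliefs.
        v_next = np.max(Q[s_next] @ belief[s_next])
        # Slow timescale: Q update from the realized transition.
        Q[s, a, b] += beta * (r + gamma * v_next - Q[s, a, b])
        s = s_next
    return belief, Q
```

The separation of timescales is carried entirely by the step sizes: `beta/alpha` vanishes as a state is visited more often, so the Q-function beliefs move slowly relative to the strategy beliefs, matching the key property highlighted in the abstract.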
Related papers
- Finite-Sample Guarantees for Best-Response Learning Dynamics in Zero-Sum Matrix Games [22.380293155135096]
We study best-response-type learning dynamics for two-player zero-sum matrix games.
We consider two settings that are distinguished by the type of information that each player has about the game and their opponent's strategy.
arXiv Detail & Related papers (2024-07-29T15:56:49Z)
- Optimistic Policy Gradient in Multi-Player Markov Games with a Single Controller: Convergence Beyond the Minty Property [89.96815099996132]
We develop a new framework to characterize optimistic policy gradient methods in multi-player games with a single controller.
Our approach relies on a natural generalization of the classical Minty property that we introduce, which we anticipate to have further applications beyond Markov games.
arXiv Detail & Related papers (2023-12-19T11:34:10Z)
- Logit-Q Dynamics for Efficient Learning in Stochastic Teams [1.3927943269211591]
We present a new family of logit-Q dynamics for efficient learning in games.
We show that the logit-Q dynamics presented reach (near) efficient equilibrium in teams with unknown dynamics.
arXiv Detail & Related papers (2023-02-20T07:07:25Z)
- On the Convergence of No-Regret Learning Dynamics in Time-Varying Games [89.96815099996132]
We characterize the convergence of optimistic gradient descent (OGD) in time-varying games.
Our framework yields sharp convergence bounds for the equilibrium gap of OGD in zero-sum games.
We also provide new insights on dynamic regret guarantees in static games.
arXiv Detail & Related papers (2023-01-26T17:25:45Z)
- Finding mixed-strategy equilibria of continuous-action games without gradients using randomized policy networks [83.28949556413717]
We study the problem of computing an approximate Nash equilibrium of a continuous-action game without access to gradients.
We model players' strategies using artificial neural networks.
This paper is the first to solve general continuous-action games with unrestricted mixed strategies and without any gradient information.
arXiv Detail & Related papers (2022-11-29T05:16:41Z)
- Nash Equilibria and Pitfalls of Adversarial Training in Adversarial Robustness Games [51.90475640044073]
We study adversarial training as an alternating best-response strategy in a two-player zero-sum game.
Despite the pitfalls of this alternating dynamic, a unique pure Nash equilibrium of the game exists and is provably robust.
arXiv Detail & Related papers (2022-10-23T03:21:01Z)
- Independent and Decentralized Learning in Markov Potential Games [3.8779763612314633]
We focus on the independent and decentralized setting, where players do not have knowledge of the game model and cannot coordinate.
In each stage, players update their estimates of the Q-function, which evaluates their total contingent payoff, based on the realized one-stage reward.
We prove that the policies induced by our learning dynamics converge to the set of stationary Nash equilibria in Markov potential games with probability 1.
arXiv Detail & Related papers (2022-05-29T07:39:09Z)
- Reinforcement Learning In Two Player Zero Sum Simultaneous Action Games [0.0]
Two-player zero-sum simultaneous-action games are common in video games, financial markets, war, business competition, and many other settings.
We introduce the fundamental concepts of reinforcement learning in two-player zero-sum simultaneous-action games and discuss the unique challenges this type of game poses.
We introduce two novel agents that attempt to handle these challenges by using joint action Deep Q-Networks.
arXiv Detail & Related papers (2021-10-10T16:03:44Z)
- Computing Nash Equilibria in Multiplayer DAG-Structured Stochastic Games with Persistent Imperfect Information [1.7132914341329848]
We present an algorithm for approximating Nash equilibrium in multiplayer general-sum games with persistent imperfect information.
Using a new procedure, we are able to demonstrate that our algorithm computes a strategy that closely approximates Nash equilibrium in this game.
arXiv Detail & Related papers (2020-10-26T19:27:26Z)
- Learning to Play Sequential Games versus Unknown Opponents [93.8672371143881]
We consider a repeated sequential game between a learner, who plays first, and an opponent who responds to the chosen action.
We propose a novel algorithm for the learner when playing against an adversarial sequence of opponents.
Our results include regret guarantees for the algorithm that depend on the regularity of the opponent's response.
arXiv Detail & Related papers (2020-07-10T09:33:05Z)
- Chaos, Extremism and Optimism: Volume Analysis of Learning in Games [55.24050445142637]
We present volume analyses of Multiplicative Weights Updates (MWU) and Optimistic Multiplicative Weights Updates (OMWU) in zero-sum as well as coordination games; both update rules are sketched after this entry.
We show that OMWU contracts volume, providing an alternative understanding of its known convergent behavior.
We also prove a no-free-lunch type of theorem: in coordination games the roles are reversed, with OMWU expanding volume exponentially fast while MWU contracts.
arXiv Detail & Related papers (2020-05-28T13:47:09Z)
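Since the entry above centers on the MWU and OMWU dynamics, here is a minimal self-play sketch of the two update rules in a zero-sum matrix game. The function names, step size, and initialization are illustrative assumptions; the paper's volume analysis itself is not reproduced here.

```python
import numpy as np

def mwu_step(x, payoff, eta):
    # Multiplicative Weights step: exponentially reweight actions by payoff.
    w = x * np.exp(eta * payoff)
    return w / w.sum()

def self_play(A, eta=0.05, T=1000, optimistic=True):
    # Zero-sum game with payoff matrix A: player 1 maximizes x^T A y,
    # player 2 minimizes it (equivalently, maximizes -x^T A y).
    n, m = A.shape
    x, y = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    gx_prev, gy_prev = A @ y, -(A.T @ x)
    for _ in range(T):
        gx, gy = A @ y, -(A.T @ x)
        if optimistic:
            # OMWU: feed the extrapolated payoff 2*g_t - g_{t-1} into the step.
            px, py = 2 * gx - gx_prev, 2 * gy - gy_prev
        else:
            px, py = gx, gy  # vanilla MWU
        x, y = mwu_step(x, px, eta), mwu_step(y, py, eta)
        gx_prev, gy_prev = gx, gy
    return x, y
```

On matching pennies, `self_play(np.array([[1., -1.], [-1., 1.]]))` drifts toward the uniform mixed equilibrium under OMWU, while setting `optimistic=False` exhibits the cycling behavior typical of vanilla MWU in zero-sum games.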