Logit-Q Dynamics for Efficient Learning in Stochastic Teams
- URL: http://arxiv.org/abs/2302.09806v2
- Date: Tue, 2 Jan 2024 19:43:52 GMT
- Title: Logit-Q Dynamics for Efficient Learning in Stochastic Teams
- Authors: Muhammed O. Sayin and Onur Unlu
- Abstract summary: We show that the logit-Q dynamics presented reach (near) efficient equilibrium in stochastic teams.
We also show the rationality of the logit-Q dynamics against agents following pure stationary strategies.
The key idea is to approximate the dynamics with a fictional scenario where the Q-function estimates are stationary over finite-length epochs only for analysis.
- Score: 1.8492669447784602
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present two logit-Q learning dynamics combining the classical and
independent log-linear learning updates with an on-policy value iteration
update for efficient learning in stochastic games. We show that the logit-Q
dynamics presented reach (near) efficient equilibrium in stochastic teams. We
quantify a bound on the approximation error. We also show the rationality of
the logit-Q dynamics against agents following pure stationary strategies and
the convergence of the dynamics in stochastic games where the reward functions
induce potential games, yet only a single agent controls the state transitions
beyond stochastic teams. The key idea is to approximate the dynamics with a
fictional scenario where the Q-function estimates are stationary over
finite-length epochs only for analysis. We then couple the dynamics in the main
and fictional scenarios to show that these two scenarios become more and more
similar across epochs due to the vanishing step size.
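To make the structure of such dynamics concrete, below is a minimal, illustrative Python sketch of a logit-Q-style loop: one randomly chosen agent revises its action through a logit (softmax) response to the current Q-estimate, as in classical log-linear learning, while a joint-action Q-estimate is updated with a vanishing, count-based step size. The randomly generated team game, the temperature, the on-policy-style bootstrap target, and all parameter choices are assumptions made for illustration; this is not the paper's exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

n_states, n_a1, n_a2 = 3, 2, 2
gamma = 0.8          # discount factor
temperature = 0.1    # temperature of the logit (softmax) response

# Random team game: common reward r[s, a1, a2] and transition kernel P[s, a1, a2, s'].
r = rng.random((n_states, n_a1, n_a2))
P = rng.random((n_states, n_a1, n_a2, n_states))
P /= P.sum(axis=-1, keepdims=True)

Q = np.zeros((n_states, n_a1, n_a2))          # joint-action Q-estimate of the team objective
counts = np.zeros((n_states, n_a1, n_a2))     # visit counts for the vanishing step size
s, a = 0, [0, 0]                              # current state and joint action

def logit_choice(q_row, temp):
    """Softmax (logit) choice over one agent's actions, the other's action held fixed."""
    z = q_row / temp
    p = np.exp(z - z.max())
    return int(rng.choice(len(q_row), p=p / p.sum()))

for _ in range(20000):
    # Log-linear learning: one randomly selected agent revises its action with a
    # logit response to the current Q-estimate; the other agent repeats its action.
    if rng.integers(2) == 0:
        a[0] = logit_choice(Q[s, :, a[1]], temperature)
    else:
        a[1] = logit_choice(Q[s, a[0], :], temperature)
    # Vanishing (count-based) step size for the visited (state, joint action) pair.
    counts[s, a[0], a[1]] += 1
    alpha = 1.0 / counts[s, a[0], a[1]]
    # Value-iteration-style update of the Q-estimate at the visited pair.
    s_next = int(rng.choice(n_states, p=P[s, a[0], a[1]]))
    target = r[s, a[0], a[1]] + gamma * Q[s_next, a[0], a[1]]  # on-policy-style bootstrap (assumption)
    Q[s, a[0], a[1]] += alpha * (target - Q[s, a[0], a[1]])
    s = s_next

greedy = [np.unravel_index(int(Q[x].argmax()), Q[x].shape) for x in range(n_states)]
print("Greedy joint action per state:", greedy)
```

Roughly, the paper's two variants differ in how the log-linear revision is carried out (one agent at a time versus all agents revising independently); the sketch above loosely follows the one-at-a-time case.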
Related papers
- On the Convergence of No-Regret Learning Dynamics in Time-Varying Games [89.96815099996132]
We characterize the convergence of optimistic gradient descent (OGD) in time-varying games.
Our framework yields sharp convergence bounds for the equilibrium gap of OGD in zero-sum games.
We also provide new insights on dynamic regret guarantees in static games.
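For readers unfamiliar with the update itself, here is a small, self-contained sketch of the optimistic gradient step on a fixed (not time-varying) unconstrained bilinear zero-sum game; the matrix, step size, horizon, and the use of gradient norms as an equilibrium-gap proxy are illustrative assumptions, not this paper's setting.

```python
import numpy as np

A = np.array([[1.0, -1.0], [-1.0, 1.0]])    # bilinear game min_x max_y x^T A y
eta = 0.05                                  # step size (illustrative)

x, y = np.array([1.0, -0.5]), np.array([-0.8, 0.3])
gx_prev, gy_prev = A @ y, A.T @ x           # gradients from the previous round

for _ in range(5000):
    gx, gy = A @ y, A.T @ x                 # current gradients of x^T A y
    x = x - eta * (2 * gx - gx_prev)        # optimistic descent step (minimizer)
    y = y + eta * (2 * gy - gy_prev)        # optimistic ascent step (maximizer)
    gx_prev, gy_prev = gx, gy

# Gradient norms serve as a crude equilibrium-gap proxy in this unconstrained game.
print("gap:", float(np.linalg.norm(A @ y)), float(np.linalg.norm(A.T @ x)))
```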
arXiv Detail & Related papers (2023-01-26T17:25:45Z)
- Asymptotic Convergence and Performance of Multi-Agent Q-Learning Dynamics [38.5932141555258]
We study the dynamics of smooth Q-Learning, a popular reinforcement learning algorithm.
We show a sufficient condition on the rate of exploration such that the Q-Learning dynamics is guaranteed to converge to a unique equilibrium in any game.
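As a rough illustration of what "smooth Q-learning dynamics" refers to, the following sketches the expected (mean-field) form of Boltzmann Q-learning in a 2x2 coordination game; the payoff matrices, temperature, and step size are assumptions chosen only to show the role of the exploration rate.

```python
import numpy as np

def softmax(q, temp):
    z = q / temp
    z = z - z.max()
    p = np.exp(z)
    return p / p.sum()

A = np.array([[1.0, 0.0], [0.0, 1.0]])   # player 1's payoffs (coordination game)
B = A.copy()                              # player 2's payoffs
temp, alpha = 0.5, 0.05                   # exploration rate (temperature) and step size

Q1, Q2 = np.zeros(2), np.zeros(2)
for _ in range(5000):
    p1, p2 = softmax(Q1, temp), softmax(Q2, temp)
    # Expected-payoff form of the smooth Q-learning update against the opponent's mix.
    Q1 += alpha * (A @ p2 - Q1)
    Q2 += alpha * (B.T @ p1 - Q2)

print("limit strategies:", softmax(Q1, temp), softmax(Q2, temp))
```

Raising temp (more exploration) drives the dynamics toward a single near-uniform fixed point, which loosely mirrors the exploration-rate condition the summary refers to; lowering it lets multiple coordination outcomes persist.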
arXiv Detail & Related papers (2023-01-23T18:39:11Z)
- Finding mixed-strategy equilibria of continuous-action games without gradients using randomized policy networks [83.28949556413717]
We study the problem of computing an approximate Nash equilibrium of a continuous-action game without access to gradients.
We model players' strategies using artificial neural networks.
This paper is the first to solve general continuous-action games with unrestricted mixed strategies and without any gradient information.
arXiv Detail & Related papers (2022-11-29T05:16:41Z)
- A unified stochastic approximation framework for learning in games [82.74514886461257]
We develop a flexible stochastic approximation framework for analyzing the long-run behavior of learning in games (both continuous and finite).
The proposed analysis template incorporates a wide array of popular learning algorithms, including gradient-based methods, exponential/multiplicative weights for learning in finite games, optimistic and bandit variants of the above, etc.
arXiv Detail & Related papers (2022-06-08T14:30:38Z)
- Independent and Decentralized Learning in Markov Potential Games [3.8779763612314633]
We focus on the independent and decentralized setting, where players do not have knowledge of the game model and cannot coordinate.
In each stage, players update their estimate of the Q-function that evaluates their total contingent payoff based on the realized one-stage reward.
We prove that the policies induced by our learning dynamics converge to the set of stationary Nash equilibria in Markov potential games with probability 1.
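To make "independent" concrete here, the snippet below sketches a per-player update that uses only that player's own action and realized one-stage reward, never the other players' actions or policies; the step size, discount factor, and max-bootstrap are illustrative assumptions rather than the paper's exact scheme.

```python
import numpy as np

def independent_q_step(Q_i, s, a_i, reward_i, s_next, alpha, gamma=0.95):
    """One independent update of player i's own table Q_i[state, own action]:
    only the realized one-stage reward and the next state enter the target;
    other players' actions never appear."""
    target = reward_i + gamma * Q_i[s_next].max()
    Q_i[s, a_i] += alpha * (target - Q_i[s, a_i])
    return Q_i

# Tiny usage example with 2 states and 2 own actions.
Q_i = np.zeros((2, 2))
Q_i = independent_q_step(Q_i, s=0, a_i=1, reward_i=0.7, s_next=1, alpha=0.1)
print(Q_i)
```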
arXiv Detail & Related papers (2022-05-29T07:39:09Z)
- Independent Learning in Stochastic Games [16.505046191280634]
We present the model of stochastic games for multi-agent learning in dynamic environments.
We focus on the development of simple and independent learning dynamics for stochastic games.
We present our recently proposed simple and independent learning dynamics that guarantee convergence in zero-sum stochastic games.
arXiv Detail & Related papers (2021-11-23T09:27:20Z)
- Learning in nonatomic games, Part I: Finite action spaces and population games [22.812059396480656]
We examine the long-run behavior of a wide range of dynamics for learning in nonatomic games, in both discrete and continuous time.
We focus exclusively on games with finite action spaces; nonatomic games with continuous action spaces are treated in detail in Part II of this paper.
arXiv Detail & Related papers (2021-07-04T11:20:45Z)
- From Motor Control to Team Play in Simulated Humanoid Football [56.86144022071756]
We train teams of physically simulated humanoid avatars to play football in a realistic virtual environment.
In a sequence of stages, players first learn to control a fully articulated body to perform realistic, human-like movements.
They then acquire mid-level football skills such as dribbling and shooting.
Finally, they develop awareness of others and play as a team, bridging the gap between low-level motor control at a timescale of milliseconds and coordinated, goal-directed behaviour as a team at a timescale of tens of seconds.
arXiv Detail & Related papers (2021-05-25T20:17:10Z) - Simple Uncoupled No-Regret Learning Dynamics for Extensive-Form
Correlated Equilibrium [65.64512759706271]
We study the existence of simple, uncoupled no-regret dynamics that converge to correlated equilibria in normal-form games.
We introduce a notion of trigger regret in extensive-form games, which extends that of internal regret in normal-form games.
We give an efficient no-regret algorithm which guarantees with high probability that trigger regrets grow sublinearly in the number of iterations.
arXiv Detail & Related papers (2021-04-04T02:26:26Z) - Fictitious play in zero-sum stochastic games [1.9143447222638694]
We present a novel variant of fictitious play dynamics combining classical fictitious play with Q-learning for stochastic games.
We analyze its convergence properties in two-player zero-sum stochastic games.
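A rough sketch of the flavor of such dynamics follows, under simplifying assumptions: a randomly generated two-state zero-sum stochastic game, a single joint-action Q-table standing in for per-player estimates, and illustrative two-timescale step sizes (faster beliefs, slower Q). It is meant only to show how fictitious-play beliefs and a Q-estimate are updated side by side, not to reproduce the paper's scheme.

```python
import numpy as np

rng = np.random.default_rng(1)

n_states, n_actions, gamma = 2, 2, 0.7
# Zero-sum stage payoffs r[s, a1, a2] for player 1 (player 2 receives -r).
r = rng.random((n_states, n_actions, n_actions)) * 2 - 1
P = rng.random((n_states, n_actions, n_actions, n_states))
P /= P.sum(axis=-1, keepdims=True)

Q = np.zeros((n_states, n_actions, n_actions))             # estimate of player 1's payoff
belief1 = np.full((n_states, n_actions), 1.0 / n_actions)  # player 1's belief about player 2
belief2 = np.full((n_states, n_actions), 1.0 / n_actions)  # player 2's belief about player 1

s, visits = 0, np.zeros(n_states)
for _ in range(20000):
    visits[s] += 1
    beta = visits[s] ** -0.6     # belief step size (faster timescale)
    alpha = 1.0 / visits[s]      # Q-estimate step size (slower timescale)

    # Fictitious-play best responses with respect to the current Q-estimate.
    a1 = int(np.argmax(Q[s] @ belief1[s]))     # maximizer of player 1's payoff
    a2 = int(np.argmin(belief2[s] @ Q[s]))     # minimizer (zero-sum opponent)

    # Belief updates toward the empirical frequency of the opponent's actions.
    belief1[s] += beta * (np.eye(n_actions)[a2] - belief1[s])
    belief2[s] += beta * (np.eye(n_actions)[a1] - belief2[s])

    # Q-learning-style update toward the stage reward plus the discounted continuation value.
    s_next = int(rng.choice(n_states, p=P[s, a1, a2]))
    v_next = float(np.max(Q[s_next] @ belief1[s_next]))
    Q[s, a1, a2] += alpha * (r[s, a1, a2] + gamma * v_next - Q[s, a1, a2])
    s = s_next

print("player 1's state values:", [float(np.max(Q[x] @ belief1[x])) for x in range(n_states)])
```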
arXiv Detail & Related papers (2020-10-08T19:06:45Z) - Chaos, Extremism and Optimism: Volume Analysis of Learning in Games [55.24050445142637]
We present volume analyses of Multiplicative Weights Updates (MWU) and Optimistic Multiplicative Weights Updates (OMWU) in zero-sum as well as coordination games.
We show that OMWU contracts volume, providing an alternative understanding for its known convergent behavior.
We also prove a no-free-lunch type of theorem, in the sense that when examining coordination games the roles are reversed: OMWU expands volume exponentially fast, whereas MWU contracts.
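To make the contrast concrete, here is an illustrative comparison of the two update rules on matching pennies (a 2x2 zero-sum game); the step size, horizon, and initial strategies are arbitrary assumptions, and the snippet only exhibits the shape of the updates, not the paper's volume arguments.

```python
import numpy as np

A = np.array([[1.0, -1.0], [-1.0, 1.0]])   # row player's payoffs (matching pennies)
eta, T = 0.05, 5000

def run(optimistic):
    x = np.array([0.7, 0.3])               # row player's mixed strategy
    y = np.array([0.4, 0.6])               # column player's mixed strategy
    gx_prev, gy_prev = A @ y, -A.T @ x     # payoff vectors from the previous round
    for _ in range(T):
        gx, gy = A @ y, -A.T @ x           # current payoff vectors (zero-sum)
        wx = 2 * gx - gx_prev if optimistic else gx   # optimistic correction (OMWU only)
        wy = 2 * gy - gy_prev if optimistic else gy
        x = x * np.exp(eta * wx)           # multiplicative weights step
        x /= x.sum()
        y = y * np.exp(eta * wy)
        y /= y.sum()
        gx_prev, gy_prev = gx, gy
    return x, y

print("MWU  last iterate:", run(False))
print("OMWU last iterate:", run(True))
```

Running this, one should see the OMWU iterate settle near the uniform equilibrium while the MWU iterate drifts toward the simplex boundary, consistent with the contraction/expansion picture described above.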
arXiv Detail & Related papers (2020-05-28T13:47:09Z)