Logit-Q Dynamics for Efficient Learning in Stochastic Teams
- URL: http://arxiv.org/abs/2302.09806v3
- Date: Wed, 02 Oct 2024 08:02:46 GMT
- Title: Logit-Q Dynamics for Efficient Learning in Stochastic Teams
- Authors: Ahmed Said Donmez, Onur Unlu, Muhammed O. Sayin
- Abstract summary: We present a new family of logit-Q dynamics for efficient learning in stochastic games.
We show that the logit-Q dynamics presented reach (near) efficient equilibrium in stochastic teams with unknown dynamics.
- Abstract: We present a new family of logit-Q dynamics for efficient learning in stochastic games by combining the log-linear learning (also known as logit dynamics) for the repeated play of normal-form games with Q-learning for unknown Markov decision processes within the auxiliary stage-game framework. In this framework, we view stochastic games as agents repeatedly playing some stage game associated with the current state of the underlying game while the agents' Q-functions determine the payoffs of these stage games. We show that the logit-Q dynamics presented reach (near) efficient equilibrium in stochastic teams with unknown dynamics and quantify the approximation error. We also show the rationality of the logit-Q dynamics against agents following pure stationary strategies and the convergence of the dynamics in stochastic games where the stage-payoffs induce potential games, yet only a single agent controls the state transitions beyond stochastic teams. The key idea is to approximate the dynamics with a fictional scenario where the Q-function estimates are stationary over epochs whose lengths grow at a sufficiently slow rate. We then couple the dynamics in the main and fictional scenarios to show that these two scenarios become more and more similar across epochs due to the vanishing step size and growing epoch lengths.
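To make the auxiliary stage-game framework concrete, the following is a minimal, illustrative sketch of logit-Q-style dynamics for a two-agent team on a randomly generated MDP: each agent keeps a Q-estimate over joint actions, one agent at a time revises its action via a log-linear (softmax) response to the stage game induced by its Q-estimate at the current state, and Q-estimates are updated by Q-learning with step sizes and epoch lengths that follow an assumed schedule. The environment, the schedules `T_k` and `alpha_k`, the temperature `tau`, and the per-agent update rule are assumptions for illustration, not the paper's exact construction or its step-size conditions.

```python
import numpy as np

# Illustrative sketch of logit-Q-style dynamics for a two-agent stochastic team.
# All constants and schedules below are assumptions, not the paper's.
rng = np.random.default_rng(0)

n_agents, n_states, n_actions = 2, 3, 2
gamma = 0.9   # discount factor (assumed)
tau = 0.1     # logit (softmax) temperature controlling exploration (assumed)

# Random team MDP: common reward r[s, a1, a2] and transition kernel P[s, a1, a2, s'].
r = rng.random((n_states, n_actions, n_actions))
P = rng.random((n_states, n_actions, n_actions, n_states))
P /= P.sum(axis=-1, keepdims=True)

# Each agent keeps its own estimate of the team Q-function over joint actions;
# these estimates define the payoffs of the stage game at each state.
Q = [np.zeros((n_states, n_actions, n_actions)) for _ in range(n_agents)]

state = 0
joint = [0, 0]  # current joint action
for epoch in range(1, 51):
    T_k = 10 * epoch       # epoch lengths growing slowly (assumed schedule)
    alpha_k = 1.0 / epoch  # vanishing Q-learning step size (assumed schedule)
    for _ in range(T_k):
        # Log-linear (logit) revision: a randomly chosen agent resamples its
        # action from a softmax over its stage-game payoffs at the current
        # state, holding the other agent's action fixed.
        i = rng.integers(n_agents)
        other = joint[1 - i]
        payoffs = Q[i][state, :, other] if i == 0 else Q[i][state, other, :]
        logits = payoffs / tau
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        joint[i] = rng.choice(n_actions, p=probs)

        # Play the joint action, observe the common team reward and next state.
        a1, a2 = joint
        reward = r[state, a1, a2]
        next_state = rng.choice(n_states, p=P[state, a1, a2])

        # Q-learning update of the stage-game payoffs (identical targets here
        # since the team shares a single reward).
        for Qi in Q:
            td_target = reward + gamma * Qi[next_state].max()
            Qi[state, a1, a2] += alpha_k * (td_target - Qi[state, a1, a2])

        state = next_state

# Greedy joint action per state under the learned stage-game payoffs.
for s in range(n_states):
    print(s, np.unravel_index(Q[0][s].argmax(), Q[0][s].shape))
```

In the paper's analysis the Q-estimates are compared with a fictional scenario in which they stay fixed over each epoch; the sketch above only mimics this by keeping the step size constant within an epoch and letting it vanish across epochs.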
Related papers
- On the Convergence of No-Regret Learning Dynamics in Time-Varying Games [89.96815099996132]
We characterize the convergence of optimistic gradient descent (OGD) in time-varying games.
Our framework yields sharp convergence bounds for the equilibrium gap of OGD in zero-sum games.
We also provide new insights on dynamic regret guarantees in static games.
arXiv Detail & Related papers (2023-01-26T17:25:45Z) - Asymptotic Convergence and Performance of Multi-Agent Q-Learning Dynamics [38.5932141555258]
We study the dynamics of smooth Q-Learning, a popular reinforcement learning algorithm.
We show a sufficient condition on the rate of exploration such that the Q-Learning dynamics is guaranteed to converge to a unique equilibrium in any game.
arXiv Detail & Related papers (2023-01-23T18:39:11Z) - Finding mixed-strategy equilibria of continuous-action games without gradients using randomized policy networks [83.28949556413717]
We study the problem of computing an approximate Nash equilibrium of a continuous-action game without access to gradients.
We model players' strategies using artificial neural networks.
This paper is the first to solve general continuous-action games with unrestricted mixed strategies and without any gradient information.
arXiv Detail & Related papers (2022-11-29T05:16:41Z) - A unified stochastic approximation framework for learning in games [82.74514886461257]
We develop a flexible stochastic approximation framework for analyzing the long-run behavior of learning in games (both continuous and finite).
The proposed analysis template incorporates a wide array of popular learning algorithms, including gradient-based methods, exponential/multiplicative weights for learning in finite games, optimistic and bandit variants of the above, etc.
arXiv Detail & Related papers (2022-06-08T14:30:38Z) - Independent and Decentralized Learning in Markov Potential Games [3.8779763612314633]
We focus on the independent and decentralized setting, where players do not have knowledge of the game model and cannot coordinate.
In each stage, players update their estimate of Q-function that evaluates their total contingent payoff based on the realized one-stage reward.
We prove that the policies induced by our learning dynamics converge to the set of stationary Nash equilibria in Markov potential games with probability 1.
arXiv Detail & Related papers (2022-05-29T07:39:09Z) - Independent Learning in Stochastic Games [16.505046191280634]
We present the model of stochastic games for multi-agent learning in dynamic environments.
We focus on the development of simple and independent learning dynamics for stochastic games.
We present our recently proposed simple and independent learning dynamics that guarantee convergence in zero-sum games.
arXiv Detail & Related papers (2021-11-23T09:27:20Z) - Learning in nonatomic games, Part I: Finite action spaces and population games [22.812059396480656]
We examine the long-run behavior of a wide range of dynamics for learning in nonatomic games, in both discrete and continuous time.
We focus exclusively on games with finite action spaces; nonatomic games with continuous action spaces are treated in detail in Part II of this paper.
arXiv Detail & Related papers (2021-07-04T11:20:45Z) - From Motor Control to Team Play in Simulated Humanoid Football [56.86144022071756]
We train teams of physically simulated humanoid avatars to play football in a realistic virtual environment.
In a sequence of stages, players first learn to control a fully articulated body to perform realistic, human-like movements.
They then acquire mid-level football skills such as dribbling and shooting.
Finally, they develop awareness of others and play as a team, bridging the gap between low-level motor control at a timescale of milliseconds and coordinated team play on longer timescales.
arXiv Detail & Related papers (2021-05-25T20:17:10Z) - Simple Uncoupled No-Regret Learning Dynamics for Extensive-Form Correlated Equilibrium [65.64512759706271]
We study the existence of simple, uncoupled no-regret dynamics that converge to correlated equilibria in normal-form games.
We introduce a notion of trigger regret in extensive-form games, which extends that of internal regret in normal-form games.
We give an efficient no-regret algorithm which guarantees with high probability that trigger regrets grow sublinearly in the number of iterations.
arXiv Detail & Related papers (2021-04-04T02:26:26Z) - Fictitious play in zero-sum stochastic games [1.9143447222638694]
We present a novel variant of fictitious play dynamics combining classical fictitious play with Q-learning for stochastic games.
We analyze its convergence properties in two-player zero-sum games.
arXiv Detail & Related papers (2020-10-08T19:06:45Z) - Chaos, Extremism and Optimism: Volume Analysis of Learning in Games [55.24050445142637]
We present volume analyses of Multiplicative Weights Updates (MWU) and Optimistic Multiplicative Weights Updates (OMWU) in zero-sum as well as coordination games.
We show that OMWU contracts volume, providing an alternative understanding for its known convergent behavior.
We also prove a no-free-lunch type of theorem, in the sense that when examining coordination games the roles are reversed: OMWU expands volume exponentially fast, whereas MWU contracts.
arXiv Detail & Related papers (2020-05-28T13:47:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.