Logit-Q Dynamics for Efficient Learning in Stochastic Teams
- URL: http://arxiv.org/abs/2302.09806v2
- Date: Tue, 2 Jan 2024 19:43:52 GMT
- Title: Logit-Q Dynamics for Efficient Learning in Stochastic Teams
- Authors: Muhammed O. Sayin and Onur Unlu
- Abstract summary: We show that the logit-Q dynamics presented reach (near) efficient equilibrium in stochastic teams.
We also show the rationality of the logit-Q dynamics against agents following pure stationary strategies.
The key idea is to approximate the dynamics, for analysis only, with a fictional scenario where the Q-function estimates stay stationary over finite-length epochs.
- Score: 1.8492669447784602
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present two logit-Q learning dynamics combining the classical and
independent log-linear learning updates with an on-policy value iteration
update for efficient learning in stochastic games. We show that the logit-Q
dynamics presented reach (near) efficient equilibrium in stochastic teams. We
quantify a bound on the approximation error. We also show the rationality of
the logit-Q dynamics against agents following pure stationary strategies and
the convergence of the dynamics in stochastic games where the reward functions
induce potential games, yet only a single agent controls the state transitions
beyond stochastic teams. The key idea is to approximate the dynamics with a
fictional scenario where the Q-function estimates are stationary over
finite-length epochs only for analysis. We then couple the dynamics in the main
and fictional scenarios to show that these two scenarios become more and more
similar across epochs due to the vanishing step size.
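The two ingredients the abstract combines, log-linear (softmax) action selection driven by evolving Q estimates and an on-policy value-iteration-style update, can be sketched as follows. The function names, step size, temperature, and data layout are illustrative assumptions, not the authors' exact algorithm.

```python
import numpy as np

def log_linear_policy(q_row, temperature=0.1):
    """Log-linear (softmax) choice probabilities over one agent's actions,
    given that agent's current Q estimates at the current state."""
    logits = q_row / temperature
    logits = logits - logits.max()   # subtract max for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()

def on_policy_value_update(Q, state, joint_action, reward, next_value,
                           step_size=0.05, discount=0.9):
    """One on-policy update of the team's Q estimate, where next_value is
    the expected continuation value under the agents' current log-linear
    policies (hypothetical interface; Q is a dict of dicts)."""
    target = reward + discount * next_value
    Q[state][joint_action] += step_size * (target - Q[state][joint_action])
    return Q
```

A vanishing `step_size` across epochs is what lets the analysis couple the real dynamics to the fictional stationary-Q scenario.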
Related papers
- TimeGraphs: Graph-based Temporal Reasoning [64.18083371645956]
TimeGraphs is a novel approach that characterizes dynamic interactions as a hierarchical temporal graph.
Our approach models the interactions using a compact graph-based representation, enabling adaptive reasoning across diverse time scales.
We evaluate TimeGraphs on multiple datasets with complex, dynamic agent interactions, including a football simulator, the Resistance game, and the MOMA human activity dataset.
arXiv Detail & Related papers (2024-01-06T06:26:49Z) - Loss Dynamics of Temporal Difference Reinforcement Learning [36.772501199987076]
We study learning curves for temporal difference learning of a value function with linear function approximators.
We study how learning dynamics and plateaus depend on feature structure, learning rate, discount factor, and reward function.
arXiv Detail & Related papers (2023-07-10T18:17:50Z) - On the Convergence of No-Regret Learning Dynamics in Time-Varying Games [89.96815099996132]
We characterize the convergence of optimistic gradient descent (OGD) in time-varying games.
Our framework yields sharp convergence bounds for the equilibrium gap of OGD in zero-sum games.
We also provide new insights on dynamic regret guarantees in static games.
arXiv Detail & Related papers (2023-01-26T17:25:45Z) - Asymptotic Convergence and Performance of Multi-Agent Q-Learning Dynamics [38.5932141555258]
We study the dynamics of smooth Q-Learning, a popular reinforcement learning algorithm.
We show a sufficient condition on the rate of exploration such that the Q-Learning dynamics is guaranteed to converge to a unique equilibrium in any game.
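Smooth Q-learning in a matrix game pairs a Q estimate that tracks expected payoffs with a Boltzmann (softmax) strategy whose exploration rate governs convergence. A minimal sketch, with illustrative parameter names and a hypothetical two-player interface:

```python
import numpy as np

def smooth_q_step(Q, payoff_matrix, opponent_mix, exploration=1.0, lr=0.1):
    """One step of smooth (Boltzmann) Q-learning for one agent in a matrix
    game. A larger `exploration` value yields a more uniform strategy."""
    expected = payoff_matrix @ opponent_mix   # expected payoff of each own action
    Q = Q + lr * (expected - Q)               # Q estimates track expected payoffs
    logits = Q / exploration
    logits = logits - logits.max()            # numerical stability
    mix = np.exp(logits)
    return Q, mix / mix.sum()
```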
arXiv Detail & Related papers (2023-01-23T18:39:11Z) - Finding mixed-strategy equilibria of continuous-action games without gradients using randomized policy networks [83.28949556413717]
We study the problem of computing an approximate Nash equilibrium of continuous-action games without access to gradients.
We model players' strategies using artificial neural networks.
This paper is the first to solve general continuous-action games with unrestricted mixed strategies and without any gradient information.
arXiv Detail & Related papers (2022-11-29T05:16:41Z) - Independent and Decentralized Learning in Markov Potential Games [3.8779763612314633]
We focus on the independent and decentralized setting, where players do not have knowledge of the game model and cannot coordinate.
In each stage, players update their estimates of the Q-function, which evaluates their total contingent payoff, based on the realized one-stage reward.
We prove that the policies induced by our learning dynamics converge to the set of stationary Nash equilibria in Markov potential games with probability 1.
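The independent update described above, where each player adjusts its own Q estimate from only the realized one-stage reward, might look like this. The array layout, step size, and discount are assumptions for illustration:

```python
import numpy as np

def independent_q_update(Q, state, action, realized_reward, next_state,
                         value_estimate, step=0.1, discount=0.95):
    """A single player's independent Q update from the realized one-stage
    reward; no knowledge of the game model or of the other players'
    actions is used."""
    target = realized_reward + discount * value_estimate[next_state]
    Q[state, action] += step * (target - Q[state, action])
    return Q
```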
arXiv Detail & Related papers (2022-05-29T07:39:09Z) - Efficient Model-based Multi-agent Reinforcement Learning via Optimistic Equilibrium Computation [93.52573037053449]
H-MARL (Hallucinated Multi-Agent Reinforcement Learning) learns successful equilibrium policies after a few interactions with the environment.
We demonstrate our approach experimentally on an autonomous driving simulation benchmark.
arXiv Detail & Related papers (2022-03-14T17:24:03Z) - Independent Learning in Stochastic Games [16.505046191280634]
We present the model of stochastic games for multi-agent learning in dynamic environments.
We focus on the development of simple and independent learning dynamics for games.
We present our recently proposed simple and independent learning dynamics that guarantee convergence in zero-sum games.
arXiv Detail & Related papers (2021-11-23T09:27:20Z) - Robust Value Iteration for Continuous Control Tasks [99.00362538261972]
When transferring a control policy from simulation to a physical system, the policy needs to be robust to variations in the dynamics to perform well.
We present Robust Fitted Value Iteration, which uses dynamic programming to compute the optimal value function on the compact state domain.
We show that the robust value function is more robust than both a deep reinforcement learning algorithm and the non-robust version of the algorithm.
arXiv Detail & Related papers (2021-05-25T19:48:35Z) - Fictitious play in zero-sum stochastic games [1.9143447222638694]
We present a novel variant of fictitious play dynamics combining classical fictitious play with Q-learning for stochastic games.
We analyze its convergence properties in two-player zero-sum games.
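The classical fictitious-play side of such a variant maintains an empirical belief over the opponent's actions and responds to it. A minimal sketch, with hypothetical function names, for a two-player matrix game:

```python
import numpy as np

def fictitious_play_belief(belief, observed_action, t):
    """Update the empirical frequency (belief) of the opponent's actions
    after observing their action at stage t (running-average update)."""
    one_hot = np.zeros_like(belief)
    one_hot[observed_action] = 1.0
    return belief + (one_hot - belief) / (t + 1)

def best_response(payoff_matrix, belief):
    """Best response to the believed opponent mixed strategy."""
    return int(np.argmax(payoff_matrix @ belief))
```

In the Q-learning variant, the known payoff matrix would be replaced by learned Q estimates.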
arXiv Detail & Related papers (2020-10-08T19:06:45Z) - Chaos, Extremism and Optimism: Volume Analysis of Learning in Games [55.24050445142637]
We present volume analyses of Multiplicative Weights Updates (MWU) and Optimistic Multiplicative Weights Updates (OMWU) in zero-sum as well as coordination games.
We show that OMWU contracts volume, providing an alternative understanding for its known convergent behavior.
We also prove a no-free-lunch type of theorem, in the sense that when examining coordination games the roles are reversed: OMWU expands volume exponentially fast, whereas MWU contracts.
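The two update rules being contrasted differ only in an optimistic prediction term. A sketch of both, with an illustrative learning rate `eta`:

```python
import numpy as np

def mwu_step(weights, payoff_vector, eta=0.1):
    """Multiplicative Weights Update: exponentially reweight each action
    by its current payoff, then renormalize."""
    new = weights * np.exp(eta * payoff_vector)
    return new / new.sum()

def omwu_step(weights, payoff_now, payoff_prev, eta=0.1):
    """Optimistic MWU: uses the prediction term 2*u_t - u_{t-1} in place
    of the current payoff vector."""
    new = weights * np.exp(eta * (2.0 * payoff_now - payoff_prev))
    return new / new.sum()
```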
arXiv Detail & Related papers (2020-05-28T13:47:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.