Asymptotic Convergence and Performance of Multi-Agent Q-Learning Dynamics
- URL: http://arxiv.org/abs/2301.09619v1
- Date: Mon, 23 Jan 2023 18:39:11 GMT
- Title: Asymptotic Convergence and Performance of Multi-Agent Q-Learning Dynamics
- Authors: Aamal Abbas Hussain and Francesco Belardinelli and Georgios Piliouras
- Abstract summary: We study the dynamics of smooth Q-Learning, a popular reinforcement learning algorithm.
We show a sufficient condition on the rate of exploration such that the Q-Learning dynamics is guaranteed to converge to a unique equilibrium in any game.
- Score: 38.5932141555258
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Achieving convergence of multiple learning agents in general $N$-player games
is imperative for the development of safe and reliable machine learning (ML)
algorithms and their application to autonomous systems. Yet it is known that,
outside the bounds of simple two-player games, convergence cannot be taken for
granted.
To make progress in resolving this problem, we study the dynamics of smooth
Q-Learning, a popular reinforcement learning algorithm which quantifies the
tendency for learning agents to explore their state space or exploit their
payoffs. We show a sufficient condition on the rate of exploration such that
the Q-Learning dynamics is guaranteed to converge to a unique equilibrium in
any game. We connect this result to games for which Q-Learning is known to
converge with arbitrary exploration rates, including weighted Potential games
and weighted zero sum polymatrix games.
Finally, we examine the performance of the Q-Learning dynamic as measured by
the Time Averaged Social Welfare, and compare this with the Social Welfare
achieved by the equilibrium. We provide a sufficient condition whereby the
Q-Learning dynamic will outperform the equilibrium even if the dynamics do not
converge.
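For orientation, the smooth Q-Learning dynamics referred to above is commonly written as a replicator flow with an entropic exploration term (the Tuyls-style formulation); the notation below is this standard form, not a verbatim equation from the paper:

$$\dot{x}_{ki} = x_{ki}\Big[\, r_{ki}(x_{-k}) - \sum_{j} x_{kj}\, r_{kj}(x_{-k}) \,\Big] + T_k\, x_{ki} \sum_{j} x_{kj} \ln\frac{x_{kj}}{x_{ki}}$$

Here $x_{ki}$ is the probability that agent $k$ plays action $i$, $r_{ki}(x_{-k})$ is the expected payoff of action $i$ against the other agents' mixed strategies, and $T_k > 0$ is agent $k$'s exploration rate; the sufficient condition on the rate of exploration mentioned in the abstract is a condition on these $T_k$.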
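The Time Averaged Social Welfare comparison can be illustrated with a small simulation. The sketch below is a rough, assumed setup (a made-up 2-player, 2-action game with placeholder step size and temperature), using a discretised Boltzmann Q-learning update and accumulating the welfare of the joint mixed strategy; comparing against the equilibrium's welfare would additionally require computing the game's quantal response equilibrium.

    import numpy as np

    # Sketch only: smooth (Boltzmann) Q-learning in a hypothetical 2-player,
    # 2-action game, tracking the Time Averaged Social Welfare. The payoff
    # matrices, step size and temperature are assumptions, not paper values.
    payoffs = [np.array([[1.0, 0.0], [0.0, 2.0]]),   # player 0's payoffs (rows: own action)
               np.array([[2.0, 0.0], [0.0, 1.0]])]   # player 1's payoffs (rows: player 0's action)

    T = 0.5        # exploration rate (Boltzmann temperature)
    alpha = 0.05   # Q-value step size
    steps = 5000

    def softmax(q, temp):
        z = np.exp((q - q.max()) / temp)
        return z / z.sum()

    Q = [np.zeros(2), np.zeros(2)]   # each player's Q-estimates over its own actions
    welfare_sum = 0.0

    for _ in range(steps):
        # Boltzmann policies induced by the current Q-values
        x = [softmax(Q[0], T), softmax(Q[1], T)]
        # Expected payoff of each own action against the opponent's mixed strategy
        r0 = payoffs[0] @ x[1]
        r1 = x[0] @ payoffs[1]
        # Move Q-values toward the current expected payoffs
        Q[0] += alpha * (r0 - Q[0])
        Q[1] += alpha * (r1 - Q[1])
        # Social welfare of the current joint mixed strategy (sum over players)
        welfare_sum += x[0] @ payoffs[0] @ x[1] + x[0] @ payoffs[1] @ x[1]

    print("Time Averaged Social Welfare:", welfare_sum / steps)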
Related papers
- Stability of Multi-Agent Learning in Competitive Networks: Delaying the Onset of Chaos [9.220952628571812]
Behaviour of multi-agent learning in competitive network games is often studied within the context of zero-sum games.
We study the Q-Learning dynamics, a popular model of exploration and exploitation in multi-agent learning.
We find that the stability of Q-Learning is explicitly dependent only on the network connectivity rather than the total number of agents.
arXiv Detail & Related papers (2023-12-19T08:41:06Z)
- A Black-box Approach for Non-stationary Multi-agent Reinforcement Learning [53.83345471268163]
We investigate learning the equilibria in non-stationary multi-agent systems.
We show how to test for various types of equilibria by a black-box reduction to single-agent learning.
arXiv Detail & Related papers (2023-06-12T23:48:24Z)
- Logit-Q Dynamics for Efficient Learning in Stochastic Teams [1.3927943269211591]
We present a new family of logit-Q dynamics for efficient learning in games.
We show that the presented logit-Q dynamics reach a (near-)efficient equilibrium in teams with unknown dynamics.
arXiv Detail & Related papers (2023-02-20T07:07:25Z)
- On the Convergence of No-Regret Learning Dynamics in Time-Varying Games [89.96815099996132]
We characterize the convergence of optimistic gradient descent (OGD) in time-varying games.
Our framework yields sharp convergence bounds for the equilibrium gap of OGD in zero-sum games.
We also provide new insights on dynamic regret guarantees in static games.
arXiv Detail & Related papers (2023-01-26T17:25:45Z)
- Finding mixed-strategy equilibria of continuous-action games without gradients using randomized policy networks [83.28949556413717]
We study the problem of computing an approximate Nash equilibrium of a continuous-action game without access to gradients.
We model players' strategies using artificial neural networks.
This paper is the first to solve general continuous-action games with unrestricted mixed strategies and without any gradient information.
arXiv Detail & Related papers (2022-11-29T05:16:41Z)
- Independent and Decentralized Learning in Markov Potential Games [3.8779763612314633]
We focus on the independent and decentralized setting, where players do not have knowledge of the game model and cannot coordinate.
In each stage, players update their estimate of the Q-function that evaluates their total contingent payoff based on the realized one-stage reward (a generic sketch of this kind of update appears after this list).
We prove that the policies induced by our learning dynamics converge to the set of stationary Nash equilibria in Markov potential games with probability 1.
arXiv Detail & Related papers (2022-05-29T07:39:09Z)
- Efficient Model-based Multi-agent Reinforcement Learning via Optimistic Equilibrium Computation [93.52573037053449]
H-MARL (Hallucinated Multi-Agent Reinforcement Learning) learns successful equilibrium policies after a few interactions with the environment.
We demonstrate our approach experimentally on an autonomous driving simulation benchmark.
arXiv Detail & Related papers (2022-03-14T17:24:03Z)
- Exploration-Exploitation in Multi-Agent Competition: Convergence with Bounded Rationality [21.94743452608215]
We study smooth Q-learning, a prototypical learning model that captures the balance between game rewards and exploration costs.
We show that Q-learning always converges to the unique quantal-response equilibrium (QRE), the standard solution concept for games under bounded rationality.
arXiv Detail & Related papers (2021-06-24T11:43:38Z)
- On Information Asymmetry in Competitive Multi-Agent Reinforcement Learning: Convergence and Optimality [78.76529463321374]
We study a system of two interacting non-cooperative Q-learning agents.
We show that this information asymmetry can lead to a stable outcome of population learning.
arXiv Detail & Related papers (2020-10-21T11:19:53Z)
- Multi-Agent Reinforcement Learning in Cournot Games [6.282068591820945]
We study the interaction of strategic agents in continuous action Cournot games with limited information feedback.
We consider the dynamics of the policy gradient algorithm, which is a widely adopted continuous control reinforcement learning algorithm.
arXiv Detail & Related papers (2020-09-14T06:53:21Z)
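For the Markov potential games entry above, the per-stage Q-function update it describes can be sketched as follows; this is a generic independent learner's update from the realized one-stage reward, written with assumed names and parameters, not that paper's exact scheme.

    # Sketch of an independent, decentralized per-stage Q update: each player keeps
    # estimates over its own actions only and nudges them toward the realized
    # one-stage reward plus a discounted continuation value. All names and values
    # here are illustrative assumptions.
    gamma = 0.95   # discount factor (assumption)
    step = 0.1     # learning rate (assumption)
    Q = {}         # Q[(state, own_action)] -> estimated total contingent payoff

    def update(state, own_action, reward, next_state, own_actions):
        """One player's update after observing a realized one-stage reward."""
        old = Q.get((state, own_action), 0.0)
        # Continuation value estimated from this player's own table alone
        v_next = max(Q.get((next_state, a), 0.0) for a in own_actions)
        Q[(state, own_action)] = old + step * (reward + gamma * v_next - old)

    # Example call after one stage of play
    update(state="s0", own_action="a1", reward=1.0, next_state="s1",
           own_actions=["a0", "a1"])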