Exploration-Exploitation in Multi-Agent Competition: Convergence with
Bounded Rationality
- URL: http://arxiv.org/abs/2106.12928v1
- Date: Thu, 24 Jun 2021 11:43:38 GMT
- Title: Exploration-Exploitation in Multi-Agent Competition: Convergence with
Bounded Rationality
- Authors: Stefanos Leonardos, Georgios Piliouras, Kelly Spendlove
- Abstract summary: We study smooth Q-learning, a prototypical learning model that captures the balance between game rewards and exploration costs.
We show that Q-learning always converges to the unique quantal-response equilibrium (QRE), the standard solution concept for games under bounded rationality.
- Score: 21.94743452608215
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The interplay between exploration and exploitation in competitive multi-agent
learning is still far from being well understood. Motivated by this, we study
smooth Q-learning, a prototypical learning model that explicitly captures the
balance between game rewards and exploration costs. We show that Q-learning
always converges to the unique quantal-response equilibrium (QRE), the standard
solution concept for games under bounded rationality, in weighted zero-sum
polymatrix games with heterogeneous learning agents using positive exploration
rates. Complementing recent results about convergence in weighted potential
games, we show that fast convergence of Q-learning in competitive settings is
obtained regardless of the number of agents and without any need for parameter
fine-tuning. As showcased by our experiments in network zero-sum games, these
theoretical results provide the necessary guarantees for an algorithmic
approach to the currently open problem of equilibrium selection in competitive
multi-agent settings.
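Below is a minimal sketch of the smooth (Boltzmann) Q-learning dynamics described in the abstract, restricted to a single two-player zero-sum matrix game rather than the paper's weighted network polymatrix setting; the payoff matrix, exploration rate T, and learning rate are illustrative assumptions, not the paper's experimental configuration. Each player nudges its Q-values toward the expected payoffs against the opponent's current mixture and plays the softmax of those Q-values; the final printout checks the logit-QRE fixed-point condition at the learned strategies.

```python
import numpy as np

def softmax(q, temperature):
    """Boltzmann (logit) choice rule; larger temperature means more exploration."""
    z = q / temperature
    z -= z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

# Illustrative 2x2 zero-sum game (payoffs to player 1); NOT the paper's
# network polymatrix experiments, just a minimal single-edge instance.
A = np.array([[2.0, -1.0],
              [-1.0, 1.0]])

T = 0.5      # positive exploration rate (assumed value)
alpha = 0.1  # Q-value learning rate (assumed value)

rng = np.random.default_rng(0)
q1 = rng.normal(size=2)  # player 1's action values
q2 = rng.normal(size=2)  # player 2's action values

for _ in range(5000):
    x = softmax(q1, T)             # player 1's mixed strategy
    y = softmax(q2, T)             # player 2's mixed strategy
    q1 += alpha * (A @ y - q1)     # move toward expected payoffs vs. opponent
    q2 += alpha * (-A.T @ x - q2)  # zero-sum: player 2 sees the negated game

x, y = softmax(q1, T), softmax(q2, T)

# At a logit QRE, each mixture equals the softmax of its own expected payoffs.
print("x:", x, " softmax response to y:", softmax(A @ y, T))
print("y:", y, " softmax response to x:", softmax(-A.T @ x, T))
```

In this small example the iterates settle for any positive T and any starting Q-values, which mirrors, at toy scale, the convergence behaviour the paper establishes for the underlying continuous-time dynamics without parameter fine-tuning.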
Related papers
- Tractable Equilibrium Computation in Markov Games through Risk Aversion [12.980882140751895]
We show that risk-averse quantal response equilibria (RQE) become tractable to compute in all $n$-player matrix and finite-horizon Markov games.
RQE is independent of the underlying game structure and only depends on agents' degree of risk-aversion and bounded rationality.
arXiv Detail & Related papers (2024-06-20T09:53:56Z)
- Neural Population Learning beyond Symmetric Zero-sum Games [52.20454809055356]
We introduce NeuPL-JPSRO, a neural population learning algorithm that benefits from transfer learning of skills and converges to a Coarse Correlated Equilibrium (CCE) of the game.
Our work shows that equilibrium convergent population learning can be implemented at scale and in generality.
arXiv Detail & Related papers (2024-01-10T12:56:24Z)
- Stability of Multi-Agent Learning in Competitive Networks: Delaying the Onset of Chaos [9.220952628571812]
Behaviour of multi-agent learning in competitive network games is often studied within the context of zero-sum games.
We study the Q-Learning dynamics, a popular model of exploration and exploitation in multi-agent learning.
We find that the stability of Q-Learning is explicitly dependent only on the network connectivity rather than the total number of agents.
arXiv Detail & Related papers (2023-12-19T08:41:06Z)
- A Black-box Approach for Non-stationary Multi-agent Reinforcement Learning [53.83345471268163]
We investigate learning the equilibria in non-stationary multi-agent systems.
We show how to test for various types of equilibria by a black-box reduction to single-agent learning.
arXiv Detail & Related papers (2023-06-12T23:48:24Z)
- On the Convergence of No-Regret Learning Dynamics in Time-Varying Games [89.96815099996132]
We characterize the convergence of optimistic gradient descent (OGD) in time-varying games.
Our framework yields sharp convergence bounds for the equilibrium gap of OGD in zero-sum games.
We also provide new insights on dynamic regret guarantees in static games.
arXiv Detail & Related papers (2023-01-26T17:25:45Z)
- Asymptotic Convergence and Performance of Multi-Agent Q-Learning Dynamics [38.5932141555258]
We study the dynamics of smooth Q-Learning, a popular reinforcement learning algorithm.
We show a sufficient condition on the rate of exploration such that the Q-Learning dynamics is guaranteed to converge to a unique equilibrium in any game.
arXiv Detail & Related papers (2023-01-23T18:39:11Z)
- Finding mixed-strategy equilibria of continuous-action games without gradients using randomized policy networks [83.28949556413717]
We study the problem of computing an approximate Nash equilibrium of a continuous-action game without access to gradients.
We model players' strategies using artificial neural networks.
This paper is the first to solve general continuous-action games with unrestricted mixed strategies and without any gradient information.
arXiv Detail & Related papers (2022-11-29T05:16:41Z)
- Learning Rationalizable Equilibria in Multiplayer Games [38.922957434291554]
Existing algorithms require a number of samples exponential in the number of players to learn rationalizable equilibria under bandit feedback.
This paper develops the first line of efficient algorithms for learning rationalizable Coarse Correlated Equilibria (CCE) and Correlated Equilibria (CE).
Our algorithms incorporate several novel techniques to guarantee rationalizability and no (swap-)regret simultaneously, including a correlated exploration scheme and adaptive learning rates.
arXiv Detail & Related papers (2022-10-20T16:49:00Z)
- A unified stochastic approximation framework for learning in games [82.74514886461257]
We develop a flexible stochastic approximation framework for analyzing the long-run behavior of learning in games (both continuous and finite).
The proposed analysis template incorporates a wide array of popular learning algorithms, including gradient-based methods, exponential/multiplicative weights for learning in finite games, optimistic and bandit variants of the above, etc.
arXiv Detail & Related papers (2022-06-08T14:30:38Z)
- Explore and Control with Adversarial Surprise [78.41972292110967]
Reinforcement learning (RL) provides a framework for learning goal-directed policies given user-specified rewards.
We propose a new unsupervised RL technique based on an adversarial game which pits two policies against each other to compete over the amount of surprise an RL agent experiences.
We show that our method leads to the emergence of complex skills by exhibiting clear phase transitions.
arXiv Detail & Related papers (2021-07-12T17:58:40Z)