An Independent Learning Algorithm for a Class of Symmetric Stochastic
Games
- URL: http://arxiv.org/abs/2110.04638v1
- Date: Sat, 9 Oct 2021 19:57:21 GMT
- Title: An Independent Learning Algorithm for a Class of Symmetric Stochastic
Games
- Authors: Bora Yongacoglu, Gürdal Arslan, Serdar Yüksel
- Abstract summary: This paper investigates the feasibility of using independent learners to find approximate equilibrium policies in non-episodic, discounted stochastic games.
We present an independent learning algorithm that comes with high probability guarantees of approximate equilibrium in this class of games.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In multi-agent reinforcement learning, independent learners are those that do
not access the action selections of other learning agents in the system. This
paper investigates the feasibility of using independent learners to find
approximate equilibrium policies in non-episodic, discounted stochastic games.
We define a property, here called the $\epsilon$-revision paths property, and
prove that a class of games exhibiting symmetry among the players has this
property for any $\epsilon \geq 0$. Building on this result, we present an
independent learning algorithm that comes with high probability guarantees of
approximate equilibrium in this class of games. This guarantee is made assuming
symmetry alone, without additional assumptions such as a zero-sum, team, or
potential game structure.
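As a concrete illustration of the independent-learner setting described in the abstract (this is not the paper's algorithm, only a sketch of the setting), the snippet below has each agent run tabular Q-learning on the shared state and its own action alone, never observing the other agents' action selections. The environment interface (`env.reset` / `env.step`) and all hyperparameters are illustrative assumptions.

```python
# A sketch of the independent-learner setting: each agent learns from the
# shared state and its OWN action only, and never observes the other
# agents' action selections. Illustrative only, not the paper's algorithm;
# the env interface and hyperparameters are assumptions.
import numpy as np

class IndependentQLearner:
    def __init__(self, n_states, n_actions, gamma=0.95, alpha=0.1, eps=0.1):
        self.Q = np.zeros((n_states, n_actions))  # indexed by (state, own action)
        self.gamma, self.alpha, self.eps = gamma, alpha, eps

    def act(self, s, rng):
        # epsilon-greedy over the agent's own Q-values
        if rng.random() < self.eps:
            return int(rng.integers(self.Q.shape[1]))
        return int(np.argmax(self.Q[s]))

    def update(self, s, a, r, s_next):
        # standard Q-learning target; other agents' actions never appear
        target = r + self.gamma * np.max(self.Q[s_next])
        self.Q[s, a] += self.alpha * (target - self.Q[s, a])

def run(env, agents, steps, seed=0):
    rng = np.random.default_rng(seed)
    s = env.reset()
    for _ in range(steps):
        actions = [ag.act(s, rng) for ag in agents]
        s_next, rewards = env.step(actions)   # joint transition, per-agent rewards
        for ag, a, r in zip(agents, actions, rewards):
            ag.update(s, a, r, s_next)        # each agent sees only its own (a, r)
        s = s_next
```

In the symmetric class the paper studies, all players share the same state and action spaces and are interchangeable under permutation of player indices, so a single learner class instantiated once per player fits the setting.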
Related papers
- Last-Iterate Convergence of Payoff-Based Independent Learning in Zero-Sum Stochastic Games [31.554420227087043]
We develop learning dynamics that are payoff-based, convergent, rational, and symmetric between the two players.
In the matrix game setting, the results imply a complexity of $O(\epsilon^{-1})$ to find the Nash distribution.
In the stochastic game setting, the results also imply a complexity of $O(\epsilon^{-8})$ to find a Nash equilibrium.
arXiv Detail & Related papers (2024-09-02T20:07:25Z) - On Tractable $Φ$-Equilibria in Non-Concave Games [53.212133025684224]
Non-concave games introduce significant game-theoretic and optimization challenges.
We show that when $\Phi$ is finite, there exists an efficient uncoupled learning algorithm that converges to the corresponding $\Phi$-equilibria.
We also show that Online Gradient Descent can efficiently approximate $Phi$-equilibria in non-trivial regimes.
arXiv Detail & Related papers (2024-03-13T01:51:30Z) - Scalable and Independent Learning of Nash Equilibrium Policies in
$n$-Player Stochastic Games with Unknown Independent Chains [1.0878040851638]
We study games with independent chains and unknown transition matrices.
In this class of games, players control their own internal Markov chains whose transitions do not depend on the states/actions of other players.
We propose a fully decentralized mirror descent algorithm to learn an $\epsilon$-NE policy.
arXiv Detail & Related papers (2023-12-04T03:04:09Z) - A Black-box Approach for Non-stationary Multi-agent Reinforcement Learning [53.83345471268163]
We investigate learning the equilibria in non-stationary multi-agent systems.
We show how to test for various types of equilibria by a black-box reduction to single-agent learning.
arXiv Detail & Related papers (2023-06-12T23:48:24Z) - A Finite-Sample Analysis of Payoff-Based Independent Learning in
Zero-Sum Stochastic Games [22.62123576833411]
We study two-player zero-sum games, and propose a form of independent learning dynamics called Doubly Smoothed Best-Response dynamics.
The resulting dynamics are payoff-based, convergent, rational, and symmetric among players.
arXiv Detail & Related papers (2023-03-03T05:01:41Z) - Policy Mirror Ascent for Efficient and Independent Learning in Mean
Field Games [35.86199604587823]
Mean-field games have been used as a theoretical tool to obtain an approximate Nash equilibrium for symmetric and anonymous $N$-player games.
We show that $N$ agents running policy mirror ascent converge to the Nash equilibrium of the regularized game within $\widetilde{\mathcal{O}}(\varepsilon^{-2})$ samples.
arXiv Detail & Related papers (2022-12-29T20:25:18Z) - Learning Correlated Equilibria in Mean-Field Games [62.14589406821103]
We develop the concepts of Mean-Field correlated and coarse-correlated equilibria.
We show that they can be efficiently learnt in all games, without requiring any additional assumption on the structure of the game.
arXiv Detail & Related papers (2022-08-22T08:31:46Z) - A unified stochastic approximation framework for learning in games [82.74514886461257]
We develop a flexible stochastic approximation framework for analyzing the long-run behavior of learning in games (both continuous and finite).
The proposed analysis template incorporates a wide array of popular learning algorithms, including gradient-based methods, exponential/multiplicative weights for learning in finite games, optimistic and bandit variants of the above, etc.
arXiv Detail & Related papers (2022-06-08T14:30:38Z) - Learning Stationary Nash Equilibrium Policies in $n$-Player Stochastic
Games with Independent Chains [2.132096006921048]
We consider a class of $n$-player games in which players have their own internal state/action spaces while they are coupled through their payoff functions.
For this class of games, we first show that finding a stationary Nash equilibrium (NE) policy without any assumption on the reward functions is intractable.
We develop algorithms based on dual averaging and dual mirror descent, which converge to the set of $\epsilon$-NE policies.
arXiv Detail & Related papers (2022-01-28T16:27:21Z) - Independent Policy Gradient Methods for Competitive Reinforcement
Learning [62.91197073795261]
We obtain global, non-asymptotic convergence guarantees for independent learning algorithms in competitive reinforcement learning settings with two agents.
We show that if both players run policy gradient methods in tandem, their policies will converge to a min-max equilibrium of the game, as long as their learning rates follow a two-timescale rule (a schematic of this rule is sketched after this list).
arXiv Detail & Related papers (2021-01-11T23:20:42Z) - On Information Asymmetry in Competitive Multi-Agent Reinforcement
Learning: Convergence and Optimality [78.76529463321374]
We study a system of two interacting non-cooperative Q-learning agents.
We show that this information asymmetry can lead to a stable outcome of population learning.
arXiv Detail & Related papers (2020-10-21T11:19:53Z)
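The two-timescale rule highlighted in the "Independent Policy Gradient Methods for Competitive Reinforcement Learning" entry above can be sketched on a zero-sum matrix game, where exact expected-payoff gradients stand in for the sampled estimates an actual payoff-based learner would use. This is a schematic under those assumptions, not that paper's exact method for stochastic games; the step sizes and the 2x2 example are illustrative.

```python
# Schematic two-timescale independent policy gradient on a zero-sum matrix
# game: player 1 (x) maximizes x^T A y with a SLOW step size, player 2 (y)
# minimizes it with a FAST one. Illustrative assumptions throughout.
import numpy as np

def project_simplex(v):
    # Euclidean projection of v onto the probability simplex
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u > css / (np.arange(len(v)) + 1))[0][-1]
    return np.maximum(v - css[rho] / (rho + 1), 0.0)

def two_timescale_pg(A, steps=20_000, eta_fast=1e-2, ratio=10.0):
    x = np.full(A.shape[0], 1.0 / A.shape[0])   # player 1's mixed strategy
    y = np.full(A.shape[1], 1.0 / A.shape[1])   # player 2's mixed strategy
    eta_slow = eta_fast / ratio                 # two-timescale: eta_slow << eta_fast
    for _ in range(steps):
        gx = A @ y                              # gradient of player 1's payoff
        gy = -A.T @ x                           # gradient of player 2's payoff
        x = project_simplex(x + eta_slow * gx)  # slow player
        y = project_simplex(y + eta_fast * gy)  # fast player
    return x, y

# Example: matching pennies, whose unique equilibrium is uniform play.
x, y = two_timescale_pg(np.array([[1.0, -1.0], [-1.0, 1.0]]))
```

Keeping `eta_slow` well below `eta_fast` is the point of the two-timescale rule: the fast player effectively best-responds to a slowly moving opponent, which is what drives the convergence guarantee in that line of work.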
This list is automatically generated from the titles and abstracts of the papers on this site.