Decentralized model-free reinforcement learning in stochastic games with
average-reward objective
- URL: http://arxiv.org/abs/2301.05630v1
- Date: Fri, 13 Jan 2023 15:59:53 GMT
- Title: Decentralized model-free reinforcement learning in stochastic games with
average-reward objective
- Authors: Romain Cravic, Nicolas Gast, Bruno Gaujal
- Abstract summary: We show that our algorithm achieves both sublinear high-probability regret of order $T^{3/4}$ and sublinear expected regret of order $T^{2/3}$.
Our algorithm enjoys low computational complexity and a low memory requirement compared to previous works.
- Score: 1.9852463786440127
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose the first model-free algorithm that achieves low regret
performance for decentralized learning in two-player zero-sum tabular
stochastic games with infinite-horizon average-reward objective. In
decentralized learning, the learning agent controls only one player and tries
to achieve low regret against an arbitrary opponent. This
contrasts with centralized learning where the agent tries to approximate the
Nash equilibrium by controlling both players. In our infinite-horizon
undiscounted setting, additional structural assumptions are needed for the
learning process to behave well: here we assume that, for every strategy of the
opponent, the agent has a way to reach any state from any other. This
assumption is analogous to the "communicating" assumption in the MDP
setting. We show that our Decentralized Optimistic Nash Q-Learning
(DONQ-learning) algorithm achieves both sublinear high probability regret of
order $T^{3/4}$ and sublinear expected regret of order $T^{2/3}$. Moreover, our
algorithm enjoys low computational complexity and a low memory requirement
compared with the previous works of (Wei et al. 2017) and
(Jafarnia-Jahromi et al. 2021) in the same setting.
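To make the interaction protocol concrete, below is a minimal, hypothetical Python sketch of decentralized learning from the controlled player's side in a toy average-reward zero-sum game. It is not the paper's DONQ-learning algorithm: the environment, the exploration-bonus constant, the gain estimate, and the assumption that the opponent's realized action is observed are all illustrative choices. In this line of work the regret of the learning agent is typically measured against the value $\rho^*$ of the average-reward game, roughly $\mathrm{Reg}(T) = T\rho^* - \sum_{t=1}^T r_t$.

```python
# Hypothetical sketch, NOT the paper's DONQ-learning: the agent controls the
# row player of a zero-sum stochastic game, keeps an optimistic Q-table over
# (state, own action, opponent action), and plays the mixed maximin strategy
# of the optimistic stage game. Environment and constants are toy assumptions.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
S, A, B = 4, 3, 3                                # states, own actions, opponent actions
P = rng.dirichlet(np.ones(S), size=(S, A, B))    # random transition kernel (toy game)
R = rng.uniform(0, 1, size=(S, A, B))            # row player's reward in [0, 1]

Q = np.ones((S, A, B))        # optimistic initialisation (rewards are at most 1)
N = np.zeros((S, A, B))       # visit counts
gain = 0.0                    # crude running estimate of the average reward
BONUS = 0.5                   # exploration-bonus scale (assumed constant)

def maximin_strategy(M):
    """Mixed maximin strategy of the row player for the A x B matrix game M."""
    a, b = M.shape
    c = np.r_[np.zeros(a), -1.0]                 # variables (x_1..x_a, v); maximise v
    A_ub = np.c_[-M.T, np.ones(b)]               # v - x^T M[:, j] <= 0 for every column j
    A_eq = np.r_[np.ones(a), 0.0][None, :]       # x is a probability vector
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(b), A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * a + [(None, None)])
    x = np.clip(res.x[:a], 0.0, None)
    return x / x.sum()

s, T = 0, 2000
for t in range(1, T + 1):
    x = maximin_strategy(Q[s])                   # act on the optimistic stage game
    a = rng.choice(A, p=x)                       # the agent controls only this action
    b = rng.integers(B)                          # arbitrary opponent (here: uniform)
    r = R[s, a, b]
    s_next = rng.choice(S, p=P[s, a, b])
    N[s, a, b] += 1
    lr = 1.0 / np.sqrt(N[s, a, b])
    # Relative (average-reward style) optimistic update; the next-state value is
    # approximated by the deterministic maximin of Q[s_next] to keep this short.
    target = r - gain + Q[s_next].min(axis=1).max() + BONUS / np.sqrt(N[s, a, b])
    Q[s, a, b] += lr * (target - Q[s, a, b])
    gain += (r - gain) / t                       # running average of collected reward
    s = s_next
```

A full algorithm in this setting has to handle the gain and bias estimates, the optimism bonuses and the per-step computation far more carefully; the sketch only mirrors the information structure, in which the agent sees the state, its own action, the opponent's realized action, the reward and the next state.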
Related papers
- Impact of Decentralized Learning on Player Utilities in Stackelberg Games [57.08270857260131]
In many two-agent systems, each agent learns separately and the rewards of the two agents are not perfectly aligned.
We model these systems as Stackelberg games with decentralized learning and show that standard regret benchmarks result in worst-case linear regret for at least one player.
We develop algorithms to achieve near-optimal $O(T^{2/3})$ regret for both players with respect to these benchmarks.
arXiv Detail & Related papers (2024-02-29T23:38:28Z)
- Representation Learning for General-sum Low-rank Markov Games [63.119870889883224]
We study multi-agent general-sum Markov games with nonlinear function approximation.
We focus on low-rank Markov games whose transition matrix admits a hidden low-rank structure on top of an unknown non-linear representation.
arXiv Detail & Related papers (2022-10-30T22:58:22Z)
- Minimax-Optimal Multi-Agent RL in Zero-Sum Markov Games With a Generative Model [50.38446482252857]
Two-player zero-sum Markov games are arguably the most basic setting in multi-agent reinforcement learning.
We develop a learning algorithm that learns an $\varepsilon$-approximate Markov NE policy using $\widetilde{O}(\cdot)$ samples.
We derive a refined regret bound for FTRL that makes explicit the role of variance-type quantities.
arXiv Detail & Related papers (2022-08-22T17:24:55Z)
- Provably Efficient Reinforcement Learning in Decentralized General-Sum Markov Games [5.205867750232226]
This paper addresses the problem of learning an equilibrium efficiently in general-sum Markov games.
We propose an algorithm in which each agent independently runs optimistic V-learning to efficiently explore the unknown environment.
We show that the agents can find an $\epsilon$-approximate CCE in at most $\widetilde{O}(H^6 S A /\epsilon^2)$ episodes.
arXiv Detail & Related papers (2021-10-12T02:01:22Z)
- Towards General Function Approximation in Zero-Sum Markov Games [126.58493169301012]
This paper considers two-player zero-sum finite-horizon Markov games with simultaneous moves.
Provably efficient algorithms for both decoupled and coordinated settings are developed.
arXiv Detail & Related papers (2021-07-30T15:25:13Z)
- Decentralized Q-Learning in Zero-sum Markov Games [33.81574774144886]
We study multi-agent reinforcement learning (MARL) in discounted zero-sum Markov games.
We develop for the first time a radically uncoupled Q-learning dynamics that is both rational and convergent.
The key challenge in this decentralized setting is the non-stationarity of the learning environment from an agent's perspective (see the sketch after this list).
arXiv Detail & Related papers (2021-06-04T22:42:56Z)
- Model-Free Online Learning in Unknown Sequential Decision Making Problems and Games [114.90723492840499]
In large two-player zero-sum imperfect-information games, modern extensions of counterfactual regret minimization (CFR) are currently the practical state of the art for computing a Nash equilibrium.
We formalize an online learning setting in which the strategy space is not known to the agent.
We give an efficient algorithm that achieves $O(T^{3/4})$ regret with high probability for that setting, even when the agent faces an adversarial environment.
arXiv Detail & Related papers (2021-03-08T04:03:24Z)
- Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium [116.56359444619441]
We develop provably efficient reinforcement learning algorithms for two-player zero-sum finite-horizon Markov games.
In the offline setting, we control both players and aim to find the Nash Equilibrium by minimizing the duality gap.
In the online setting, we control a single player playing against an arbitrary opponent and aim to minimize the regret.
arXiv Detail & Related papers (2020-02-17T17:04:16Z)
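As referenced in the "Decentralized Q-Learning in Zero-sum Markov Games" entry above, here is a minimal, hypothetical sketch of fully uncoupled (independent) Q-learning in a toy discounted zero-sum Markov game. It is not that paper's two-timescale dynamics; it only illustrates the information structure that creates non-stationarity: each player updates from its own actions and rewards and never observes the other player's action or estimates, so the effective environment each player faces changes as the opponent learns.

```python
# Hypothetical sketch of fully uncoupled (independent) Q-learning in a toy
# discounted zero-sum Markov game; NOT the referenced paper's dynamics.
import numpy as np

rng = np.random.default_rng(1)
S, A, B = 3, 2, 2                                # states, player-1 actions, player-2 actions
P = rng.dirichlet(np.ones(S), size=(S, A, B))    # toy transition kernel (assumed)
R = rng.uniform(-1, 1, size=(S, A, B))           # player 1's reward; player 2 receives -R

Q1 = np.zeros((S, A))     # player 1 indexes only its own actions
Q2 = np.zeros((S, B))     # player 2 indexes only its own actions
EPS, GAMMA = 0.1, 0.95    # exploration rate and discount factor (assumed)

s = 0
for t in range(1, 5001):
    a = rng.integers(A) if rng.random() < EPS else int(Q1[s].argmax())
    b = rng.integers(B) if rng.random() < EPS else int(Q2[s].argmax())
    r = R[s, a, b]
    s_next = rng.choice(S, p=P[s, a, b])
    lr = 1.0 / np.sqrt(t)
    # Each player's update treats the other as part of the environment,
    # which is why the learning problem is non-stationary for both.
    Q1[s, a] += lr * (r + GAMMA * Q1[s_next].max() - Q1[s, a])
    Q2[s, b] += lr * (-r + GAMMA * Q2[s_next].max() - Q2[s, b])
    s = s_next
```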
This list is automatically generated from the titles and abstracts of the papers on this site.