Decentralized Q-Learning in Zero-sum Markov Games
- URL: http://arxiv.org/abs/2106.02748v1
- Date: Fri, 4 Jun 2021 22:42:56 GMT
- Title: Decentralized Q-Learning in Zero-sum Markov Games
- Authors: Muhammed O. Sayin, Kaiqing Zhang, David S. Leslie, Tamer Basar, Asuman Ozdaglar
- Abstract summary: We study multi-agent reinforcement learning (MARL) in discounted zero-sum Markov games.
We develop for the first time a radically uncoupled Q-learning dynamics that is both rational and convergent.
The key challenge in this decentralized setting is the non-stationarity of the learning environment from an agent's perspective.
- Score: 33.81574774144886
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study multi-agent reinforcement learning (MARL) in infinite-horizon
discounted zero-sum Markov games. We focus on the practical but challenging
setting of decentralized MARL, where agents make decisions without coordination
by a centralized controller, but only based on their own payoffs and local
actions executed. The agents need not observe the opponent's actions or
payoffs, possibly being even oblivious to the presence of the opponent, nor be
aware of the zero-sum structure of the underlying game, a setting also referred
to as radically uncoupled in the literature of learning in games. In this
paper, we develop for the first time a radically uncoupled Q-learning dynamics
that is both rational and convergent: the learning dynamics converges to the
best response to the opponent's strategy when the opponent follows an
asymptotically stationary strategy; the value function estimates converge to
the payoffs at a Nash equilibrium when both agents adopt the dynamics. The key
challenge in this decentralized setting is the non-stationarity of the learning
environment from an agent's perspective, since both her own payoffs and the
system evolution depend on the actions of other agents, and each agent adapts
her policies simultaneously and independently. To address this issue, we
develop a two-timescale learning dynamics where each agent updates her local
Q-function and value function estimates concurrently, with the latter happening
at a slower timescale.
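To make the two-timescale structure concrete, the sketch below shows one way such a learner could look in Python. It is a minimal illustration, not the paper's exact algorithm: the class name `TwoTimescaleAgent`, the Boltzmann (softmax) smoothing with temperature `tau`, and the specific step-size schedules `alpha` and `beta` are assumptions chosen only to exhibit the fast local Q-update and the slower value-estimate update described above. Each agent sees only the current state, its own action, and its own reward.

```python
import numpy as np

class TwoTimescaleAgent:
    """Illustrative decentralized learner (hypothetical sketch, not the paper's code).
    Keeps only a local Q-table over (state, own action) and a local value estimate
    per state; it never observes the opponent's actions or payoffs."""

    def __init__(self, n_states, n_actions, gamma=0.95, tau=0.1, seed=0):
        self.q = np.zeros((n_states, n_actions))  # local Q-function estimate (fast timescale)
        self.v = np.zeros(n_states)               # local value estimate (slow timescale)
        self.visits = np.zeros(n_states, dtype=int)
        self.gamma = gamma                        # discount factor
        self.tau = tau                            # smoothing temperature (assumed)
        self.rng = np.random.default_rng(seed)

    def policy(self, s):
        # Smoothed best response: Boltzmann distribution over the local Q-values.
        z = self.q[s] / self.tau
        z -= z.max()                              # numerical stability
        p = np.exp(z)
        return p / p.sum()

    def act(self, s):
        return self.rng.choice(self.q.shape[1], p=self.policy(s))

    def update(self, s, a, reward, s_next):
        self.visits[s] += 1
        k = self.visits[s]
        alpha = 1.0 / k                           # fast step size (assumed schedule)
        beta = 1.0 / (1.0 + k * np.log(k + 1.0))  # slower step size, beta/alpha -> 0

        # Fast timescale: local Q-update bootstrapped with the slow value estimate.
        self.q[s, a] += alpha * (reward + self.gamma * self.v[s_next] - self.q[s, a])

        # Slow timescale: move the value estimate toward the smoothed value of q(s, .).
        self.v[s] += beta * (self.policy(s) @ self.q[s] - self.v[s])
```

In a two-player zero-sum game, each player would run its own copy of this update on its own payoff stream; the rationality and convergence guarantees quoted above refer to the paper's analysis, not to this simplified sketch.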
Related papers
- Convergence of Decentralized Actor-Critic Algorithm in General-sum Markov Games [3.8779763612314633]
We study the properties of learning algorithms in general-sum Markov games.
In particular, we focus on a decentralized algorithm where each agent adopts an actor-critic learning dynamic.
arXiv Detail & Related papers (2024-09-06T20:49:11Z)
- Impact of Decentralized Learning on Player Utilities in Stackelberg Games [57.08270857260131]
In many two-agent systems, each agent learns separately and the rewards of the two agents are not perfectly aligned.
We model these systems as Stackelberg games with decentralized learning and show that standard regret benchmarks result in worst-case linear regret for at least one player.
We develop algorithms to achieve near-optimal $O(T^{2/3})$ regret for both players with respect to these benchmarks.
arXiv Detail & Related papers (2024-02-29T23:38:28Z)
- Uncoupled Learning of Differential Stackelberg Equilibria with Commitments [43.098826226730246]
We present "uncoupled" learning dynamics based on zeroth-order gradient estimators.
We prove that they converge to differential Stackelberg equilibria under the same conditions as previous coupled methods.
We also present an online mechanism by which symmetric learners can negotiate leader-follower roles.
arXiv Detail & Related papers (2023-02-07T12:46:54Z)
- Decentralized model-free reinforcement learning in stochastic games with average-reward objective [1.9852463786440127]
We show that our algorithm achieves both a sublinear high-probability regret of order $T^{3/4}$ and a sublinear expected regret of order $T^{2/3}$.
Our algorithm enjoys low computational complexity and a low memory requirement compared to previous works.
arXiv Detail & Related papers (2023-01-13T15:59:53Z)
- Decentralized Optimistic Hyperpolicy Mirror Descent: Provably No-Regret Learning in Markov Games [95.10091348976779]
We study decentralized policy learning in Markov games where we control a single agent to play with nonstationary and possibly adversarial opponents.
We propose a new algorithm, Decentralized Optimistic hypeRpolicy mIrror deScent (DORIS).
DORIS achieves $\sqrt{K}$-regret in the context of general function approximation, where $K$ is the number of episodes.
arXiv Detail & Related papers (2022-06-03T14:18:05Z)
- Independent and Decentralized Learning in Markov Potential Games [3.8779763612314633]
We focus on the independent and decentralized setting, where players do not have knowledge of the game model and cannot coordinate.
In each stage, players update their estimates of the Q-function that evaluates their total contingent payoff based on the realized one-stage reward.
We prove that the policies induced by our learning dynamics converge to the set of stationary Nash equilibria in Markov potential games with probability 1.
arXiv Detail & Related papers (2022-05-29T07:39:09Z)
- Decentralized Cooperative Multi-Agent Reinforcement Learning with Exploration [35.75029940279768]
We study multi-agent reinforcement learning in the most basic cooperative setting -- Markov teams.
We propose an algorithm in which each agent independently runs a stage-based V-learning style algorithm.
We show that the agents can learn an $\epsilon$-approximate Nash equilibrium policy in at most $\widetilde{O}(1/\epsilon^4)$ episodes.
arXiv Detail & Related papers (2021-10-12T02:45:12Z)
- On Information Asymmetry in Competitive Multi-Agent Reinforcement Learning: Convergence and Optimality [78.76529463321374]
We study a system of two interacting non-cooperative Q-learning agents.
We show that this information asymmetry can lead to a stable outcome of population learning.
arXiv Detail & Related papers (2020-10-21T11:19:53Z)
- Decentralized Reinforcement Learning: Global Decision-Making via Local Economic Transactions [80.49176924360499]
We establish a framework for directing a society of simple, specialized, self-interested agents to solve sequential decision problems.
We derive a class of decentralized reinforcement learning algorithms.
We demonstrate the potential advantages of a society's inherent modular structure for more efficient transfer learning.
arXiv Detail & Related papers (2020-07-05T16:41:09Z)
- F2A2: Flexible Fully-decentralized Approximate Actor-critic for Cooperative Multi-agent Reinforcement Learning [110.35516334788687]
Decentralized multi-agent reinforcement learning algorithms are sometimes impractical in complicated applications.
We propose a flexible fully decentralized actor-critic MARL framework, which can handle the large-scale general cooperative multi-agent setting.
Our framework achieves scalability and stability in large-scale environments and reduces information transmission.
arXiv Detail & Related papers (2020-04-17T14:56:29Z)