On Information Asymmetry in Competitive Multi-Agent Reinforcement
Learning: Convergence and Optimality
- URL: http://arxiv.org/abs/2010.10901v2
- Date: Fri, 22 Jan 2021 22:18:21 GMT
- Title: On Information Asymmetry in Competitive Multi-Agent Reinforcement
Learning: Convergence and Optimality
- Authors: Ezra Tampubolon, Haris Ceribasic, Holger Boche
- Abstract summary: We study a system of two interacting non-cooperative Q-learning agents, one of which can observe the other's actions.
We show that this information asymmetry can lead to a stable outcome of population learning.
- Score: 78.76529463321374
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we study a system of two interacting non-cooperative
Q-learning agents, where one agent has the privilege of observing the other's
actions. We show that this information asymmetry can lead to a stable outcome
of population learning, which generally does not occur in an environment of
general independent learners. The resulting post-learning policies are almost
optimal in the underlying game sense, i.e., they form a Nash equilibrium.
Furthermore, we propose a Q-learning algorithm that requires predictive
observation of two subsequent actions of the opponent and yields an optimal
strategy provided that the opponent applies a stationary strategy. We also
discuss the existence of a Nash equilibrium in the underlying
information-asymmetric game.
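The setting described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: it is a simplified, stateless (single-state, no discounting) version, assuming a zero-sum matrix game, epsilon-greedy exploration, and a constant learning rate. The names `train_asymmetric_q`, `greedy_responses`, and `payoff_b` are hypothetical. The uninformed agent A learns a value per own action, while the informed agent B observes A's current action and learns a value per (observed action, own action) pair.

```python
import random


def train_asymmetric_q(payoff_b, steps=5000, alpha=0.1, eps=0.1, seed=0):
    """Stateless Q-learning for two agents with asymmetric information.

    payoff_b[a][b] is agent B's reward when A plays a and B plays b.
    The game is assumed zero-sum, so A receives -payoff_b[a][b].
    """
    rng = random.Random(seed)
    n_a, n_b = len(payoff_b), len(payoff_b[0])
    q_a = [0.0] * n_a                          # A: one value per own action
    q_b = [[0.0] * n_b for _ in range(n_a)]    # B: values conditioned on A's action

    def eps_greedy(values):
        # Explore uniformly with probability eps, otherwise act greedily.
        if rng.random() < eps:
            return rng.randrange(len(values))
        return max(range(len(values)), key=values.__getitem__)

    for _ in range(steps):
        a = eps_greedy(q_a)        # A acts without seeing B
        b = eps_greedy(q_b[a])     # B acts after observing A's action
        r_b = payoff_b[a][b]
        r_a = -r_b
        # Stateless updates: move each estimate toward the realized reward.
        q_a[a] += alpha * (r_a - q_a[a])
        q_b[a][b] += alpha * (r_b - q_b[a][b])
    return q_a, q_b


def greedy_responses(q_b):
    """B's greedy action for each observed action of A."""
    return [max(range(len(row)), key=row.__getitem__) for row in q_b]


if __name__ == "__main__":
    # Matching pennies from B's point of view: B wins when the actions match.
    matching_pennies = [[1, -1], [-1, 1]]
    q_a, q_b = train_asymmetric_q(matching_pennies)
    print("A's action values:", q_a)
    print("B's greedy responses:", greedy_responses(q_b))
```

In matching pennies, independent learners have no pure-strategy outcome to settle on, whereas here the informed agent can learn a fixed best response to each observed action. The sketch says nothing about the convergence or optimality results claimed in the paper, which concern a more general setting.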
Related papers
- Multi-Agent Reinforcement Learning from Human Feedback: Data Coverage and Algorithmic Techniques [65.55451717632317]
We study Multi-Agent Reinforcement Learning from Human Feedback (MARLHF), exploring both theoretical foundations and empirical validations.
We define the task as identifying Nash equilibrium from a preference-only offline dataset in general-sum games.
Our findings underscore the multifaceted approach required for MARLHF, paving the way for effective preference-based multi-agent systems.
arXiv Detail & Related papers (2024-09-01T13:14:41Z)
- LOQA: Learning with Opponent Q-Learning Awareness [1.1666234644810896]
We introduce Learning with Opponent Q-Learning Awareness (LOQA), a decentralized reinforcement learning algorithm tailored to optimize an agent's individual utility.
LOQA achieves state-of-the-art performance in benchmark scenarios such as the Iterated Prisoner's Dilemma and the Coin Game.
arXiv Detail & Related papers (2024-05-02T06:33:01Z)
- Uncoupled Learning of Differential Stackelberg Equilibria with Commitments [43.098826226730246]
We present "uncoupled" learning dynamics based on zeroth-order gradient estimators.
We prove that they converge to differential Stackelberg equilibria under the same conditions as previous coupled methods.
We also present an online mechanism by which symmetric learners can negotiate leader-follower roles.
arXiv Detail & Related papers (2023-02-07T12:46:54Z)
- Game-Theoretical Perspectives on Active Equilibria: A Preferred Solution Concept over Nash Equilibria [61.093297204685264]
An effective approach in multiagent reinforcement learning is to consider the learning process of agents and influence their future policies.
This new solution concept is general such that standard solution concepts, such as a Nash equilibrium, are special cases of active equilibria.
We analyze active equilibria from a game-theoretic perspective by closely studying examples where Nash equilibria are known.
arXiv Detail & Related papers (2022-10-28T14:45:39Z)
- Independent and Decentralized Learning in Markov Potential Games [3.8779763612314633]
We focus on the independent and decentralized setting, where players do not have knowledge of the game model and cannot coordinate.
In each stage, players update their estimate of the Q-function, which evaluates their total contingent payoff based on the realized one-stage reward.
We prove that the policies induced by our learning dynamics converge to the set of stationary Nash equilibria in Markov potential games with probability 1.
arXiv Detail & Related papers (2022-05-29T07:39:09Z)
- Efficient Model-based Multi-agent Reinforcement Learning via Optimistic Equilibrium Computation [93.52573037053449]
H-MARL (Hallucinated Multi-Agent Reinforcement Learning) learns successful equilibrium policies after a few interactions with the environment.
We demonstrate our approach experimentally on an autonomous driving simulation benchmark.
arXiv Detail & Related papers (2022-03-14T17:24:03Z)
- Sample-Efficient Learning of Stackelberg Equilibria in General-Sum Games [78.65798135008419]
It remains vastly open how to learn the Stackelberg equilibrium in general-sum games efficiently from samples.
This paper initiates the theoretical study of sample-efficient learning of the Stackelberg equilibrium in two-player turn-based general-sum games.
arXiv Detail & Related papers (2021-02-23T05:11:07Z)
- Independent Policy Gradient Methods for Competitive Reinforcement Learning [62.91197073795261]
We obtain global, non-asymptotic convergence guarantees for independent learning algorithms in competitive reinforcement learning settings with two agents.
We show that if both players run policy gradient methods in tandem, their policies will converge to a min-max equilibrium of the game, as long as their learning rates follow a two-timescale rule.
arXiv Detail & Related papers (2021-01-11T23:20:42Z)
- Calibration of Shared Equilibria in General Sum Partially Observable Markov Games [15.572157454411533]
We consider a general sum partially observable Markov game where agents of different types share a single policy network.
This paper aims at i) formally understanding equilibria reached by such agents, and ii) matching emergent phenomena of such equilibria to real-world targets.
arXiv Detail & Related papers (2020-06-23T15:14:20Z)