Pareto Actor-Critic for Equilibrium Selection in Multi-Agent
Reinforcement Learning
- URL: http://arxiv.org/abs/2209.14344v3
- Date: Sat, 14 Oct 2023 12:36:27 GMT
- Title: Pareto Actor-Critic for Equilibrium Selection in Multi-Agent
Reinforcement Learning
- Authors: Filippos Christianos, Georgios Papoudakis, Stefano V. Albrecht
- Abstract summary: This work focuses on equilibrium selection in no-conflict multi-agent games.
Pareto Actor-Critic (Pareto-AC) is an actor-critic algorithm that exploits the fact that, in a no-conflict game, the Pareto-optimal equilibrium maximises the returns of all agents.
- Score: 18.20664209675016
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work focuses on equilibrium selection in no-conflict multi-agent games,
where we specifically study the problem of selecting a Pareto-optimal Nash
equilibrium among several existing equilibria. It has been shown that many
state-of-the-art multi-agent reinforcement learning (MARL) algorithms are prone
to converging to Pareto-dominated equilibria due to the uncertainty each agent
has about the policy of the other agents during training. To address
sub-optimal equilibrium selection, we propose Pareto Actor-Critic (Pareto-AC),
which is an actor-critic algorithm that utilises a simple property of
no-conflict games (a superset of cooperative games): the Pareto-optimal
equilibrium in a no-conflict game maximises the returns of all agents and,
therefore, is the preferred outcome for all agents. We evaluate Pareto-AC in a
diverse set of multi-agent games and show that it converges to higher episodic
returns compared to seven state-of-the-art MARL algorithms and that it
successfully converges to a Pareto-optimal equilibrium in a range of matrix
games. Finally, we propose PACDCG, a graph neural network extension of
Pareto-AC, which is shown to efficiently scale in games with a large number of
agents.
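To make the core idea concrete, here is a minimal toy sketch (our illustration, not the paper's implementation) using the classic Stag Hunt, a symmetric no-conflict game: evaluating actions against an uncertain, uniform model of the other agent favours the Pareto-dominated equilibrium, whereas the optimistic evaluation that the no-conflict property licenses favours the Pareto-optimal one. The uniform opponent model and payoff values are assumptions made for illustration.
```python
import numpy as np

# Agent 1's payoffs in the classic Stag Hunt (agent 2's payoffs are the
# transpose, so the game is symmetric). Actions: 0 = stag, 1 = hare.
# The game is no-conflict: both agents' returns are maximised by the same
# joint action, (stag, stag). Pure Nash equilibria: (stag, stag) with
# return 4 (Pareto-optimal) and (hare, hare) with return 2 (dominated).
R = np.array([[4.0, 0.0],   # R[a1, a2] = agent 1's return
              [3.0, 2.0]])

# Against an uncertain (here: uniform) model of the other agent -- the
# situation independent learners face during training -- the safe,
# Pareto-dominated action looks better:
expected = R.mean(axis=1)    # [2.0, 2.5] -> argmax selects hare
# The no-conflict property licenses optimism: assume the other agent also
# steers towards the shared optimum and take a max over its actions,
# which is, in spirit, the evaluation Pareto-AC exploits:
optimistic = R.max(axis=1)   # [4.0, 3.0] -> argmax selects stag

print("expected  :", expected,   "-> action", int(expected.argmax()))
print("optimistic:", optimistic, "-> action", int(optimistic.argmax()))
```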
Related papers
- Principal-Agent Reinforcement Learning: Orchestrating AI Agents with Contracts [20.8288955218712]
We propose a framework where a principal guides an agent in a Markov Decision Process (MDP) using a series of contracts.
We present and analyze a meta-algorithm that iteratively optimizes the policies of the principal and agent.
We then scale our algorithm with deep Q-learning and analyze its convergence in the presence of approximation error.
arXiv Detail & Related papers (2024-07-25T14:28:58Z)
- Optimistic Policy Gradient in Multi-Player Markov Games with a Single Controller: Convergence Beyond the Minty Property [89.96815099996132]
We develop a new framework to characterize optimistic policy gradient methods in multi-player games with a single controller.
Our approach relies on a natural generalization of the classical Minty property that we introduce, which we anticipate to have further applications beyond Markov games.
arXiv Detail & Related papers (2023-12-19T11:34:10Z)
- Cooperation Dynamics in Multi-Agent Systems: Exploring Game-Theoretic Scenarios with Mean-Field Equilibria [0.0]
This paper investigates strategies to invoke cooperation in game-theoretic scenarios, namely the Iterated Prisoner's Dilemma.
Existing cooperative strategies are analyzed for their effectiveness in promoting group-oriented behavior in repeated games.
The study extends to scenarios with exponentially growing agent populations.
arXiv Detail & Related papers (2023-09-28T08:57:01Z)
- A Black-box Approach for Non-stationary Multi-agent Reinforcement Learning [53.83345471268163]
We investigate learning the equilibria in non-stationary multi-agent systems.
We show how to test for various types of equilibria by a black-box reduction to single-agent learning.
arXiv Detail & Related papers (2023-06-12T23:48:24Z)
- On the Complexity of Multi-Agent Decision Making: From Learning in Games to Partial Monitoring [105.13668993076801]
A central problem in the theory of multi-agent reinforcement learning (MARL) is to understand what structural conditions and algorithmic principles lead to sample-efficient learning guarantees.
We study this question in a general framework for interactive decision making with multiple agents.
We show that characterizing the statistical complexity for multi-agent decision making is equivalent to characterizing the statistical complexity of single-agent decision making.
arXiv Detail & Related papers (2023-05-01T06:46:22Z)
- Inducing Stackelberg Equilibrium through Spatio-Temporal Sequential Decision-Making in Multi-Agent Reinforcement Learning [17.101534531286298]
We construct a Nash-level policy model based on a conditional hypernetwork shared by all agents.
This approach allows for asymmetric training with symmetric execution, with each agent responding optimally conditioned on the decisions made by superior agents.
Experiments demonstrate that our method effectively converges to the Stackelberg equilibrium (SE) policies in repeated matrix game scenarios.
arXiv Detail & Related papers (2023-04-20T14:47:54Z)
- Game-Theoretical Perspectives on Active Equilibria: A Preferred Solution Concept over Nash Equilibria [61.093297204685264]
An effective approach in multi-agent reinforcement learning is to consider the learning process of agents and influence their future policies.
The resulting solution concept, the active equilibrium, is general in that standard solution concepts, such as the Nash equilibrium, are special cases of active equilibria.
We analyze active equilibria from a game-theoretic perspective by closely studying examples where Nash equilibria are known.
arXiv Detail & Related papers (2022-10-28T14:45:39Z)
- Faster Last-iterate Convergence of Policy Optimization in Zero-Sum Markov Games [63.60117916422867]
This paper focuses on the most basic setting of competitive multi-agent RL, namely two-player zero-sum Markov games.
We propose a single-loop policy optimization method with symmetric updates from both agents, where the policy is updated via the entropy-regularized optimistic multiplicative weights update (OMWU) method.
Our convergence results improve upon the best known complexities, and lead to a better understanding of policy optimization in competitive Markov games.
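As a rough sketch of the matrix-game special case of this update (our hedged reconstruction, not the paper's exact Markov-game algorithm; the step size `eta`, regularisation weight `tau`, and the matching-pennies payoff matrix are illustrative assumptions):
```python
import numpy as np

def omwu_matrix_game(A, eta=0.1, tau=0.01, iters=5000):
    """Entropy-regularised optimistic multiplicative weights update (OMWU)
    on a two-player zero-sum matrix game: the row player maximises x^T A y,
    the column player minimises it. Both players update symmetrically."""
    m, n = A.shape
    x, y = np.ones(m) / m, np.ones(n) / n
    gx_prev, gy_prev = A @ y, A.T @ x          # last step's payoff vectors
    for _ in range(iters):
        gx, gy = A @ y, A.T @ x
        # Optimistic extrapolation 2*g_t - g_{t-1}; the x**(1 - eta*tau)
        # factor implements the entropy regularisation.
        x = x ** (1 - eta * tau) * np.exp(eta * (2 * gx - gx_prev))
        y = y ** (1 - eta * tau) * np.exp(-eta * (2 * gy - gy_prev))
        x, y = x / x.sum(), y / y.sum()
        gx_prev, gy_prev = gx, gy
    return x, y

# Matching pennies: the unique equilibrium is uniform play, and the last
# iterate should approach it (plain MWU would cycle instead).
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
x, y = omwu_matrix_game(A)
print(x, y)  # both close to [0.5, 0.5]
```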
arXiv Detail & Related papers (2022-10-03T16:05:43Z)
- Provably Efficient Fictitious Play Policy Optimization for Zero-Sum Markov Games with Structured Transitions [145.54544979467872]
We propose and analyze new fictitious play policy optimization algorithms for zero-sum Markov games with structured but unknown transitions.
We prove tight $\widetilde{\mathcal{O}}(\sqrt{K})$ regret bounds after $K$ episodes in a two-agent competitive game scenario.
Our algorithms feature a combination of Upper Confidence Bound (UCB)-type optimism and fictitious play under the scope of simultaneous policy optimization.
arXiv Detail & Related papers (2022-07-25T18:29:16Z)
- Multi-Agent Training beyond Zero-Sum with Correlated Equilibrium Meta-Solvers [21.462231105582347]
We propose an algorithm for training agents in n-player, general-sum extensive form games, which provably converges to an equilibrium.
We also suggest correlated equilibria (CE) as promising meta-solvers, and propose a novel solution concept, Maximum Gini Correlated Equilibrium (MGCE).
We conduct several experiments using CE meta-solvers for JPSRO and demonstrate convergence on n-player, general-sum games.
arXiv Detail & Related papers (2021-06-17T12:34:18Z)
- Calibration of Shared Equilibria in General Sum Partially Observable Markov Games [15.572157454411533]
We consider a general sum partially observable Markov game where agents of different types share a single policy network.
This paper aims at i) formally understanding equilibria reached by such agents, and ii) matching emergent phenomena of such equilibria to real-world targets.
arXiv Detail & Related papers (2020-06-23T15:14:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.