Pareto Actor-Critic for Equilibrium Selection in Multi-Agent
Reinforcement Learning
- URL: http://arxiv.org/abs/2209.14344v3
- Date: Sat, 14 Oct 2023 12:36:27 GMT
- Title: Pareto Actor-Critic for Equilibrium Selection in Multi-Agent
Reinforcement Learning
- Authors: Filippos Christianos, Georgios Papoudakis, Stefano V. Albrecht
- Abstract summary: This work focuses on equilibrium selection in no-conflict multi-agent games.
Pareto Actor-Critic (Pareto-AC) is an actor-critic algorithm that exploits the fact that, in a no-conflict game, the Pareto-optimal equilibrium maximises the returns of all agents.
- Score: 18.20664209675016
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work focuses on equilibrium selection in no-conflict multi-agent games,
where we specifically study the problem of selecting a Pareto-optimal Nash
equilibrium among several existing equilibria. It has been shown that many
state-of-the-art multi-agent reinforcement learning (MARL) algorithms are prone
to converging to Pareto-dominated equilibria due to the uncertainty each agent
has about the policy of the other agents during training. To address
sub-optimal equilibrium selection, we propose Pareto Actor-Critic (Pareto-AC),
which is an actor-critic algorithm that utilises a simple property of
no-conflict games (a superset of cooperative games): the Pareto-optimal
equilibrium in a no-conflict game maximises the returns of all agents and,
therefore, is the preferred outcome for all agents. We evaluate Pareto-AC in a
diverse set of multi-agent games and show that it converges to higher episodic
returns compared to seven state-of-the-art MARL algorithms and that it
successfully converges to a Pareto-optimal equilibrium in a range of matrix
games. Finally, we propose PACDCG, a graph neural network extension of
Pareto-AC, which is shown to efficiently scale in games with a large number of
agents.
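To make the core idea concrete, here is a minimal toy sketch (our illustration, not the paper's implementation) using the classic Stag Hunt, a symmetric no-conflict game: evaluating actions against an uncertain, uniform model of the other agent favours the Pareto-dominated equilibrium, whereas the optimistic evaluation that the no-conflict property licenses favours the Pareto-optimal one. The uniform opponent model and payoff values are assumptions made for illustration.
```python
import numpy as np

# Agent 1's payoffs in the classic Stag Hunt (agent 2's payoffs are the
# transpose, so the game is symmetric). Actions: 0 = stag, 1 = hare.
# The game is no-conflict: both agents' returns are maximised by the same
# joint action, (stag, stag). Pure Nash equilibria: (stag, stag) with
# return 4 (Pareto-optimal) and (hare, hare) with return 2 (dominated).
R = np.array([[4.0, 0.0],   # R[a1, a2] = agent 1's return
              [3.0, 2.0]])

# Against an uncertain (here: uniform) model of the other agent -- the
# situation independent learners face during training -- the safe,
# Pareto-dominated action looks better:
expected = R.mean(axis=1)    # [2.0, 2.5] -> argmax selects hare
# The no-conflict property licenses optimism: assume the other agent also
# steers towards the shared optimum and take a max over its actions,
# which is, in spirit, the evaluation Pareto-AC exploits:
optimistic = R.max(axis=1)   # [4.0, 3.0] -> argmax selects stag

print("expected  :", expected,   "-> action", int(expected.argmax()))
print("optimistic:", optimistic, "-> action", int(optimistic.argmax()))
```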
Related papers
- Principal-Agent Reinforcement Learning: Orchestrating AI Agents with Contracts [20.8288955218712]
We propose a framework where a principal guides an agent in a Markov Decision Process (MDP) using a series of contracts.
We present and analyze a meta-algorithm that iteratively optimizes the policies of the principal and agent.
We then scale our algorithm with deep Q-learning and analyze its convergence in the presence of approximation error.
arXiv Detail & Related papers (2024-07-25T14:28:58Z)
- Optimistic Policy Gradient in Multi-Player Markov Games with a Single Controller: Convergence Beyond the Minty Property [89.96815099996132]
We develop a new framework to characterize optimistic policy gradient methods in multi-player games with a single controller.
Our approach relies on a natural generalization of the classical Minty property that we introduce, which we anticipate to have further applications beyond Markov games.
arXiv Detail & Related papers (2023-12-19T11:34:10Z)
- Cooperation Dynamics in Multi-Agent Systems: Exploring Game-Theoretic Scenarios with Mean-Field Equilibria [0.0]
This paper investigates strategies to invoke cooperation in game-theoretic scenarios, namely the Iterated Prisoner's Dilemma.
Existing cooperative strategies are analyzed for their effectiveness in promoting group-oriented behavior in repeated games.
The study extends to scenarios with exponentially growing agent populations.
arXiv Detail & Related papers (2023-09-28T08:57:01Z)
- A Black-box Approach for Non-stationary Multi-agent Reinforcement Learning [53.83345471268163]
We investigate learning the equilibria in non-stationary multi-agent systems.
We show how to test for various types of equilibria by a black-box reduction to single-agent learning.
arXiv Detail & Related papers (2023-06-12T23:48:24Z)
- On the Complexity of Multi-Agent Decision Making: From Learning in Games to Partial Monitoring [105.13668993076801]
A central problem in the theory of multi-agent reinforcement learning (MARL) is to understand what structural conditions and algorithmic principles lead to sample-efficient learning guarantees.
We study this question in a general framework for interactive decision making with multiple agents.
We show that characterizing the statistical complexity for multi-agent decision making is equivalent to characterizing the statistical complexity of single-agent decision making.
arXiv Detail & Related papers (2023-05-01T06:46:22Z)
- Inducing Stackelberg Equilibrium through Spatio-Temporal Sequential Decision-Making in Multi-Agent Reinforcement Learning [17.101534531286298]
We construct a Nash-level policy model based on a conditional hypernetwork shared by all agents.
This approach allows for asymmetric training with symmetric execution, with each agent responding optimally conditioned on the decisions made by superior agents.
Experiments demonstrate that our method effectively converges to the Stackelberg equilibrium (SE) policies in repeated matrix game scenarios.
arXiv Detail & Related papers (2023-04-20T14:47:54Z)
- Game-Theoretical Perspectives on Active Equilibria: A Preferred Solution Concept over Nash Equilibria [61.093297204685264]
An effective approach in multi-agent reinforcement learning is to consider the learning process of agents and influence their future policies.
The resulting solution concept, the active equilibrium, is general in that standard solution concepts, such as the Nash equilibrium, are special cases of active equilibria.
We analyze active equilibria from a game-theoretic perspective by closely studying examples where Nash equilibria are known.
arXiv Detail & Related papers (2022-10-28T14:45:39Z)
- Faster Last-iterate Convergence of Policy Optimization in Zero-Sum Markov Games [63.60117916422867]
This paper focuses on the most basic setting of competitive multi-agent RL, namely two-player zero-sum Markov games.
We propose a single-loop policy optimization method with symmetric updates from both agents, where the policy is updated via the entropy-regularized optimistic multiplicative weights update (OMWU) method.
Our convergence results improve upon the best known complexities, and lead to a better understanding of policy optimization in competitive Markov games.
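As a rough sketch of the matrix-game special case of this update (our hedged reconstruction, not the paper's exact Markov-game algorithm; the step size `eta`, regularisation weight `tau`, and the matching-pennies payoff matrix are illustrative assumptions):
```python
import numpy as np

def omwu_matrix_game(A, eta=0.1, tau=0.01, iters=5000):
    """Entropy-regularised optimistic multiplicative weights update (OMWU)
    on a two-player zero-sum matrix game: the row player maximises x^T A y,
    the column player minimises it. Both players update symmetrically."""
    m, n = A.shape
    x, y = np.ones(m) / m, np.ones(n) / n
    gx_prev, gy_prev = A @ y, A.T @ x          # last step's payoff vectors
    for _ in range(iters):
        gx, gy = A @ y, A.T @ x
        # Optimistic extrapolation 2*g_t - g_{t-1}; the x**(1 - eta*tau)
        # factor implements the entropy regularisation.
        x = x ** (1 - eta * tau) * np.exp(eta * (2 * gx - gx_prev))
        y = y ** (1 - eta * tau) * np.exp(-eta * (2 * gy - gy_prev))
        x, y = x / x.sum(), y / y.sum()
        gx_prev, gy_prev = gx, gy
    return x, y

# Matching pennies: the unique equilibrium is uniform play, and the last
# iterate should approach it (plain MWU would cycle instead).
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
x, y = omwu_matrix_game(A)
print(x, y)  # both close to [0.5, 0.5]
```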
arXiv Detail & Related papers (2022-10-03T16:05:43Z)
- Provably Efficient Fictitious Play Policy Optimization for Zero-Sum Markov Games with Structured Transitions [145.54544979467872]
We propose and analyze new fictitious play policy optimization algorithms for zero-sum Markov games with structured but unknown transitions.
We prove tight $\widetilde{\mathcal{O}}(\sqrt{K})$ regret bounds after $K$ episodes in a two-agent competitive game scenario.
Our algorithms feature a combination of Upper Confidence Bound (UCB)-type optimism and fictitious play under the scope of simultaneous policy optimization.
arXiv Detail & Related papers (2022-07-25T18:29:16Z)
- Multi-Agent Training beyond Zero-Sum with Correlated Equilibrium Meta-Solvers [21.462231105582347]
We propose an algorithm for training agents in n-player, general-sum extensive form games, which provably converges to an equilibrium.
We also suggest correlated equilibria (CE) as promising meta-solvers, and propose a novel solution concept, Maximum Gini Correlated Equilibrium (MGCE).
We conduct several experiments using CE meta-solvers for JPSRO and demonstrate convergence on n-player, general-sum games.
arXiv Detail & Related papers (2021-06-17T12:34:18Z)
- Calibration of Shared Equilibria in General Sum Partially Observable Markov Games [15.572157454411533]
We consider a general sum partially observable Markov game where agents of different types share a single policy network.
This paper aims at i) formally understanding equilibria reached by such agents, and ii) matching emergent phenomena of such equilibria to real-world targets.
arXiv Detail & Related papers (2020-06-23T15:14:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.