Aggregate Fictitious Play for Learning in Anonymous Polymatrix Games (Extended Version)
- URL: http://arxiv.org/abs/2508.19371v1
- Date: Tue, 26 Aug 2025 19:04:58 GMT
- Title: Aggregate Fictitious Play for Learning in Anonymous Polymatrix Games (Extended Version)
- Authors: Semih Kara, Tamer Başar
- Abstract summary: Fictitious play (FP) is an algorithm that enables agents to learn Nash equilibrium in games with certain reward structures. We introduce aggregate fictitious play (agg-FP), a variant of FP where each agent tracks the frequency of the number of other agents playing each action. We show that in anonymous polymatrix games, agg-FP converges to a Nash equilibrium under the same conditions as classical FP.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fictitious play (FP) is a well-studied algorithm that enables agents to learn Nash equilibrium in games with certain reward structures. However, when agents have no prior knowledge of the reward functions, FP faces a major challenge: the joint action space grows exponentially with the number of agents, which slows down reward exploration. Anonymous games offer a structure that mitigates this issue. In these games, the rewards depend only on the actions taken, not on who is taking which action. Under such a structure, we introduce aggregate fictitious play (agg-FP), a variant of FP where each agent tracks the frequency of the number of other agents playing each action, rather than these agents' individual actions. We show that in anonymous polymatrix games, agg-FP converges to a Nash equilibrium under the same conditions as classical FP. In essence, by aggregating the agents' actions, we reduce the action space without losing the convergence guarantees. Using simulations, we provide empirical evidence of how this reduction accelerates convergence.
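To make the aggregation concrete, below is a minimal sketch of the agg-FP loop in Python. The payoff oracle `reward(a, counts)`, the game size, and the tie-breaking are illustrative assumptions, not the paper's exact setup; the point is only that each agent's belief is a frequency table over count vectors of the other agents' actions, rather than over joint action profiles.

```python
# A minimal sketch of aggregate fictitious play (agg-FP). The reward
# oracle, game size, and tie-breaking are hypothetical illustrations.
from collections import Counter

import numpy as np

N_AGENTS, N_ACTIONS, T = 4, 3, 500
rng = np.random.default_rng(0)

def reward(a, counts):
    # Hypothetical anonymous payoff: depends only on the agent's own
    # action and on how many *others* play each action (coordination
    # bonus minus congestion), not on which agents play what.
    return counts[a] - 0.5 * counts[a] ** 2 + (a == 0)

# Each agent keeps a frequency table over aggregate count vectors of
# the other agents -- the key difference from classical FP, which
# would track frequencies of joint action profiles.
freq = [Counter() for _ in range(N_AGENTS)]
actions = rng.integers(N_ACTIONS, size=N_AGENTS)

def best_response(i):
    # Best-respond to the empirical distribution of observed counts.
    total = sum(freq[i].values())
    utils = np.zeros(N_ACTIONS)
    for counts, n in freq[i].items():
        for a in range(N_ACTIONS):
            utils[a] += (n / total) * reward(a, np.array(counts))
    return int(np.argmax(utils))

for t in range(T):
    # Record the aggregate each agent observed, then update everyone.
    for i in range(N_AGENTS):
        others = np.delete(actions, i)
        counts = tuple(np.bincount(others, minlength=N_ACTIONS))
        freq[i][counts] += 1
    actions = np.array([best_response(i) for i in range(N_AGENTS)])

print("play after T rounds:", actions)
```

Because beliefs live on count vectors, the belief space grows polynomially rather than exponentially in the number of agents, which is the source of the speedup reported in the paper's simulations.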
Related papers
- Toward Optimal LLM Alignments Using Two-Player Games [86.39338084862324]
In this paper, we investigate alignment through the lens of two-agent games, involving iterative interactions between an adversarial and a defensive agent.
We theoretically demonstrate that this iterative reinforcement learning optimization converges to a Nash Equilibrium for the game induced by the agents.
Experimental results in safety scenarios demonstrate that learning in such a competitive environment not only fully trains agents but also leads to policies with enhanced generalization capabilities for both adversarial and defensive agents.
arXiv Detail & Related papers (2024-06-16T15:24:50Z)
- Linear Convergence of Independent Natural Policy Gradient in Games with Entropy Regularization [12.612009339150504]
This work focuses on the entropy-regularized independent natural policy gradient (NPG) algorithm in multi-agent reinforcement learning.
We show that, under sufficient entropy regularization, the dynamics of this system converge at a linear rate to the quantal response equilibrium (QRE).
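For intuition, the entropy-regularized softmax NPG update in a matrix game is a multiplicative update whose fixed point is the QRE. A minimal sketch with an illustrative zero-sum payoff matrix, step size, and regularization weight (not the cited paper's exact algorithm or rate analysis):

```python
# A sketch of entropy-regularized independent natural policy gradient
# in a two-player matrix game under the standard softmax
# parameterization; payoffs and hyperparameters are illustrative.
import numpy as np

rng = np.random.default_rng(1)
A = rng.uniform(-1, 1, (3, 3))  # row player maximizes x^T A y
B = -A.T                        # column player's payoffs (zero-sum)
tau, eta = 1.0, 0.1             # entropy weight ("sufficiently" large) and step size

x = np.ones(3) / 3
y = np.ones(3) / 3

def npg_step(p, q_values):
    # Softmax NPG with entropy regularization: a multiplicative update
    # that shrinks the logits toward uniform at rate eta * tau.
    logits = (1 - eta * tau) * np.log(p) + eta * q_values
    w = np.exp(logits - logits.max())
    return w / w.sum()

for t in range(2000):
    # Independent, simultaneous updates: each player sees only its own
    # expected payoffs against the opponent's current policy.
    x, y = npg_step(x, A @ y), npg_step(y, B @ x)

# At the QRE, x is proportional to exp(A y / tau).
qre_x = np.exp(A @ y / tau); qre_x /= qre_x.sum()
print(np.max(np.abs(x - qre_x)))  # should be near zero
```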
arXiv Detail & Related papers (2024-05-04T22:48:53Z)
- Robust and Performance Incentivizing Algorithms for Multi-Armed Bandits with Strategic Agents [52.75161794035767]
We introduce a class of bandit algorithms that meet the two objectives of performance incentivization and robustness simultaneously. We show that settings where the principal has no information about the arms' performance characteristics can be handled by combining ideas from second-price auctions with our algorithms.
arXiv Detail & Related papers (2023-12-13T06:54:49Z)
- Fictitious Cross-Play: Learning Global Nash Equilibrium in Mixed Cooperative-Competitive Games [14.979239870856535]
Self-play (SP) is a popular reinforcement learning framework for solving competitive games.
In this work, we develop a novel algorithm, Fictitious Cross-Play (FXP), which inherits the benefits from both frameworks.
arXiv Detail & Related papers (2023-10-05T07:19:33Z)
- Provably Efficient Fictitious Play Policy Optimization for Zero-Sum Markov Games with Structured Transitions [145.54544979467872]
We propose and analyze new fictitious play policy optimization algorithms for zero-sum Markov games with structured but unknown transitions.
We prove tight $\widetilde{\mathcal{O}}(\sqrt{K})$ regret bounds after $K$ episodes in a two-agent competitive game scenario.
Our algorithms feature a combination of Upper Confidence Bound (UCB)-type optimism and fictitious play under the scope of simultaneous policy optimization.
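For reference, the regret in such results is typically the cumulative gap to the game value accumulated over the $K$ episodes; a standard formulation (a paraphrase, not necessarily the paper's exact definition) is:

```latex
% Regret over K episodes, measured at the initial state s_1 against the
% minimax (Nash) value V^* of the zero-sum Markov game:
\mathrm{Regret}(K) \;=\; \sum_{k=1}^{K} \bigl( V^{*}(s_1) - V^{\pi_k}(s_1) \bigr)
\;\le\; \widetilde{\mathcal{O}}\!\bigl(\sqrt{K}\bigr).
```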
arXiv Detail & Related papers (2022-07-25T18:29:16Z)
- Off-Beat Multi-Agent Reinforcement Learning [62.833358249873704]
We investigate model-free multi-agent reinforcement learning (MARL) in environments where off-beat actions are prevalent.
We propose a novel episodic memory, LeGEM, for model-free MARL algorithms.
We evaluate LeGEM on various multi-agent scenarios with off-beat actions, including Stag-Hunter Game, Quarry Game, Afforestation Game, and StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2022-05-27T02:21:04Z)
- On the Convergence of Fictitious Play: A Decomposition Approach [17.607284715519587]
We extend the convergence results of Fictitious Play (FP) to combinations of cooperative and competitive games and beyond.
We develop a linear relationship unifying cooperation and competition in the sense that these two classes of games are mutually transferable.
We analyze a non-convergent example of FP, the Shapley game, and develop sufficient conditions for FP to converge.
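The linear relationship between cooperation and competition can be made concrete with the standard bimatrix decomposition, in which any game splits into an identical-interest part plus a zero-sum part. A minimal sketch, which may differ in details from the paper's own decomposition:

```python
# A sketch of decomposing a bimatrix game into a common-interest
# (cooperative) part and a zero-sum (competitive) part, under the
# convention that A[i, j] and B[i, j] are the row and column players'
# payoffs at the action pair (i, j).
import numpy as np

rng = np.random.default_rng(2)
A = rng.integers(-3, 4, size=(2, 2)).astype(float)  # row player's payoffs
B = rng.integers(-3, 4, size=(2, 2)).astype(float)  # column player's payoffs

C = (A + B) / 2  # cooperative part: both players receive C[i, j]
Z = (A - B) / 2  # competitive part: row gets Z[i, j], column gets -Z[i, j]

# The original game is recovered exactly from the two parts.
assert np.allclose(A, C + Z) and np.allclose(B, C - Z)
print("cooperative part:\n", C, "\ncompetitive part:\n", Z)
```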
arXiv Detail & Related papers (2022-05-03T13:04:09Z)
- Adversarial Online Learning with Variable Plays in the Pursuit-Evasion Game: Theoretical Foundations and Application in Connected and Automated Vehicle Cybersecurity [5.9774834479750805]
We extend the adversarial/non-stochastic multi-play multi-armed bandit (MPMAB) to the case where the number of arms to play is variable.
The work is motivated by the fact that the resources allocated to scan different critical locations in an interconnected transportation system change dynamically over time, depending on the environment.
arXiv Detail & Related papers (2021-10-26T23:09:42Z)
- Reinforcement Learning In Two Player Zero Sum Simultaneous Action Games [0.0]
Two player zero sum simultaneous action games are common in video games, financial markets, war, business competition, and many other settings.
We introduce the fundamental concepts of reinforcement learning in two player zero sum simultaneous action games and discuss the unique challenges this type of game poses.
We introduce two novel agents that attempt to handle these challenges by using joint action Deep Q-Networks.
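In simultaneous-action zero-sum games, a joint-action Q-function assigns a value to each action pair, and the stage-game policy is the maximin mixed strategy over that value matrix. Below is a minimal sketch of that stage-game step via linear programming; it illustrates the general idea, not the cited agents' network architecture:

```python
# Given joint-action values Q[a1, a2] for a zero-sum simultaneous-action
# game, compute the row player's maximin mixed strategy by LP.
import numpy as np
from scipy.optimize import linprog

def maximin_strategy(Q):
    """Solve max_x min_j sum_i x[i] * Q[i, j] over the probability simplex."""
    n, m = Q.shape
    # Decision variables: [x_1, ..., x_n, v]; objective: maximize v.
    c = np.zeros(n + 1); c[-1] = -1.0
    # v - sum_i x_i Q[i, j] <= 0 for every opponent column j.
    A_ub = np.hstack([-Q.T, np.ones((m, 1))])
    b_ub = np.zeros(m)
    # Probabilities sum to one; v is unbounded.
    A_eq = np.hstack([np.ones((1, n)), np.zeros((1, 1))])
    b_eq = np.array([1.0])
    bounds = [(0, None)] * n + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    return res.x[:n], res.x[-1]

Q = np.array([[0., -1., 1.], [1., 0., -1.], [-1., 1., 0.]])  # rock-paper-scissors
x, v = maximin_strategy(Q)
print(x, v)  # ~[1/3, 1/3, 1/3] with game value ~0
```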
arXiv Detail & Related papers (2021-10-10T16:03:44Z)
- Explore and Control with Adversarial Surprise [78.41972292110967]
Reinforcement learning (RL) provides a framework for learning goal-directed policies given user-specified rewards.
We propose a new unsupervised RL technique based on an adversarial game which pits two policies against each other to compete over the amount of surprise an RL agent experiences.
We show that our method leads to the emergence of complex skills by exhibiting clear phase transitions.
arXiv Detail & Related papers (2021-07-12T17:58:40Z)
- Adversarial Inverse Reinforcement Learning for Mean Field Games [17.392418397388823]
Mean field games (MFGs) provide a mathematically tractable framework for modelling large-scale multi-agent systems.
This paper proposes a novel framework, Mean-Field Adversarial IRL (MF-AIRL), which is capable of tackling uncertainties in demonstrations.
arXiv Detail & Related papers (2021-04-29T21:03:49Z)
- Scaling up Mean Field Games with Online Mirror Descent [55.36153467919289]
We address scaling up equilibrium computation in Mean Field Games (MFGs) using Online Mirror Descent (OMD).
We show that continuous-time OMD provably converges to a Nash equilibrium under a natural and well-motivated set of monotonicity assumptions.
A thorough experimental investigation on various single and multi-population MFGs shows that OMD outperforms traditional algorithms such as Fictitious Play (FP).
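With an entropy mirror map, discrete-time OMD reduces to a softmax over accumulated payoffs. A minimal sketch on a toy single-population congestion game with a monotone payoff (a stand-in for intuition; the MFGs in the cited paper involve full state dynamics and Q-functions):

```python
# Online mirror descent with an entropy mirror map on a toy anonymous
# game with a monotone, crowd-averse payoff r(a, mu) = -mu[a].
import numpy as np

n_actions, eta = 4, 0.5
y = np.zeros(n_actions)               # accumulated payoffs (dual variable)
mu = np.ones(n_actions) / n_actions   # population distribution over actions

for t in range(500):
    payoff = -mu                      # monotone payoff: congestion aversion
    y += eta * payoff                 # mirror-descent accumulation step
    w = np.exp(y - y.max())           # entropy mirror map => softmax policy
    mu = w / w.sum()

print(mu)  # converges to the uniform equilibrium for this payoff
```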
arXiv Detail & Related papers (2021-02-28T21:28:36Z)
- Multi-Agent Collaboration via Reward Attribution Decomposition [75.36911959491228]
We propose Collaborative Q-learning (CollaQ) that achieves state-of-the-art performance in the StarCraft multi-agent challenge.
CollaQ is evaluated on various StarCraft Attribution maps, where it outperforms existing state-of-the-art techniques.
arXiv Detail & Related papers (2020-10-16T17:42:11Z)