Grouped Satisficing Paths in Pure Strategy Games: a Topological Perspective
- URL: http://arxiv.org/abs/2509.23157v1
- Date: Sat, 27 Sep 2025 07:07:27 GMT
- Title: Grouped Satisficing Paths in Pure Strategy Games: a Topological Perspective
- Authors: Yanqing Fu, Chao Huang, Chenrun Wang, Zhuping Wang,
- Abstract summary: A widely adopted principle in MARL algorithms is "win-stay, lose-shift", which dictates that an agent retains its current strategy if it achieves the best response.<n>This paper establishes a sufficient condition for such a property, and demonstrates that any finite-state Markov game, as well as any $N$-player game, guarantees the existence of a finite-length satisficing path.
- Score: 15.76917401735207
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In game theory and multi-agent reinforcement learning (MARL), each agent selects a strategy, interacts with the environment and other agents, and subsequently updates its strategy based on the received payoff. This process generates a sequence of joint strategies $(s^t)_{t \geq 0}$, where $s^t$ represents the strategy profile of all agents at time step $t$. A widely adopted principle in MARL algorithms is "win-stay, lose-shift", which dictates that an agent retains its current strategy if it achieves the best response. This principle exhibits a fixed-point property when the joint strategy has become an equilibrium. The sequence of joint strategies under this principle is referred to as a satisficing path, a concept first introduced in [40] and explored in the context of $N$-player games in [39]. A fundamental question arises regarding this principle: Under what conditions does every initial joint strategy $s$ admit a finite-length satisficing path $(s^t)_{0 \leq t \leq T}$ where $s^0=s$ and $s^T$ is an equilibrium? This paper establishes a sufficient condition for such a property, and demonstrates that any finite-state Markov game, as well as any $N$-player game, guarantees the existence of a finite-length satisficing path from an arbitrary initial strategy to some equilibrium. These results provide a stronger theoretical foundation for the design of MARL algorithms.
Related papers
- Investigating Scale Independent UCT Exploration Factor Strategies [42.138439537056954]
The Upper Confidence Bounds For Trees algorithm is not agnostic to the reward scale of the game it is applied to.<n>In this paper, we evaluate various strategies for adaptively choosing the $lambda$, called $lambda$-strategies, that are agnostic to the game's reward scale.
arXiv Detail & Related papers (2025-10-24T09:19:14Z) - Achieving Logarithmic Regret in KL-Regularized Zero-Sum Markov Games [53.447182734351]
We develop and analyze algorithms that provably achieve improved sample efficiency under Reverse Kullback-Leibler (KL) regularization.<n>We study both two-player zero-sum Matrix games and Markov games: for Matrix games, we propose OMG, an algorithm based on best response sampling with optimistic bonuses, and extend this idea to Markov games through the algorithm SOMG.<n>Both algorithms achieve a logarithmic regret in $T$ that scales inversely with the KL regularization strength $beta$ in addition to the standard $widetildemathcalO(sqrtT)
arXiv Detail & Related papers (2025-10-15T01:00:54Z) - Paths to Equilibrium in Games [6.812247730094933]
We study sequences of strategies satisfying a pairwise constraint inspired by policy updating in reinforcement learning.
Our analysis reveals a counterintuitive insight that reward deteriorating strategic updates are key to driving play to equilibrium along a satisficing path.
arXiv Detail & Related papers (2024-03-26T19:58:39Z) - Hierarchical Strategies for Cooperative Multi-Agent Reinforcement
Learning [0.0]
We propose a two-level hierarchical architecture that combines a novel information-theoretic objective with a trajectory prediction model to learn a strategy.
We show that our method establishes a new state of the art being, to the best of our knowledge, the first MARL algorithm to solve all super hard SC II scenarios.
Videos and brief overview of the methods are available at: https://sites.google.com/view/hier-strats-marl/home.
arXiv Detail & Related papers (2022-12-14T18:27:58Z) - Minimax-Optimal Multi-Agent RL in Zero-Sum Markov Games With a
Generative Model [50.38446482252857]
Two-player zero-sum Markov games are arguably the most basic setting in multi-agent reinforcement learning.
We develop a learning algorithm that learns an $varepsilon$-approximate Markov NE policy using $$ widetildeObigg.
We derive a refined regret bound for FTRL that makes explicit the role of variance-type quantities.
arXiv Detail & Related papers (2022-08-22T17:24:55Z) - Near-Optimal No-Regret Learning for General Convex Games [121.50979258049135]
We show that regret can be obtained for general convex and compact strategy sets.
Our dynamics are on an instantiation of optimistic follow-the-regularized-bounds over an appropriately emphlifted space.
Even in those special cases where prior results apply, our algorithm improves over the state-of-the-art regret.
arXiv Detail & Related papers (2022-06-17T12:58:58Z) - Policy Optimization for Markov Games: Unified Framework and Faster
Convergence [81.3266426402464]
We show that the state-wise average policy of this algorithm converges to an approximate Nash equilibrium (NE) of the game.
We extend this algorithm to multi-player general Markov Games and show a $mathcalwidetildeO(T-1/2)$ convergence rate to Correlated Equilibria (CCE)
arXiv Detail & Related papers (2022-06-06T14:23:13Z) - Provably Efficient Offline Multi-agent Reinforcement Learning via
Strategy-wise Bonus [48.34563955829649]
We propose a strategy-wise concentration principle which builds a confidence interval for the joint strategy.
For two-player zero-sum Markov games, by exploiting the convexity of the strategy-wise bonus, we propose an efficient algorithm.
All of our algorithms can naturally take a pre-specified strategy class $Pi$ as input and output a strategy close to the best strategy in $Pi$.
arXiv Detail & Related papers (2022-06-01T00:18:15Z) - Efficient exploration of zero-sum stochastic games [83.28949556413717]
We investigate the increasingly important and common game-solving setting where we do not have an explicit description of the game but only oracle access to it through gameplay.
During a limited-duration learning phase, the algorithm can control the actions of both players in order to try to learn the game and how to play it well.
Our motivation is to quickly learn strategies that have low exploitability in situations where evaluating the payoffs of a queried strategy profile is costly.
arXiv Detail & Related papers (2020-02-24T20:30:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.