Playing Markov Games Without Observing Payoffs
- URL: http://arxiv.org/abs/2509.00179v2
- Date: Sun, 07 Sep 2025 07:52:00 GMT
- Title: Playing Markov Games Without Observing Payoffs
- Authors: Daniel Ablin, Alon Cohen,
- Abstract summary: This paper introduces and formalizes a new class of zero-sum symmetric Markov games.<n>We show that even without observing payoffs, a player who knows the transition dynamics can still compete against an adversary.<n>Our work broadens the class of games in which robust learning is possible under severe informational disadvantage.
- Score: 5.120675183010349
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Optimization under uncertainty is a fundamental problem in learning and decision-making, particularly in multi-agent systems. Previously, Feldman, Kalai, and Tennenholtz [2010] demonstrated the ability to efficiently compete in repeated symmetric two-player matrix games without observing payoffs, as long as the opponents actions are observed. In this paper, we introduce and formalize a new class of zero-sum symmetric Markov games, which extends the notion of symmetry from matrix games to the Markovian setting. We show that even without observing payoffs, a player who knows the transition dynamics and observes only the opponents sequence of actions can still compete against an adversary who may have complete knowledge of the game. We formalize three distinct notions of symmetry in this setting and show that, under these conditions, the learning problem can be reduced to an instance of online learning, enabling the player to asymptotically match the return of the opponent despite lacking payoff observations. Our algorithms apply to both matrix and Markov games, and run in polynomial time with respect to the size of the game and the number of episodes. Our work broadens the class of games in which robust learning is possible under severe informational disadvantage and deepens the connection between online learning and adversarial game theory.
Related papers
- Computing Game Symmetries and Equilibria That Respect Them [77.72705755558839]
We study the computational of identifying and using symmetries in games.<n>We find a strong connection between game symmetries and graph automorphisms.<n>We show that finding a Nash equilibrium that respects a given set of symmetries is exactly as hard as Brouwer fixed point and gradient descent problems.
arXiv Detail & Related papers (2025-01-15T16:15:16Z) - Securing Equal Share: A Principled Approach for Learning Multiplayer Symmetric Games [21.168085154982712]
equilibria in multiplayer games are neither unique nor non-exploitable.
This paper takes an initial step towards addressing these challenges by focusing on the natural objective of equal share.
We design a series of efficient algorithms, inspired by no-regret learning, that provably attain approximate equal share across various settings.
arXiv Detail & Related papers (2024-06-06T15:59:17Z) - Optimistic Policy Gradient in Multi-Player Markov Games with a Single
Controller: Convergence Beyond the Minty Property [89.96815099996132]
We develop a new framework to characterize optimistic policy gradient methods in multi-player games with a single controller.
Our approach relies on a natural generalization of the classical Minty property that we introduce, which we anticipate to have further applications beyond Markov games.
arXiv Detail & Related papers (2023-12-19T11:34:10Z) - Is Learning in Games Good for the Learners? [14.781100349601587]
We consider tradeoffs between reward and regret in repeated gameplay between two agents.
We show that any such equilibrium is reachable by a pair of algorithms which maintain their regret guarantees against arbitrary opponents.
We also consider the question of learning reward-optimal strategies via repeated play with a no-regret agent when game is initially unknown.
arXiv Detail & Related papers (2023-05-31T02:10:27Z) - On the Convergence of No-Regret Learning Dynamics in Time-Varying Games [89.96815099996132]
We characterize the convergence of optimistic gradient descent (OGD) in time-varying games.
Our framework yields sharp convergence bounds for the equilibrium gap of OGD in zero-sum games.
We also provide new insights on dynamic regret guarantees in static games.
arXiv Detail & Related papers (2023-01-26T17:25:45Z) - No-Regret Learning in Time-Varying Zero-Sum Games [99.86860277006318]
Learning from repeated play in a fixed zero-sum game is a classic problem in game theory and online learning.
We develop a single parameter-free algorithm that simultaneously enjoys favorable guarantees under three performance measures.
Our algorithm is based on a two-layer structure with a meta-algorithm learning over a group of black-box base-learners satisfying a certain property.
arXiv Detail & Related papers (2022-01-30T06:10:04Z) - Online Learning in Unknown Markov Games [55.07327246187741]
We study online learning in unknown Markov games.
We show that achieving sublinear regret against the best response in hindsight is statistically hard.
We present an algorithm that achieves a sublinear $tildemathcalO(K2/3)$ regret after $K$ episodes.
arXiv Detail & Related papers (2020-10-28T14:52:15Z) - Learning to Play Sequential Games versus Unknown Opponents [93.8672371143881]
We consider a repeated sequential game between a learner, who plays first, and an opponent who responds to the chosen action.
We propose a novel algorithm for the learner when playing against an adversarial sequence of opponents.
Our results include algorithm's regret guarantees that depend on the regularity of the opponent's response.
arXiv Detail & Related papers (2020-07-10T09:33:05Z) - Matrix games with bandit feedback [33.637621576707076]
We study a version of the classical zero-sum matrix game with unknown payoff matrix and bandit feedback.
We show that Thompson fails catastrophically in this setting and provide empirical comparison to existing algorithms.
arXiv Detail & Related papers (2020-06-09T09:36:21Z) - Learning Zero-Sum Simultaneous-Move Markov Games Using Function
Approximation and Correlated Equilibrium [116.56359444619441]
We develop provably efficient reinforcement learning algorithms for two-player zero-sum finite-horizon Markov games.
In the offline setting, we control both players and aim to find the Nash Equilibrium by minimizing the duality gap.
In the online setting, we control a single player playing against an arbitrary opponent and aim to minimize the regret.
arXiv Detail & Related papers (2020-02-17T17:04:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.