Actor-Dual-Critic Dynamics for Zero-sum and Identical-Interest Stochastic Games
- URL: http://arxiv.org/abs/2602.00606v1
- Date: Sat, 31 Jan 2026 08:48:09 GMT
- Title: Actor-Dual-Critic Dynamics for Zero-sum and Identical-Interest Stochastic Games
- Authors: Ahmed Said Donmez, Yuksel Arslantas, Muhammed O. Sayin
- Abstract summary: We propose a novel independent and payoff-based learning framework for stochastic games that is model-free, game-agnostic, and gradient-free. We establish convergence to (approximate) equilibria in two-agent zero-sum and multi-agent identical-interest stochastic games over an infinite horizon. This provides one of the first payoff-based and fully decentralized learning algorithms with theoretical guarantees in both settings.
- Score: 2.992414059774663
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a novel independent and payoff-based learning framework for stochastic games that is model-free, game-agnostic, and gradient-free. The learning dynamics follow a best-response-type actor-critic architecture, where agents update their strategies (actors) using feedback from two distinct critics: a fast critic that intuitively responds to observed payoffs under limited information, and a slow critic that deliberatively approximates the solution to the underlying dynamic programming problem. Crucially, the learning process relies on non-equilibrium adaptation through smoothed best responses to observed payoffs. We establish convergence to (approximate) equilibria in two-agent zero-sum and multi-agent identical-interest stochastic games over an infinite horizon. This provides one of the first payoff-based and fully decentralized learning algorithms with theoretical guarantees in both settings. Empirical results further validate the robustness and effectiveness of the proposed approach across both classes of games.
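The abstract describes the actor-dual-critic architecture only at a high level. A minimal single-agent sketch of such two-timescale dynamics, with a logit-smoothed best response, might look as follows; the specific update rules, step sizes, and the use of a max continuation value are illustrative assumptions on our part, not the paper's exact dynamics:

```python
import numpy as np

def smoothed_best_response(q_values, tau=0.1):
    """Logit (softmax) smoothed best response over an agent's action values."""
    z = (q_values - q_values.max()) / tau  # shift for numerical stability
    p = np.exp(z)
    return p / p.sum()

def dual_critic_step(q_fast, q_slow, policy, s, a, payoff, s_next,
                     alpha_fast=0.1, alpha_slow=0.01, beta=0.05,
                     gamma=0.95, tau=0.1):
    """One payoff-based update for a single agent at state s after playing
    action a, observing only its own realized payoff and the next state."""
    # Fast critic: reacts to the observed payoff, using the slow critic's
    # estimate as the continuation value (a max continuation is our assumption).
    target = payoff + gamma * q_slow[s_next].max()
    q_fast[s, a] += alpha_fast * (target - q_fast[s, a])
    # Slow critic: tracks the fast critic on a longer timescale, approximating
    # the fixed point of the underlying dynamic programming problem.
    q_slow[s] += alpha_slow * (q_fast[s] - q_slow[s])
    # Actor: moves toward the smoothed best response to the fast critic.
    policy[s] += beta * (smoothed_best_response(q_fast[s], tau) - policy[s])
    return q_fast, q_slow, policy
```

The step-size separation (alpha_fast >> alpha_slow) is what makes the fast critic "intuitively respond" while the slow critic "deliberatively" averages; the actor update uses only observed payoffs, never gradients or opponent information.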
Related papers
- Decoding Rewards in Competitive Games: Inverse Game Theory with Entropy Regularization [52.74762030521324]
We propose a novel algorithm to learn reward functions from observed actions. We provide strong theoretical guarantees for the reliability and sample efficiency of our algorithm.
arXiv Detail & Related papers (2026-01-19T04:12:51Z) - A Minimaximalist Approach to Reinforcement Learning from Human Feedback [49.45285664482369]
We present Self-Play Preference Optimization (SPO), an algorithm for reinforcement learning from human feedback.
Our approach is minimalist in that it does not require training a reward model nor unstable adversarial training.
We demonstrate that on a suite of continuous control tasks, we are able to learn significantly more efficiently than reward-model based approaches.
arXiv Detail & Related papers (2024-01-08T17:55:02Z) - Offline Learning in Markov Games with General Function Approximation [22.2472618685325]
We study offline multi-agent reinforcement learning (RL) in Markov games.
We provide the first framework for sample-efficient offline learning in Markov games.
arXiv Detail & Related papers (2023-02-06T05:22:27Z) - Understanding Self-Predictive Learning for Reinforcement Learning [61.62067048348786]
We study the learning dynamics of self-predictive learning for reinforcement learning.
We propose a novel self-predictive algorithm that learns two representations simultaneously.
arXiv Detail & Related papers (2022-12-06T20:43:37Z) - Finding mixed-strategy equilibria of continuous-action games without gradients using randomized policy networks [83.28949556413717]
We study the problem of computing an approximate Nash equilibrium of continuous-action games without access to gradients.
We model players' strategies using artificial neural networks.
This paper is the first to solve general continuous-action games with unrestricted mixed strategies and without any gradient information.
arXiv Detail & Related papers (2022-11-29T05:16:41Z) - A unified stochastic approximation framework for learning in games [82.74514886461257]
We develop a flexible stochastic approximation framework for analyzing the long-run behavior of learning in games (both continuous and finite).
The proposed analysis template incorporates a wide array of popular learning algorithms, including gradient-based methods, exponential/multiplicative weights for learning in finite games, optimistic and bandit variants of the above, etc.
arXiv Detail & Related papers (2022-06-08T14:30:38Z) - Independent and Decentralized Learning in Markov Potential Games [3.549868541921029]
We study multi-agent reinforcement learning dynamics and analyze their behavior in infinite-horizon discounted Markov potential games. We focus on the independent and decentralized setting, where players do not know the game parameters and cannot communicate or coordinate.
arXiv Detail & Related papers (2022-05-29T07:39:09Z) - Efficient Model-based Multi-agent Reinforcement Learning via Optimistic Equilibrium Computation [93.52573037053449]
H-MARL (Hallucinated Multi-Agent Reinforcement Learning) learns successful equilibrium policies after a few interactions with the environment.
We demonstrate our approach experimentally on an autonomous driving simulation benchmark.
arXiv Detail & Related papers (2022-03-14T17:24:03Z) - Multiplayer Performative Prediction: Learning in Decision-Dependent
Games [18.386569111954213]
This paper formulates a new game theoretic framework for multi-player performative prediction.
We focus on two distinct solution concepts, namely (i) performatively stable equilibria and (ii) Nash equilibria of the game.
We show that under mild assumptions, the performatively stable equilibria can be found efficiently by a variety of algorithms.
arXiv Detail & Related papers (2022-01-10T15:31:10Z) - Decentralized Q-Learning in Zero-sum Markov Games [33.81574774144886]
We study multi-agent reinforcement learning (MARL) in discounted zero-sum Markov games.
We develop for the first time a radically uncoupled Q-learning dynamics that is both rational and convergent.
The key challenge in this decentralized setting is the non-stationarity of the learning environment from an agent's perspective.
arXiv Detail & Related papers (2021-06-04T22:42:56Z) - Hindsight and Sequential Rationality of Correlated Play [18.176128899338433]
We look at algorithms that ensure strong performance in hindsight relative to what could have been achieved with modified behavior.
We develop and advocate for this hindsight framing of learning in general sequential decision-making settings.
We present examples illustrating the distinct strengths and weaknesses of each type of equilibrium in the literature.
arXiv Detail & Related papers (2020-12-10T18:30:21Z)
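One of the entries above (the unified stochastic approximation framework) mentions exponential/multiplicative-weights dynamics for learning in finite games. As a point of reference for that family of algorithms, here is a minimal multiplicative-weights sketch on a two-player zero-sum matrix game; the step size, horizon, initial strategies, and the matching-pennies payoff matrix are illustrative choices of ours:

```python
import numpy as np

def multiplicative_weights(A, x, y, steps=5000, eta=0.02):
    """Both players run multiplicative weights on the zero-sum game with
    payoff matrix A (row player maximizes x^T A y, column player minimizes).
    Returns the time-averaged strategies, which approximate an equilibrium."""
    x_avg = np.zeros_like(x)
    y_avg = np.zeros_like(y)
    for _ in range(steps):
        x = x * np.exp(eta * (A @ y))    # exponential reweighting by payoffs
        x = x / x.sum()
        y = y * np.exp(-eta * (A.T @ x))  # column player minimizes
        y = y / y.sum()
        x_avg += x
        y_avg += y
    return x_avg / steps, y_avg / steps

# Matching pennies: the unique equilibrium is uniform play for both players.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
x_bar, y_bar = multiplicative_weights(A, np.array([0.9, 0.1]), np.array([0.5, 0.5]))
```

The last-iterate strategies cycle around the equilibrium in zero-sum games, which is why the time averages, not the iterates themselves, are returned.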
This list is automatically generated from the titles and abstracts of the papers in this site.