Optimal Rates for Feasible Payoff Set Estimation in Games
- URL: http://arxiv.org/abs/2602.04397v1
- Date: Wed, 04 Feb 2026 10:27:11 GMT
- Title: Optimal Rates for Feasible Payoff Set Estimation in Games
- Authors: Annalisa Barbara, Riccardo Poiani, Martino Bernasconi, Andrea Celli,
- Abstract summary: Game inverse theory aims to identify the entire set of payoffs consistent with observed behavior.<n>We focus on the problem of estimating the set of feasible payoffs with high probability and up to precision $$ on the Hausdorff metric.<n>Our results provide learning-theoretic foundations for set-valued payoff inference in multi-agent environments.
- Score: 19.616985668606695
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study a setting in which two players play a (possibly approximate) Nash equilibrium of a bimatrix game, while a learner observes only their actions and has no knowledge of the equilibrium or the underlying game. A natural question is whether the learner can rationalize the observed behavior by inferring the players' payoff functions. Rather than producing a single payoff estimate, inverse game theory aims to identify the entire set of payoffs consistent with observed behavior, enabling downstream use in, e.g., counterfactual analysis and mechanism design across applications like auctions, pricing, and security games. We focus on the problem of estimating the set of feasible payoffs with high probability and up to precision $ε$ on the Hausdorff metric. We provide the first minimax-optimal rates for both exact and approximate equilibrium play, in zero-sum as well as general-sum games. Our results provide learning-theoretic foundations for set-valued payoff inference in multi-agent environments.
Related papers
- Learning Equilibria in Matching Games with Bandit Feedback [2.5015086558362247]
We investigate the problem of learning an equilibrium in a generalized two-sided matching market.<n>We propose a UCB algorithm in which agents form preferences and select actions based on optimistic estimates of the game payoffs.
arXiv Detail & Related papers (2025-06-04T10:15:51Z) - Instance-Dependent Regret Bounds for Learning Two-Player Zero-Sum Games with Bandit Feedback [60.610120215789976]
We show that when a pure strategy Nash equilibrium exists, $c$ becomes zero, leading to an optimal instance-dependent regret bound.<n>Our algorithm also enjoys last-iterate convergence and can identify the pure strategy Nash equilibrium with near-optimal sample.
arXiv Detail & Related papers (2025-02-24T20:20:06Z) - Last-iterate Convergence for Symmetric, General-sum, $2 \ imes 2$ Games Under The Exponential Weights Dynamic [15.036423818666774]
We conduct a comprehensive analysis of discrete-time exponential-weights with a constant step size on all emphgeneral-sum and symmetric $2 times 2$ normal-form games.<n>We show through a first-principles analysis that the exponential weights dynamic converges in the last iterate for such games.<n>We illustrate our theory with extensive simulations and applications to the aforementioned game-theoretic interactions.
arXiv Detail & Related papers (2025-02-12T02:04:46Z) - A Black-box Approach for Non-stationary Multi-agent Reinforcement Learning [53.83345471268163]
We investigate learning the equilibria in non-stationary multi-agent systems.
We show how to test for various types of equilibria by a black-box reduction to single-agent learning.
arXiv Detail & Related papers (2023-06-12T23:48:24Z) - Learning Correlated Equilibria in Mean-Field Games [62.14589406821103]
We develop the concepts of Mean-Field correlated and coarse-correlated equilibria.
We show that they can be efficiently learnt in emphall games, without requiring any additional assumption on the structure of the game.
arXiv Detail & Related papers (2022-08-22T08:31:46Z) - Provably Efficient Fictitious Play Policy Optimization for Zero-Sum
Markov Games with Structured Transitions [145.54544979467872]
We propose and analyze new fictitious play policy optimization algorithms for zero-sum Markov games with structured but unknown transitions.
We prove tight $widetildemathcalO(sqrtK)$ regret bounds after $K$ episodes in a two-agent competitive game scenario.
Our algorithms feature a combination of Upper Confidence Bound (UCB)-type optimism and fictitious play under the scope of simultaneous policy optimization.
arXiv Detail & Related papers (2022-07-25T18:29:16Z) - Learning Stationary Nash Equilibrium Policies in $n$-Player Stochastic
Games with Independent Chains [2.132096006921048]
We consider a class of $n$-player games in which players have their own internal state/action spaces while they are coupled through their payoff functions.
For this class of games, we first show that finding a stationary Nash (NE) policy without any assumption on the reward functions is interactable.
We develop algorithms based on dual averaging and dual mirror descent, which converge to the set of $epsilon$-NE policies.
arXiv Detail & Related papers (2022-01-28T16:27:21Z) - Towards convergence to Nash equilibria in two-team zero-sum games [17.4461045395989]
Two-team zero-sum games are defined as multi-player games where players are split into two competing sets of agents.
We focus on the solution concept of Nash equilibria (NE)
We show that computing NE for this class of games is $textithard$ for the complexity class $mathrm$.
arXiv Detail & Related papers (2021-11-07T21:15:35Z) - Sample-Efficient Learning of Stackelberg Equilibria in General-Sum Games [78.65798135008419]
It remains vastly open how to learn the Stackelberg equilibrium in general-sum games efficiently from samples.
This paper initiates the theoretical study of sample-efficient learning of the Stackelberg equilibrium in two-player turn-based general-sum games.
arXiv Detail & Related papers (2021-02-23T05:11:07Z) - Almost Optimal Algorithms for Two-player Markov Games with Linear
Function Approximation [92.99933928528797]
We study reinforcement learning for two-player zero-sum Markov games with simultaneous moves.
We propose an algorithm Nash-UCRL-VTR based on the principle "Optimism-in-Face-of-Uncertainty"
We show that Nash-UCRL-VTR can provably achieve an $tildeO(dHsqrtT)$ regret, where $d$ is the linear function dimension.
arXiv Detail & Related papers (2021-02-15T09:09:16Z) - Estimating $\alpha$-Rank by Maximizing Information Gain [26.440923373794444]
Game theory has been increasingly applied in settings where the game is not known outright, but has to be estimated by sampling.
In this paper, we focus on $alpha$-rank, a popular game-theoretic solution concept designed to perform well in such scenarios.
We show the benefits of using information gain as compared to the confidence interval criterion of ResponseGraphUCB.
arXiv Detail & Related papers (2021-01-22T15:46:35Z) - Learning Zero-Sum Simultaneous-Move Markov Games Using Function
Approximation and Correlated Equilibrium [116.56359444619441]
We develop provably efficient reinforcement learning algorithms for two-player zero-sum finite-horizon Markov games.
In the offline setting, we control both players and aim to find the Nash Equilibrium by minimizing the duality gap.
In the online setting, we control a single player playing against an arbitrary opponent and aim to minimize the regret.
arXiv Detail & Related papers (2020-02-17T17:04:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.