Learning Equilibria in Matching Games with Bandit Feedback
- URL: http://arxiv.org/abs/2506.03802v1
- Date: Wed, 04 Jun 2025 10:15:51 GMT
- Title: Learning Equilibria in Matching Games with Bandit Feedback
- Authors: Andreas Athanasopoulos, Christos Dimitrakakis
- Abstract summary: We investigate the problem of learning an equilibrium in a generalized two-sided matching market. We propose a UCB algorithm in which agents form preferences and select actions based on optimistic estimates of the game payoffs.
- Score: 2.5015086558362247
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We investigate the problem of learning an equilibrium in a generalized two-sided matching market, where agents can adaptively choose their actions based on their assigned matches. Specifically, we consider a setting in which matched agents engage in a zero-sum game with initially unknown payoff matrices, and we explore whether a centralized procedure can learn an equilibrium from bandit feedback. We adopt the solution concept of matching equilibrium, where a pair consisting of a matching $\mathfrak{m}$ and a set of agent strategies $X$ forms an equilibrium if no agent has the incentive to deviate from $(\mathfrak{m}, X)$. To measure the deviation of a given pair $(\mathfrak{m}, X)$ from the equilibrium pair $(\mathfrak{m}^\star, X^\star)$, we introduce matching instability that can serve as a regret measure for the corresponding learning problem. We then propose a UCB algorithm in which agents form preferences and select actions based on optimistic estimates of the game payoffs, and prove that it achieves sublinear, instance-independent regret over a time horizon $T$.
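As a rough illustration of the kind of procedure the abstract describes, here is a minimal sketch, in Python, of a centralized optimistic (UCB-style) loop for a market where matched pairs play zero-sum games with unknown payoff matrices. It is an assumption-laden toy, not the paper's algorithm: the market size, the confidence bonus, the pure-strategy maximin solver, and the Gale-Shapley matching step are all placeholders chosen for brevity.

```python
# Illustrative sketch only: a centralized UCB-style loop for a matching market
# whose matched pairs play zero-sum games with unknown payoffs. All names,
# dimensions, and solver shortcuts below are assumptions, not the paper's method.
import numpy as np

rng = np.random.default_rng(0)
N, K, T = 3, 2, 2000            # agents per side, actions per agent, horizon

# Hypothetical ground-truth payoffs: A[i, j] is the K x K game played when
# left agent i is matched with right agent j (left maximizes, right minimizes).
A = rng.uniform(-1.0, 1.0, size=(N, N, K, K))

sums = np.zeros((N, N, K, K))    # running sums of observed payoffs
counts = np.zeros((N, N, K, K))  # visit counts per (pair, action pair)

def ucb_matrix(i, j, t):
    """Optimistic (upper-confidence) estimate of the (i, j) payoff matrix."""
    mean = np.divide(sums[i, j], counts[i, j],
                     out=np.zeros((K, K)), where=counts[i, j] > 0)
    bonus = np.sqrt(2.0 * np.log(t + 2) / np.maximum(counts[i, j], 1.0))
    bonus[counts[i, j] == 0] = 1e6   # force exploration of unplayed action pairs
    return mean + bonus

def maximin(M):
    """Pure-strategy maximin of a small matrix game (stand-in for an LP solver)."""
    row = int(M.min(axis=1).argmax())
    col = int(M[row].argmin())
    return M.min(axis=1).max(), row, col

def stable_matching(values):
    """Gale-Shapley with left agents proposing; values[i, j] is the optimistic
    value of the (i, j) game (left prefers high, right prefers low)."""
    prefs = [list(np.argsort(-values[i])) for i in range(N)]  # best partner first
    match_of_right = [None] * N
    free = list(range(N))
    while free:
        i = free.pop(0)
        j = prefs[i].pop(0)
        cur = match_of_right[j]
        if cur is None:
            match_of_right[j] = i
        elif values[cur, j] > values[i, j]:   # right agent prefers a lower value
            match_of_right[j] = i
            free.append(cur)
        else:
            free.append(i)
    return [(i, j) for j, i in enumerate(match_of_right)]

for t in range(T):
    # Preferences are formed from optimistic game values, then matched pairs
    # play their optimistic action pair and observe noisy bandit feedback.
    opt = np.array([[maximin(ucb_matrix(i, j, t))[0] for j in range(N)]
                    for i in range(N)])
    for i, j in stable_matching(opt):
        _, a, b = maximin(ucb_matrix(i, j, t))
        reward = A[i, j, a, b] + 0.1 * rng.standard_normal()
        sums[i, j, a, b] += reward
        counts[i, j, a, b] += 1
```

The design intent mirrors the abstract: confidence bonuses shrink as action pairs are sampled, so both the agents' preferences over partners (via optimistic game values) and the strategies played once matched are driven by optimism about the unknown payoffs.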
Related papers
- A Black-box Approach for Non-stationary Multi-agent Reinforcement Learning [53.83345471268163]
We investigate learning the equilibria in non-stationary multi-agent systems.
We show how to test for various types of equilibria by a black-box reduction to single-agent learning.
arXiv Detail & Related papers (2023-06-12T23:48:24Z) - Equilibrium Bandits: Learning Optimal Equilibria of Unknown Dynamics [23.722837647516357]
Consider a decision-maker that can pick one out of $K$ actions to control an unknown system, for $T$ turns.
The dynamics of the system are unknown to the decision-maker, which can only observe a noisy reward at the end of every turn.
Existing bandit algorithms, either stochastic or adversarial, achieve linear (trivial) regret for this problem.
We present a novel algorithm that knows to switch actions quickly when it is not worth waiting for the equilibrium to be reached.
arXiv Detail & Related papers (2023-02-27T10:47:15Z) - Differentiable Arbitrating in Zero-sum Markov Games [59.62061049680365]
We study how to perturb the reward in a zero-sum Markov game with two players to induce a desirable Nash equilibrium, namely arbitrating.
The lower level requires solving the Nash equilibrium under a given reward function, which makes the overall problem challenging to optimize in an end-to-end way.
We propose a backpropagation scheme that differentiates through the Nash equilibrium, which provides the gradient feedback for the upper level.
arXiv Detail & Related papers (2023-02-20T16:05:04Z) - Minimax-Optimal Multi-Agent RL in Zero-Sum Markov Games With a
Generative Model [50.38446482252857]
Two-player zero-sum Markov games are arguably the most basic setting in multi-agent reinforcement learning.
We develop a learning algorithm that learns an $\varepsilon$-approximate Markov NE policy using $\widetilde{O}(\cdot)$ samples.
We derive a refined regret bound for FTRL that makes explicit the role of variance-type quantities.
arXiv Detail & Related papers (2022-08-22T17:24:55Z) - Near-Optimal Learning of Extensive-Form Games with Imperfect Information [54.55092907312749]
We present the first line of algorithms that require only $\widetilde{\mathcal{O}}((XA+YB)/\varepsilon^2)$ episodes of play to find an $\varepsilon$-approximate Nash equilibrium in two-player zero-sum games.
This improves upon the best known sample complexity of $\widetilde{\mathcal{O}}((X^2A+Y^2B)/\varepsilon^2)$ by a factor of $\widetilde{\mathcal{O}}(\max\{X, Y\})$.
arXiv Detail & Related papers (2022-02-03T18:18:28Z) - Multi-Leader Congestion Games with an Adversary [0.5914780964919123]
We study a multi-leader single-follower congestion game where multiple users (leaders) choose one resource out of a set of resources.
We show that pure Nash equilibria may fail to exist and therefore, we consider approximate equilibria instead.
We show how to efficiently compute a best approximate equilibrium, that is, one with the smallest possible $\alpha$ among all $\alpha$-approximate equilibria of the given instance.
arXiv Detail & Related papers (2021-12-14T14:47:43Z) - Learning equilibria with personalized incentives in a class of
nonmonotone games [7.713240800142863]
We consider quadratic, nonmonotone generalized Nash equilibrium problems with symmetric interactions among the agents, which are known to be potential.
In the proposed scheme, a coordinator iteratively integrates the agents' noisy feedback to learn their pseudo-gradients, and then designs personalized incentives for them.
We show that our algorithm returns an equilibrium when the coordinator is endowed with standard learning policies, and we corroborate our results on a numerical instance of a hypomonotone game.
arXiv Detail & Related papers (2021-11-06T11:18:59Z) - Sample-Efficient Learning of Stackelberg Equilibria in General-Sum Games [78.65798135008419]
It remains vastly open how to learn the Stackelberg equilibrium in general-sum games efficiently from samples.
This paper initiates the theoretical study of sample-efficient learning of the Stackelberg equilibrium in two-player turn-based general-sum games.
arXiv Detail & Related papers (2021-02-23T05:11:07Z) - Learning Zero-Sum Simultaneous-Move Markov Games Using Function
Approximation and Correlated Equilibrium [116.56359444619441]
We develop provably efficient reinforcement learning algorithms for two-player zero-sum finite-horizon Markov games.
In the offline setting, we control both players and aim to find the Nash Equilibrium by minimizing the duality gap.
In the online setting, we control a single player playing against an arbitrary opponent and aim to minimize the regret.
arXiv Detail & Related papers (2020-02-17T17:04:16Z)