ReLExS: Reinforcement Learning Explanations for Stackelberg No-Regret Learners
- URL: http://arxiv.org/abs/2408.14086v1
- Date: Mon, 26 Aug 2024 08:12:26 GMT
- Title: ReLExS: Reinforcement Learning Explanations for Stackelberg No-Regret Learners
- Authors: Xiangge Huang, Jingyuan Li, Jiaqing Xie
- Abstract summary: We show that when the follower's strategy is either reward-average or transform-reward-average, the two players can always reach the Stackelberg equilibrium.
We also give a strict upper bound on the difference in the follower's utility with and without the no-regret constraint.
- Score: 1.4849645397321183
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Under the constraint of a no-regret follower, will the players in a two-player Stackelberg game still reach the Stackelberg equilibrium? We first show that when the follower's strategy is either reward-average or transform-reward-average, the two players can always reach the Stackelberg equilibrium. We then extend this to show that the players achieve the Stackelberg equilibrium in the two-player game under the no-regret constraint. We also give a strict upper bound on the difference in the follower's utility with and without the no-regret constraint. Moreover, in constant-sum two-player Stackelberg games with no-regret action sequences, we show that the total optimal utility of the game also remains bounded.
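For intuition only, here is a minimal sketch of a no-regret follower: a Hedge (multiplicative-weights) learner playing against an arbitrary leader in an invented 2x2 game. The payoff matrix, step size, and horizon are illustrative assumptions, not the paper's construction.

```python
import numpy as np

# Illustrative 2x2 follower payoffs: rows = leader action, cols = follower action.
# These numbers are made up for the sketch, not taken from the paper.
follower_payoff = np.array([[1.0, 0.0],
                            [0.2, 0.8]])

T = 5000
eta = np.sqrt(np.log(2) / T)          # standard Hedge step size
weights = np.ones(2)                  # follower's weights over its 2 actions
rng = np.random.default_rng(0)

realized, best_fixed = 0.0, np.zeros(2)
for t in range(T):
    leader_action = rng.integers(2)   # stand-in for the leader's (arbitrary) play
    p = weights / weights.sum()       # follower's mixed strategy this round
    rewards = follower_payoff[leader_action]
    realized += p @ rewards           # expected reward of the Hedge follower
    best_fixed += rewards             # cumulative reward of each fixed action
    weights *= np.exp(eta * rewards)  # multiplicative-weights (Hedge) update

regret = best_fixed.max() - realized
print(f"average regret after {T} rounds: {regret / T:.4f}")  # vanishes as T grows
```

Hedge's regret grows only as O(sqrt(T log K)), so the average regret per round goes to zero regardless of how the leader plays; this is the sense in which the follower above is "no-regret".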
Related papers
- Is Learning in Games Good for the Learners? [14.781100349601587]
We consider tradeoffs between reward and regret in repeated gameplay between two agents.
We show that any such equilibrium is reachable by a pair of algorithms which maintain their regret guarantees against arbitrary opponents.
We also consider the question of learning reward-optimal strategies via repeated play with a no-regret agent when the game is initially unknown.
arXiv Detail & Related papers (2023-05-31T02:10:27Z) - Online Learning in Stackelberg Games with an Omniscient Follower [83.42564921330896]
We study the problem of online learning in a two-player decentralized cooperative Stackelberg game.
In each round, the leader first takes an action, followed by the follower who takes their action after observing the leader's move.
We show that depending on the reward structure, the existence of the omniscient follower may change the sample complexity drastically.
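The interaction protocol above can be made concrete with a toy sketch (payoffs invented for the example, not from the paper): the leader moves first, the omniscient follower observes the move and best-responds, and the leader's best pure commitment is found by enumerating the follower's best responses.

```python
import numpy as np

# Invented 2x2 bimatrix game: entry [i, j] is a player's payoff when the
# leader plays i and the follower plays j.
leader_payoff   = np.array([[2.0, 0.0],
                            [3.0, 1.0]])
follower_payoff = np.array([[1.0, 0.0],
                            [0.0, 2.0]])

def follower_best_response(leader_action: int) -> int:
    # The omniscient follower observes the leader's move and best-responds.
    return int(np.argmax(follower_payoff[leader_action]))

# Leader's optimal pure commitment, anticipating the follower's best response.
values = [leader_payoff[a, follower_best_response(a)] for a in range(2)]
best_commitment = int(np.argmax(values))
print("leader commits to", best_commitment, "and earns", values[best_commitment])
```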
arXiv Detail & Related papers (2023-01-27T03:35:10Z) - On the Convergence of No-Regret Learning Dynamics in Time-Varying Games [89.96815099996132]
We characterize the convergence of optimistic gradient descent (OGD) in time-varying games.
Our framework yields sharp convergence bounds for the equilibrium gap of OGD in zero-sum games.
We also provide new insights on dynamic regret guarantees in static games.
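For reference, this is the standard optimistic gradient descent/ascent update on a bilinear saddle-point objective f(x, y) = x^T A y (matrix, step size, and starting points invented for the sketch; the paper's time-varying analysis is not reproduced here). The optimistic correction 2*g_t - g_{t-1} is what separates OGD from plain gradient descent/ascent, which cycles on this objective.

```python
import numpy as np

# Invented bilinear saddle-point problem f(x, y) = x^T A y; its unique
# saddle point is (0, 0).
A = np.array([[1.0, 0.5],
              [-0.5, 1.0]])
eta = 0.1

x = np.array([1.0, -1.0])            # min player
y = np.array([0.5, 1.0])             # max player
gx_prev, gy_prev = A @ y, A.T @ x    # previous gradients (initialized to current)

for t in range(2000):
    gx, gy = A @ y, A.T @ x
    # Optimistic gradient descent (min player) / ascent (max player):
    # step along 2*g_t - g_{t-1} instead of just g_t.
    x = x - eta * (2 * gx - gx_prev)
    y = y + eta * (2 * gy - gy_prev)
    gx_prev, gy_prev = gx, gy

print("distance to the saddle point:", np.linalg.norm(x) + np.linalg.norm(y))
```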
arXiv Detail & Related papers (2023-01-26T17:25:45Z) - Learning Correlated Stackelberg Equilibrium in General-Sum Multi-Leader-Single-Follower Games [16.810700878778007]
We study a hierarchical multi-player game structure, where players with asymmetric roles can be separated into leaders and followers.
In particular, we focus on a Stackelberg game scenario where there are multiple leaders and a single follower.
We propose a novel asymmetric equilibrium concept for the MLSF game called Correlated Stackelberg Equilibrium (CSE).
arXiv Detail & Related papers (2022-10-22T15:05:44Z) - Robust No-Regret Learning in Min-Max Stackelberg Games [1.6500749121196987]
We investigate the behavior of no-regret learning in min-max games with dependent strategy sets.
We show that no-regret dynamics converge to a Stackelberg equilibrium.
We prove OMD dynamics are robust for a large class of online min-max Stackelberg games.
arXiv Detail & Related papers (2022-03-26T18:12:40Z) - Playing against no-regret players [0.0]
We consider the concept of Stackelberg equilibrium in n-player games.
In our first result, we show, with counterexamples, that the Stackelberg value guarantee from the two-player setting no longer holds when the optimizer has to face more than one player.
We prove that the optimizer can still guarantee at least the correlated Stackelberg value per round.
arXiv Detail & Related papers (2022-02-16T10:12:27Z) - Can Reinforcement Learning Find Stackelberg-Nash Equilibria in General-Sum Markov Games with Myopic Followers? [156.5760265539888]
We study multi-player general-sum Markov games with one of the players designated as the leader and the other players regarded as followers.
For such a game, our goal is to find a Stackelberg-Nash equilibrium (SNE), which is a policy pair $(\pi^*, \nu^*)$.
We develop sample-efficient reinforcement learning (RL) algorithms for solving for an SNE in both online and offline settings.
arXiv Detail & Related papers (2021-12-27T05:41:14Z) - Adversarial Training as Stackelberg Game: An Unrolled Optimization Approach [91.74682538906691]
Adversarial training has been shown to improve the generalization performance of deep learning models.
We propose Stackelberg Adversarial Training (SALT), which formulates adversarial training as a Stackelberg game.
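The Stackelberg view of adversarial training can be illustrated with a generic (non-SALT) sketch: the follower perturbs the input to maximize the loss, and the leader then updates the model on the perturbed input. The model, data, and hyperparameters below are invented; SALT itself additionally unrolls (differentiates through) the follower's update when computing the leader's gradient, which this sketch does not do.

```python
import torch
import torch.nn as nn

# Invented toy setup: a linear classifier on random data.  The epsilon,
# learning rate, and architecture are illustrative, not the paper's.
torch.manual_seed(0)
model = nn.Linear(10, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()
x, y = torch.randn(32, 10), torch.randint(0, 2, (32,))
epsilon = 0.1

for step in range(100):
    # Follower: perturb the input to maximize the loss (one FGSM-style step).
    x_adv = x.clone().requires_grad_(True)
    loss_follower = loss_fn(model(x_adv), y)
    grad_x, = torch.autograd.grad(loss_follower, x_adv)
    x_adv = (x + epsilon * grad_x.sign()).detach()

    # Leader: update the model to minimize the loss on the perturbed input.
    opt.zero_grad()
    loss_leader = loss_fn(model(x_adv), y)
    loss_leader.backward()
    opt.step()
```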
arXiv Detail & Related papers (2021-04-11T00:44:57Z) - Sample-Efficient Learning of Stackelberg Equilibria in General-Sum Games [78.65798135008419]
It remains largely open how to learn the Stackelberg equilibrium in general-sum games efficiently from samples.
This paper initiates the theoretical study of sample-efficient learning of the Stackelberg equilibrium in two-player turn-based general-sum games.
arXiv Detail & Related papers (2021-02-23T05:11:07Z) - Online Learning in Unknown Markov Games [55.07327246187741]
We study online learning in unknown Markov games.
We show that achieving sublinear regret against the best response in hindsight is statistically hard.
We present an algorithm that achieves a sublinear $\tilde{\mathcal{O}}(K^{2/3})$ regret after $K$ episodes.
arXiv Detail & Related papers (2020-10-28T14:52:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.