Is Learning in Games Good for the Learners?
- URL: http://arxiv.org/abs/2305.19496v2
- Date: Sat, 16 Dec 2023 19:35:31 GMT
- Title: Is Learning in Games Good for the Learners?
- Authors: William Brown, Jon Schneider, Kiran Vodrahalli
- Abstract summary: We consider tradeoffs between reward and regret in repeated gameplay between two agents.
We show that any such equilibrium is reachable by a pair of algorithms which maintain their regret guarantees against arbitrary opponents.
We also consider the question of learning reward-optimal strategies via repeated play with a no-regret agent when the game is initially unknown.
- Score: 14.781100349601587
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider a number of questions related to tradeoffs between reward and
regret in repeated gameplay between two agents. To facilitate this, we
introduce a notion of $\textit{generalized equilibrium}$ which allows for
asymmetric regret constraints, and yields polytopes of feasible values for each
agent and pair of regret constraints, where we show that any such equilibrium
is reachable by a pair of algorithms which maintain their regret guarantees
against arbitrary opponents. As a central example, we highlight the case where one
agent is no-swap and the other's regret is unconstrained. We show that this
captures an extension of $\textit{Stackelberg}$ equilibria with a matching
optimal value, and that there exists a wide class of games where a player can
significantly increase their utility by deviating from a no-swap-regret
algorithm against a no-swap learner (in fact, almost any game without pure Nash
equilibria is of this form). Additionally, we make use of generalized
equilibria to consider tradeoffs in terms of the opponent's algorithm choice.
We give a tight characterization for the maximal reward obtainable against
$\textit{some}$ no-regret learner, yet we also show a class of games in which
this is bounded away from the value obtainable against the class of common
"mean-based" no-regret algorithms. Finally, we consider the question of
learning reward-optimal strategies via repeated play with a no-regret agent
when the game is initially unknown. Again we show tradeoffs depending on the
opponent's learning algorithm: the Stackelberg strategy is learnable in
exponential time with any no-regret agent (and in polynomial time with any
no-$\textit{adaptive}$-regret agent) for any game where it is learnable via
queries, and there are games where it is learnable in polynomial time against
any no-swap-regret agent but requires exponential time against a mean-based
no-regret agent.
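To make the setting concrete, the sketch below is an illustration under stated assumptions, not code from the paper: an optimizer commits to a fixed mixed strategy in a repeated bimatrix game while the other agent runs Hedge (multiplicative weights), a canonical mean-based no-regret learner. The payoff matrices, horizon, and learning rate are assumptions chosen for the example; the optimizer's average reward against such a learner is the kind of quantity whose tradeoffs the abstract discusses.

```python
# A minimal sketch (not taken from the paper): an optimizer commits to a fixed
# mixed strategy in a repeated bimatrix game while the other agent runs Hedge
# (multiplicative weights), a canonical mean-based no-regret learner. The
# payoff matrices, horizon, and learning rate are illustrative assumptions.
import numpy as np

# A[i, j]: optimizer's reward, B[i, j]: learner's reward, when the optimizer
# plays row i and the learner plays column j.
A = np.array([[1.0, 3.0],
              [0.0, 2.0]])
B = np.array([[1.0, 0.0],
              [0.0, 2.0]])

T = 5000
eta = np.sqrt(np.log(B.shape[1]) / T)   # standard Hedge step size

def play_against_hedge(x):
    """Expected average optimizer reward over T rounds of commitment to x."""
    cum_payoff = np.zeros(B.shape[1])   # learner's cumulative expected payoff per column
    total = 0.0
    for _ in range(T):
        w = np.exp(eta * (cum_payoff - cum_payoff.max()))
        y = w / w.sum()                 # learner's Hedge distribution over columns
        total += x @ A @ y              # optimizer's expected reward this round
        cum_payoff += x @ B             # learner observes expected column payoffs
    return total / T

# Compare a few commitment strategies for the optimizer.
for x in ([1.0, 0.0], [0.0, 1.0], [0.5, 0.5]):
    x = np.array(x)
    print(x, round(play_against_hedge(x), 3))
```

Because Hedge is mean-based, its play depends only on cumulative expected payoffs, which is exactly what a commitment strategy can exploit.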
Related papers
- A Black-box Approach for Non-stationary Multi-agent Reinforcement Learning [53.83345471268163]
We investigate learning the equilibria in non-stationary multi-agent systems.
We show how to test for various types of equilibria by a black-box reduction to single-agent learning.
arXiv Detail & Related papers (2023-06-12T23:48:24Z) - Near-Optimal No-Regret Learning for General Convex Games [121.50979258049135]
We show that near-optimal regret can be obtained for general convex and compact strategy sets.
Our dynamics are based on an instantiation of optimistic follow-the-regularized-leader over an appropriately $\textit{lifted}$ space.
Even in those special cases where prior results apply, our algorithm improves over the state-of-the-art regret.
arXiv Detail & Related papers (2022-06-17T12:58:58Z) - Strategizing against Learners in Bayesian Games [74.46970859427907]
We study repeated two-player games where one of the players, the learner, employs a no-regret learning strategy.
We consider general Bayesian games, where the payoffs of both the optimizer and the learner could depend on the type.
arXiv Detail & Related papers (2022-05-17T18:10:25Z) - Balancing Adaptability and Non-exploitability in Repeated Games [29.04618208068207]
We study the problem of guaranteeing low regret in repeated games against an opponent with unknown membership in one of several classes.
We add the constraint that our algorithm is non-exploitable, in that the opponent lacks an incentive to use an algorithm against which we cannot achieve rewards exceeding some "fair" value.
Our solution is an expert algorithm (LAFF) that searches within a set of sub-algorithms that are optimal for each opponent class and uses a punishment policy upon detecting evidence of exploitation by the opponent.
arXiv Detail & Related papers (2021-12-20T03:09:30Z) - Doubly Optimal No-Regret Online Learning in Strongly Monotone Games with Bandit Feedback [29.553652241608997]
We study the class of $\textit{smooth and strongly monotone}$ games and optimal no-regret learning therein.
We first construct a new bandit learning algorithm and show that it achieves the single-agent optimal regret of $\tilde{\Theta}(n\sqrt{T})$.
Our results thus settle this open problem and contribute to the broad landscape of bandit game-theoretical learning.
arXiv Detail & Related papers (2021-12-06T08:27:54Z) - Near-Optimal No-Regret Learning in General Games [31.293412884145066]
We show that Optimistic Hedge attains $\mathrm{poly}(\log T)$ regret in multi-player general-sum games (a minimal sketch of the Optimistic Hedge update appears after this list).
A corollary of our bound is that Optimistic Hedge converges to coarse correlated equilibrium in general games at a rate of $\tilde{O}\left(\frac{1}{T}\right)$.
arXiv Detail & Related papers (2021-08-16T06:53:02Z) - Model-Free Online Learning in Unknown Sequential Decision Making Problems and Games [114.90723492840499]
In large two-player zero-sum imperfect-information games, modern extensions of counterfactual regret minimization (CFR) are currently the practical state of the art for computing a Nash equilibrium.
We formalize an online learning setting in which the strategy space is not known to the agent.
We give an efficient algorithm that achieves $O(T^{3/4})$ regret with high probability for that setting, even when the agent faces an adversarial environment.
arXiv Detail & Related papers (2021-03-08T04:03:24Z) - Online Learning in Unknown Markov Games [55.07327246187741]
We study online learning in unknown Markov games.
We show that achieving sublinear regret against the best response in hindsight is statistically hard.
We present an algorithm that achieves a sublinear $\tilde{\mathcal{O}}(K^{2/3})$ regret after $K$ episodes.
arXiv Detail & Related papers (2020-10-28T14:52:15Z) - Learning to Play Sequential Games versus Unknown Opponents [93.8672371143881]
We consider a repeated sequential game between a learner, who plays first, and an opponent who responds to the chosen action.
We propose a novel algorithm for the learner when playing against an adversarial sequence of opponents.
Our results include regret guarantees for the algorithm that depend on the regularity of the opponent's response.
arXiv Detail & Related papers (2020-07-10T09:33:05Z)
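As a companion to the "Near-Optimal No-Regret Learning in General Games" entry above, here is a minimal sketch of the Optimistic Hedge update in self-play on a 2x2 zero-sum game; the game, horizon, and step size are illustrative assumptions, not values from that paper.

```python
# A minimal sketch of Optimistic Hedge in self-play on matching pennies.
# Each player plays Hedge on its cumulative losses plus an optimistic
# prediction equal to the most recent loss. The game, horizon, and step
# size are illustrative assumptions, not taken from the cited paper.
import numpy as np

G = np.array([[1.0, -1.0],
              [-1.0, 1.0]])   # row player's payoff in a zero-sum game

T, eta = 2000, 0.1

def optimistic_hedge(cum_loss, last_loss):
    """Hedge distribution on cumulative losses plus a one-step prediction."""
    s = -eta * (cum_loss + last_loss)
    w = np.exp(s - s.max())
    return w / w.sum()

x = np.full(2, 0.5)            # row player's mixed strategy
y = np.full(2, 0.5)            # column player's mixed strategy
cum_lx, cum_ly = np.zeros(2), np.zeros(2)
avg_x, avg_y = np.zeros(2), np.zeros(2)

for _ in range(T):
    lx = -(G @ y)              # row player's loss vector (negative expected payoff)
    ly = G.T @ x               # column player's loss vector
    cum_lx += lx
    cum_ly += ly
    x = optimistic_hedge(cum_lx, lx)
    y = optimistic_hedge(cum_ly, ly)
    avg_x += x / T
    avg_y += y / T

print("average strategies:", avg_x, avg_y)   # both approach the uniform equilibrium
```

The only difference from vanilla Hedge is the extra `+ last_loss` term, which serves as a prediction of the next round's loss and is what drives the faster regret rates described in that entry.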
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content presented here (including all information) and is not responsible for any consequences arising from its use.