Balancing Adaptability and Non-exploitability in Repeated Games
- URL: http://arxiv.org/abs/2112.10314v1
- Date: Mon, 20 Dec 2021 03:09:30 GMT
- Title: Balancing Adaptability and Non-exploitability in Repeated Games
- Authors: Anthony DiGiovanni and Ambuj Tewari
- Abstract summary: We study the problem of guaranteeing low regret in repeated games against an opponent with unknown membership in one of several classes.
We add the constraint that our algorithm is non-exploitable, in that the opponent lacks an incentive to use an algorithm against which we cannot achieve rewards exceeding some "fair" value.
Our solution is an expert algorithm (LAFF) that searches within a set of sub-algorithms that are optimal for each opponent class and uses a punishment policy upon detecting evidence of exploitation by the opponent.
- Score: 29.04618208068207
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We study the problem of guaranteeing low regret in repeated games against an
opponent with unknown membership in one of several classes. We add the
constraint that our algorithm is non-exploitable, in that the opponent lacks an
incentive to use an algorithm against which we cannot achieve rewards exceeding
some "fair" value. Our solution is an expert algorithm (LAFF) that searches
within a set of sub-algorithms that are optimal for each opponent class and
uses a punishment policy upon detecting evidence of exploitation by the
opponent. With benchmarks that depend on the opponent class, we show that LAFF
has sublinear regret uniformly over the possible opponents, except exploitative
ones, for which we guarantee that the opponent has linear regret. To our
knowledge, this work is the first to provide guarantees for both regret and
non-exploitability in multi-agent learning.
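To make the expert structure described in the abstract concrete, here is a minimal Python sketch of that control flow: follow the sub-algorithm for one hypothesized opponent class, switch classes when its average reward falls below the "fair" benchmark, and move to the punishment policy only once no class fits. The class and method names, the fixed phase length, the slack tolerance, and the switch-on-failure rule are illustrative assumptions, not the paper's actual LAFF construction or its guarantees.

```python
import numpy as np

class LAFFSketch:
    """Illustrative skeleton of an expert algorithm with a punishment fallback.
    All interfaces and parameters here are assumptions for illustration:
    - `experts`: one sub-algorithm per hypothesized opponent class,
      each exposing act(history) and update(history, reward).
    - `punish_policy`: callable returning a punishing action (e.g., minimax play).
    - `fair_value`: per-round "fair" reward benchmark.
    """

    def __init__(self, experts, punish_policy, fair_value,
                 phase_len=1000, slack=0.05):
        self.experts = list(experts)
        self.punish_policy = punish_policy
        self.fair_value = fair_value
        self.phase_len = phase_len      # rounds spent testing each expert
        self.slack = slack              # tolerance before declaring exploitation
        self.active = 0                 # index of the expert currently followed
        self.phase_rewards = []
        self.punishing = False

    def act(self, history):
        if self.punishing:
            return self.punish_policy(history)
        return self.experts[self.active].act(history)

    def update(self, history, reward):
        if self.punishing:
            return
        self.experts[self.active].update(history, reward)
        self.phase_rewards.append(reward)
        if len(self.phase_rewards) >= self.phase_len:
            avg = np.mean(self.phase_rewards)
            self.phase_rewards = []
            if avg >= self.fair_value - self.slack:
                return                  # current expert tracks the benchmark; keep it
            if self.active + 1 < len(self.experts):
                self.active += 1        # try the sub-algorithm for the next class
            else:
                self.punishing = True   # no class fits: treat the opponent as exploitative
```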
Related papers
- Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback [58.66941279460248]
Learning from human feedback plays an important role in aligning generative models, such as large language models (LLMs).
We study a model within this problem domain--contextual dueling bandits with adversarial feedback, where the true preference label can be flipped by an adversary.
We propose a robust contextual dueling bandit algorithm based on uncertainty-weighted maximum likelihood estimation.
arXiv Detail & Related papers (2024-04-16T17:59:55Z) - Is Learning in Games Good for the Learners? [14.781100349601587]
We consider tradeoffs between reward and regret in repeated gameplay between two agents.
We show that any such equilibrium is reachable by a pair of algorithms which maintain their regret guarantees against arbitrary opponents.
We also consider the question of learning reward-optimal strategies via repeated play with a no-regret agent when the game is initially unknown.
arXiv Detail & Related papers (2023-05-31T02:10:27Z) - Strategizing against Learners in Bayesian Games [74.46970859427907]
We study repeated two-player games where one of the players, the learner, employs a no-regret learning strategy.
We consider general Bayesian games, where the payoffs of both players could depend on the type.
arXiv Detail & Related papers (2022-05-17T18:10:25Z) - Learning Markov Games with Adversarial Opponents: Efficient Algorithms
and Fundamental Limits [37.573273877814906]
An ideal strategy in zero-sum games should not only grant the player an average reward no less than the value of Nash equilibrium, but also exploit the (adaptive) opponents when they are suboptimal.
This work studies no-regret learning in Markov games with adversarial opponents when competing against the best fixed policy in hindsight.
arXiv Detail & Related papers (2022-03-14T01:23:56Z) - No-Regret Learning in Time-Varying Zero-Sum Games [99.86860277006318]
Learning from repeated play in a fixed zero-sum game is a classic problem in game theory and online learning.
We develop a single parameter-free algorithm that simultaneously enjoys favorable guarantees under three performance measures.
Our algorithm is based on a two-layer structure with a meta-algorithm learning over a group of black-box base-learners satisfying a certain property.
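As a rough illustration of such a two-layer structure (an assumption-laden sketch, not this paper's specific construction), a meta-algorithm can run Exp3-style multiplicative weights over black-box base-learners, following one base-learner per round and reweighting it with an importance-weighted reward estimate. The interfaces `act()`, `update(reward)`, `env_step(action)`, and the learning rate below are assumed for illustration.

```python
import numpy as np

def exp3_meta(base_learners, env_step, T, eta=None, seed=0):
    """Two-layer sketch: an Exp3-style meta-learner over black-box base-learners.
    Each base-learner is assumed to expose act() and update(reward);
    env_step(action) plays one round and returns a reward in [0, 1]."""
    rng = np.random.default_rng(seed)
    K = len(base_learners)
    eta = eta if eta is not None else np.sqrt(np.log(K) / (K * max(T, 1)))
    weights = np.ones(K)
    for _ in range(T):
        probs = weights / weights.sum()
        i = rng.choice(K, p=probs)            # follow one base-learner this round
        reward = env_step(base_learners[i].act())
        base_learners[i].update(reward)       # only the followed learner is updated
        estimate = reward / probs[i]          # importance-weighted reward estimate
        weights[i] *= np.exp(eta * estimate)  # exponential reweighting of the meta-layer
    return weights / weights.sum()
```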
arXiv Detail & Related papers (2022-01-30T06:10:04Z) - L2E: Learning to Exploit Your Opponent [66.66334543946672]
We propose a novel Learning to Exploit framework for implicit opponent modeling.
L2E acquires the ability to exploit opponents through a few interactions with different opponents during training.
We propose a novel opponent strategy generation algorithm that automatically produces effective opponents for training.
arXiv Detail & Related papers (2021-02-18T14:27:59Z) - Learning to Play Sequential Games versus Unknown Opponents [93.8672371143881]
We consider a repeated sequential game between a learner, who plays first, and an opponent who responds to the chosen action.
We propose a novel algorithm for the learner when playing against an adversarial sequence of opponents.
Our results include regret guarantees for the algorithm that depend on the regularity of the opponent's response.
arXiv Detail & Related papers (2020-07-10T09:33:05Z) - Matrix games with bandit feedback [33.637621576707076]
We study a version of the classical zero-sum matrix game with unknown payoff matrix and bandit feedback.
We show that Thompson sampling fails catastrophically in this setting and provide an empirical comparison to existing algorithms.
arXiv Detail & Related papers (2020-06-09T09:36:21Z) - Provable Self-Play Algorithms for Competitive Reinforcement Learning [48.12602400021397]
We study self-play in competitive reinforcement learning under the setting of Markov games.
We show that a self-play algorithm achieves regret $\tilde{\mathcal{O}}(\sqrt{T})$ after playing $T$ steps of the game.
We also introduce an explore-then-exploit style algorithm, which achieves a slightly worse regret $\tilde{\mathcal{O}}(T^{2/3})$, but is guaranteed to run in polynomial time even in the worst case.
arXiv Detail & Related papers (2020-02-10T18:44:50Z)