Balancing Adaptability and Non-exploitability in Repeated Games
- URL: http://arxiv.org/abs/2112.10314v1
- Date: Mon, 20 Dec 2021 03:09:30 GMT
- Title: Balancing Adaptability and Non-exploitability in Repeated Games
- Authors: Anthony DiGiovanni and Ambuj Tewari
- Abstract summary: We study the problem of guaranteeing low regret in repeated games against an opponent with unknown membership in one of several classes.
We add the constraint that our algorithm is non-exploitable, in that the opponent lacks an incentive to use an algorithm against which we cannot achieve rewards exceeding some "fair" value.
Our solution is an expert algorithm (LAFF) that searches within a set of sub-algorithms that are optimal for each opponent class and uses a punishment policy upon detecting evidence of exploitation by the opponent.
- Score: 29.04618208068207
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We study the problem of guaranteeing low regret in repeated games against an
opponent with unknown membership in one of several classes. We add the
constraint that our algorithm is non-exploitable, in that the opponent lacks an
incentive to use an algorithm against which we cannot achieve rewards exceeding
some "fair" value. Our solution is an expert algorithm (LAFF) that searches
within a set of sub-algorithms that are optimal for each opponent class and
uses a punishment policy upon detecting evidence of exploitation by the
opponent. With benchmarks that depend on the opponent class, we show that LAFF
has sublinear regret uniformly over the possible opponents, except exploitative
ones, for which we guarantee that the opponent has linear regret. To our
knowledge, this work is the first to provide guarantees for both regret and
non-exploitability in multi-agent learning.
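To make the expert structure described in the abstract concrete, here is a minimal Python sketch of that control flow: follow the sub-algorithm for one hypothesized opponent class, switch classes when its average reward falls below the "fair" benchmark, and move to the punishment policy only once no class fits. The class and method names, the fixed phase length, the slack tolerance, and the switch-on-failure rule are illustrative assumptions, not the paper's actual LAFF construction or its guarantees.

```python
import numpy as np

class LAFFSketch:
    """Illustrative skeleton of an expert algorithm with a punishment fallback.
    All interfaces and parameters here are assumptions for illustration:
    - `experts`: one sub-algorithm per hypothesized opponent class,
      each exposing act(history) and update(history, reward).
    - `punish_policy`: callable returning a punishing action (e.g., minimax play).
    - `fair_value`: per-round "fair" reward benchmark.
    """

    def __init__(self, experts, punish_policy, fair_value,
                 phase_len=1000, slack=0.05):
        self.experts = list(experts)
        self.punish_policy = punish_policy
        self.fair_value = fair_value
        self.phase_len = phase_len      # rounds spent testing each expert
        self.slack = slack              # tolerance before declaring exploitation
        self.active = 0                 # index of the expert currently followed
        self.phase_rewards = []
        self.punishing = False

    def act(self, history):
        if self.punishing:
            return self.punish_policy(history)
        return self.experts[self.active].act(history)

    def update(self, history, reward):
        if self.punishing:
            return
        self.experts[self.active].update(history, reward)
        self.phase_rewards.append(reward)
        if len(self.phase_rewards) >= self.phase_len:
            avg = np.mean(self.phase_rewards)
            self.phase_rewards = []
            if avg >= self.fair_value - self.slack:
                return                  # current expert tracks the benchmark; keep it
            if self.active + 1 < len(self.experts):
                self.active += 1        # try the sub-algorithm for the next class
            else:
                self.punishing = True   # no class fits: treat the opponent as exploitative
```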
Related papers
- Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback [58.66941279460248]
Learning from human feedback plays an important role in aligning generative models, such as large language models (LLMs).
We study a model within this problem domain--contextual dueling bandits with adversarial feedback, where the true preference label can be flipped by an adversary.
We propose a robust contextual dueling bandit algorithm based on uncertainty-weighted maximum likelihood estimation.
arXiv Detail & Related papers (2024-04-16T17:59:55Z) - Is Learning in Games Good for the Learners? [14.781100349601587]
We consider tradeoffs between reward and regret in repeated gameplay between two agents.
We show that any such equilibrium is reachable by a pair of algorithms which maintain their regret guarantees against arbitrary opponents.
We also consider the question of learning reward-optimal strategies via repeated play with a no-regret agent when the game is initially unknown.
arXiv Detail & Related papers (2023-05-31T02:10:27Z) - Strategizing against Learners in Bayesian Games [74.46970859427907]
We study repeated two-player games where one of the players, the learner, employs a no-regret learning strategy.
We consider general Bayesian games, where the payoffs of both players could depend on the type.
arXiv Detail & Related papers (2022-05-17T18:10:25Z) - Learning Markov Games with Adversarial Opponents: Efficient Algorithms
and Fundamental Limits [37.573273877814906]
An ideal strategy in zero-sum games should not only grant the player an average reward no less than the value of Nash equilibrium, but also exploit the (adaptive) opponents when they are suboptimal.
This work studies no-regret learning in Markov games with adversarial opponents when competing against the best fixed policy in hindsight.
arXiv Detail & Related papers (2022-03-14T01:23:56Z) - No-Regret Learning in Time-Varying Zero-Sum Games [99.86860277006318]
Learning from repeated play in a fixed zero-sum game is a classic problem in game theory and online learning.
We develop a single parameter-free algorithm that simultaneously enjoys favorable guarantees under three performance measures.
Our algorithm is based on a two-layer structure with a meta-algorithm learning over a group of black-box base-learners satisfying a certain property.
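As a rough illustration of such a two-layer structure (an assumption-laden sketch, not this paper's specific construction), a meta-algorithm can run Exp3-style multiplicative weights over black-box base-learners, following one base-learner per round and reweighting it with an importance-weighted reward estimate. The interfaces `act()`, `update(reward)`, `env_step(action)`, and the learning rate below are assumed for illustration.

```python
import numpy as np

def exp3_meta(base_learners, env_step, T, eta=None, seed=0):
    """Two-layer sketch: an Exp3-style meta-learner over black-box base-learners.
    Each base-learner is assumed to expose act() and update(reward);
    env_step(action) plays one round and returns a reward in [0, 1]."""
    rng = np.random.default_rng(seed)
    K = len(base_learners)
    eta = eta if eta is not None else np.sqrt(np.log(K) / (K * max(T, 1)))
    weights = np.ones(K)
    for _ in range(T):
        probs = weights / weights.sum()
        i = rng.choice(K, p=probs)            # follow one base-learner this round
        reward = env_step(base_learners[i].act())
        base_learners[i].update(reward)       # only the followed learner is updated
        estimate = reward / probs[i]          # importance-weighted reward estimate
        weights[i] *= np.exp(eta * estimate)  # exponential reweighting of the meta-layer
    return weights / weights.sum()
```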
arXiv Detail & Related papers (2022-01-30T06:10:04Z) - L2E: Learning to Exploit Your Opponent [66.66334543946672]
We propose a novel Learning to Exploit framework for implicit opponent modeling.
L2E acquires the ability to exploit opponents through a few interactions with different opponents during training.
We propose a novel opponent strategy generation algorithm that automatically produces effective opponents for training.
arXiv Detail & Related papers (2021-02-18T14:27:59Z) - Learning to Play Sequential Games versus Unknown Opponents [93.8672371143881]
We consider a repeated sequential game between a learner, who plays first, and an opponent who responds to the chosen action.
We propose a novel algorithm for the learner when playing against an adversarial sequence of opponents.
Our results include regret guarantees for the algorithm that depend on the regularity of the opponent's response.
arXiv Detail & Related papers (2020-07-10T09:33:05Z) - Matrix games with bandit feedback [33.637621576707076]
We study a version of the classical zero-sum matrix game with unknown payoff matrix and bandit feedback.
We show that Thompson sampling fails catastrophically in this setting and provide an empirical comparison to existing algorithms.
arXiv Detail & Related papers (2020-06-09T09:36:21Z) - Provable Self-Play Algorithms for Competitive Reinforcement Learning [48.12602400021397]
We study self-play in competitive reinforcement learning under the setting of Markov games.
We show that a self-play algorithm achieves regret $\tilde{\mathcal{O}}(\sqrt{T})$ after playing $T$ steps of the game.
We also introduce an explore-then-exploit style algorithm, which achieves a slightly worse regret $\tilde{\mathcal{O}}(T^{2/3})$, but is guaranteed to run in polynomial time even in the worst case.
arXiv Detail & Related papers (2020-02-10T18:44:50Z)