Strategizing against Learners in Bayesian Games
- URL: http://arxiv.org/abs/2205.08562v1
- Date: Tue, 17 May 2022 18:10:25 GMT
- Title: Strategizing against Learners in Bayesian Games
- Authors: Yishay Mansour, Mehryar Mohri, Jon Schneider, Balasubramanian Sivan
- Abstract summary: We study repeated two-player games where one of the players, the learner, employs a no-regret learning strategy.
We consider general Bayesian games, where the payoffs of both the optimizer and the learner could depend on the type.
- Score: 74.46970859427907
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study repeated two-player games where one of the players, the learner,
employs a no-regret learning strategy, while the other, the optimizer, is a
rational utility maximizer. We consider general Bayesian games, where the
payoffs of both the optimizer and the learner could depend on the type, which
is drawn from a publicly known distribution, but revealed privately to the
learner. We address the following questions: (a) what is the bare minimum that
the optimizer can guarantee to obtain regardless of the no-regret learning
algorithm employed by the learner? (b) are there learning algorithms that cap
the optimizer payoff at this minimum? (c) can these algorithms be implemented
efficiently? While building this theory of optimizer-learner interactions, we
define a new combinatorial notion of regret called polytope swap regret, that
could be of independent interest in other settings.
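The optimizer-learner interaction can be made concrete with a small simulation: a minimal sketch, assuming a Hedge (multiplicative-weights) learner facing an optimizer who best-responds to the learner's current mixed strategy. The payoff matrices, horizon, and learning rate are illustrative assumptions, and the Bayesian type structure is omitted for brevity; this is not the paper's construction.

```python
import numpy as np

# Minimal sketch: a Hedge (no-regret) learner vs. a best-responding optimizer.
# All payoffs and constants are illustrative assumptions; the Bayesian type
# structure from the paper is omitted for brevity.
rng = np.random.default_rng(0)
T = 1000                                # horizon (assumption)
n_learner, n_opt = 3, 3                 # action-set sizes (assumption)
U_learner = rng.uniform(size=(n_learner, n_opt))  # learner's payoff matrix
U_opt = rng.uniform(size=(n_learner, n_opt))      # optimizer's payoff matrix

eta = np.sqrt(np.log(n_learner) / T)    # standard Hedge learning rate
weights = np.ones(n_learner)
opt_total = 0.0

for t in range(T):
    p = weights / weights.sum()         # learner's current mixed strategy
    j = int(np.argmax(p @ U_opt))       # optimizer best-responds to p
    opt_total += float(p @ U_opt[:, j])
    weights *= np.exp(eta * U_learner[:, j])  # Hedge reweighting on payoffs

print(f"optimizer's average payoff against Hedge: {opt_total / T:.3f}")
```

Question (a) above asks how much of this average payoff an optimizer can guarantee no matter which no-regret rule replaces the Hedge update here.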
Related papers
- Maximizing utility in multi-agent environments by anticipating the behavior of other learners [17.703508282875323]
In multi-agent settings, the decisions of each agent can affect the utilities/losses of the other agents.
In this paper, we study repeated two-player games involving two types of agents.
arXiv Detail & Related papers (2024-07-05T23:16:18Z) - Contextual Bandits and Imitation Learning via Preference-Based Active Queries [17.73844193143454]
We consider the problem of contextual bandits and imitation learning, where the learner lacks direct knowledge of the executed action's reward.
Instead, the learner can actively query an expert at each round to compare two actions and receive noisy preference feedback.
The learner's objective is two-fold: to minimize the regret associated with the executed actions, while simultaneously, minimizing the number of comparison queries made to the expert.
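As a rough illustration of this query protocol, here is a minimal sketch assuming a logistic noise model and a naive uniform query rule; the hidden rewards, noise model, and round count are all assumptions, not the paper's algorithm.

```python
import numpy as np

# Sketch of the comparison-query protocol: the learner asks a noisy expert
# to compare two actions each round. Rewards and noise model are assumptions.
rng = np.random.default_rng(1)
true_reward = np.array([0.2, 0.5, 0.9])  # hidden per-action rewards (assumption)

def noisy_preference(a, b):
    """Expert reports 'a beats b' with probability sigma(r_a - r_b)."""
    p = 1.0 / (1.0 + np.exp(-(true_reward[a] - true_reward[b])))
    return rng.random() < p

wins = np.zeros(3)
queries = 0
for t in range(200):
    a, b = rng.choice(3, size=2, replace=False)  # naive uniform query rule
    winner = a if noisy_preference(a, b) else b
    wins[winner] += 1
    queries += 1                                 # each comparison costs a query

print(f"estimated best action: {int(np.argmax(wins))} after {queries} queries")
```

The paper's two-fold objective corresponds to driving regret down while keeping the query count small; the uniform rule above optimizes neither.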
arXiv Detail & Related papers (2023-07-24T16:36:04Z) - No-Regret Learning in Time-Varying Zero-Sum Games [99.86860277006318]
Learning from repeated play in a fixed zero-sum game is a classic problem in game theory and online learning.
We develop a single parameter-free algorithm that simultaneously enjoys favorable guarantees under three performance measures.
Our algorithm is based on a two-layer structure with a meta-algorithm learning over a group of black-box base-learners satisfying a certain property.
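A minimal sketch of such a two-layer structure, assuming Hedge as the meta-algorithm and fixed mixed strategies as stand-in black-box base-learners (the paper's base-learners and the property they satisfy are not reproduced here):

```python
import numpy as np

# Two-layer sketch: Hedge on the meta level over stand-in base-learners.
rng = np.random.default_rng(2)
T, n_base, n_actions = 500, 4, 5
eta = np.sqrt(np.log(n_base) / T)        # meta learning rate (standard choice)
meta_w = np.ones(n_base)
# Stand-in black boxes: each base-learner recommends a fixed mixed strategy.
base_strategies = rng.dirichlet(np.ones(n_actions), size=n_base)
learner_loss = 0.0

for t in range(T):
    q = meta_w / meta_w.sum()            # meta-distribution over base-learners
    play = q @ base_strategies           # the mixture actually played
    loss = rng.uniform(size=n_actions)   # stand-in (time-varying) losses
    learner_loss += float(play @ loss)
    base_losses = base_strategies @ loss # loss suffered by each base-learner
    meta_w *= np.exp(-eta * base_losses) # Hedge update on the meta level

print(f"average meta-learner loss: {learner_loss / T:.3f}")
```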
arXiv Detail & Related papers (2022-01-30T06:10:04Z) - Doubly Optimal No-Regret Online Learning in Strongly Monotone Games with Bandit Feedback [29.553652241608997]
We consider the class of smooth and strongly monotone games and study optimal no-regret learning therein.
We first construct a new bandit learning algorithm and show that it achieves the single-agent optimal regret of $\tilde{\Theta}(n\sqrt{T})$.
Our results thus settle this open problem and contribute to the broad landscape of bandit game-theoretical learning.
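For intuition about the bandit-feedback model behind the $\tilde{\Theta}(n\sqrt{T})$ rate, here is a generic one-point gradient estimator in the style of Flaxman et al.; it is not the paper's doubly optimal algorithm, and the utility, schedules, and constants are assumptions.

```python
import numpy as np

# Generic one-point bandit gradient estimator (Flaxman-style sketch); the
# utility, perturbation/step schedules, and box constraint are assumptions.
rng = np.random.default_rng(3)

def reward(x):
    """Stand-in strongly concave utility, maximized at (0.7, 0.7)."""
    return -float(np.sum((x - 0.7) ** 2))

n, T = 2, 2000
x = np.zeros(n)
for t in range(1, T + 1):
    delta = t ** -0.25                 # shrinking perturbation radius
    eta = 0.5 / np.sqrt(t)             # shrinking step size
    u = rng.normal(size=n)
    u /= np.linalg.norm(u)             # random direction on the unit sphere
    r = reward(x + delta * u)          # the single scalar bandit observation
    g_hat = (n / delta) * r * u        # estimate of a smoothed gradient
    x = np.clip(x + eta * g_hat, -1.0, 1.0)  # projected gradient ascent

print("final play:", np.round(x, 3))   # drifts toward the optimum (0.7, 0.7)
```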
arXiv Detail & Related papers (2021-12-06T08:27:54Z) - Efficient and Optimal Algorithms for Contextual Dueling Bandits under Realizability [59.81339109121384]
We study the $K$-armed contextual dueling bandit problem, a sequential decision making setting in which the learner uses contextual information to make two decisions, but only observes preference-based feedback suggesting that one decision was better than the other.
We provide a new algorithm that achieves the optimal regret rate for a new notion of best response regret, which is a strictly stronger performance measure than those considered in prior works.
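One round of the dueling protocol can be sketched as follows, assuming a linear-logistic preference model as the realizability condition; the features, hidden parameter, and action-selection rule are illustrative, not the paper's algorithm.

```python
import numpy as np

# One round of the contextual dueling protocol under an assumed
# linear-logistic preference model (realizability stand-in).
rng = np.random.default_rng(5)
d, K = 4, 5
theta_star = rng.normal(size=d)          # hidden preference parameter
action_feats = rng.normal(size=(K, d))   # per-action feature directions

def phi(x, a):
    """Illustrative context-action features (elementwise product)."""
    return x * action_feats[a]

def duel(x, a, b):
    """Noisy preference: P(a beats b) = sigma(<theta*, phi(x,a) - phi(x,b)>)."""
    diff = float((phi(x, a) - phi(x, b)) @ theta_star)
    return rng.random() < 1.0 / (1.0 + np.exp(-diff))

x = rng.normal(size=d)                        # this round's context
a, b = rng.choice(K, size=2, replace=False)   # the learner's two decisions
print(f"preference feedback: action {a} beat action {b}: {duel(x, a, b)}")
```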
arXiv Detail & Related papers (2021-11-24T07:14:57Z) - Fast Rates for Nonparametric Online Learning: From Realizability to Learning in Games [36.969021834291745]
We propose a proper learning algorithm which gets a near-optimal mistake bound in terms of the sequential fat-shattering dimension of the hypothesis class.
This result answers a question as to whether proper learners could achieve near-optimal mistake bounds.
For the real-valued (regression) setting, the optimal mistake bound was not even known for improper learners.
arXiv Detail & Related papers (2021-11-17T05:24:21Z) - The Information Geometry of Unsupervised Reinforcement Learning [133.20816939521941]
Unsupervised skill discovery is a class of algorithms that learn a set of policies without access to a reward function.
We show that unsupervised skill discovery algorithms do not learn skills that are optimal for every possible reward function.
arXiv Detail & Related papers (2021-10-06T13:08:36Z) - Online Multiobjective Minimax Optimization and Applications [14.699969822572308]
We introduce a simple but general online learning framework in which, at each round, an adaptive adversary introduces a new game.
The learner's goal is to play so as to minimize the maximum coordinate of the cumulative vector-valued loss.
We give a simple algorithm that can compete with the setting in which the adversary must announce their action first.
This includes no-regret learning: we can recover optimal algorithms and bounds for minimizing external regret, internal regret, adaptive regret, multigroup regret, subsequence regret, and a notion of regret in the sleeping experts setting.
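A minimal sketch of the max-coordinate objective, assuming Hedge weights over coordinates and a per-round best response to the resulting scalarized loss; the per-round games below are random stand-ins, and this is not the paper's algorithm.

```python
import numpy as np

# Sketch: keep the max coordinate of a cumulative vector-valued loss small by
# running Hedge over coordinates and best-responding to the scalarization.
rng = np.random.default_rng(4)
T, d, n_actions = 500, 3, 4
eta = np.sqrt(np.log(d) / T)
w = np.ones(d)                             # Hedge weights over coordinates
cum = np.zeros(d)                          # cumulative vector-valued loss

for t in range(T):
    L = rng.uniform(size=(n_actions, d))   # this round's per-action vector losses
    lam = w / w.sum()
    a = int(np.argmin(L @ lam))            # best response to the weighted loss
    cum += L[a]
    w *= np.exp(eta * L[a])                # upweight coordinates that lag behind

print("cumulative loss per coordinate:", np.round(cum, 1))
print("max coordinate:", round(float(cum.max()), 1))
```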
arXiv Detail & Related papers (2021-08-09T06:52:08Z) - Adversaries in Online Learning Revisited: with applications in Robust Optimization and Adversarial training [55.30970087795483]
We revisit the concept of "adversary" in online learning, motivated by solving robust optimization and adversarial training problems.
We establish a general approach for a large variety of problem classes using imaginary play.
arXiv Detail & Related papers (2021-01-27T14:23:06Z) - Learning to Play Sequential Games versus Unknown Opponents [93.8672371143881]
We consider a repeated sequential game between a learner, who plays first, and an opponent who responds to the chosen action.
We propose a novel algorithm for the learner when playing against an adversarial sequence of opponents.
Our results include regret guarantees for the algorithm that depend on the regularity of the opponent's response.
arXiv Detail & Related papers (2020-07-10T09:33:05Z)