No-Regret and Incentive-Compatible Online Learning
- URL: http://arxiv.org/abs/2002.08837v2
- Date: Tue, 30 Jun 2020 18:57:41 GMT
- Title: No-Regret and Incentive-Compatible Online Learning
- Authors: Rupert Freeman, David M. Pennock, Chara Podimata, and Jennifer Wortman Vaughan
- Abstract summary: We study online learning settings in which experts act strategically to maximize their influence on the learning algorithm's predictions.
We want the learning algorithm to be no-regret with respect to the best fixed expert in hindsight.
We provide algorithms that achieve no regret and incentive compatibility for experts for both the full and partial information settings.
- Score: 29.267666165169324
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study online learning settings in which experts act strategically to
maximize their influence on the learning algorithm's predictions by potentially
misreporting their beliefs about a sequence of binary events. Our goal is
twofold. First, we want the learning algorithm to be no-regret with respect to
the best fixed expert in hindsight. Second, we want incentive compatibility, a
guarantee that each expert's best strategy is to report his true beliefs about
the realization of each event. To achieve this goal, we build on the literature
on wagering mechanisms, a type of multi-agent scoring rule. We provide
algorithms that achieve no regret and incentive compatibility for myopic
experts for both the full and partial information settings. In experiments on
datasets from FiveThirtyEight, our algorithms have regret comparable to classic
no-regret algorithms, which are not incentive-compatible. Finally, we identify
an incentive-compatible algorithm for forward-looking strategic agents that
exhibits diminishing regret in practice.
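For context, the abstract's first goal (no regret against the best fixed expert in hindsight) can be made concrete with a textbook multiplicative-weights (Hedge) learner for binary prediction with expert advice. The sketch below is illustrative only and is not the paper's wagering-based, incentive-compatible mechanism; under a plain Hedge update like this one, an expert can raise his future weight by misreporting, which is exactly the incentive problem the paper addresses. All function and variable names here are ours.

```python
import numpy as np

def hedge_binary_prediction(reports, outcomes, eta=0.1):
    """Textbook multiplicative-weights (Hedge) learner for binary prediction
    with expert advice, scored with quadratic (Brier) loss.

    reports:  (T, K) array of expert-reported probabilities of outcome 1
    outcomes: (T,)   array of realized binary outcomes in {0, 1}
    Returns the learner's cumulative loss and its regret against the
    best fixed expert in hindsight.
    """
    T, K = reports.shape
    weights = np.ones(K)
    learner_loss = 0.0
    expert_losses = np.zeros(K)

    for t in range(T):
        probs = weights / weights.sum()
        prediction = probs @ reports[t]            # weighted aggregate forecast
        learner_loss += (prediction - outcomes[t]) ** 2
        losses = (reports[t] - outcomes[t]) ** 2   # quadratic loss per expert
        expert_losses += losses
        weights *= np.exp(-eta * losses)           # multiplicative weight update

    regret = learner_loss - expert_losses.min()
    return learner_loss, regret

# Toy run on synthetic data (illustrative only): 3 experts, 500 binary events.
rng = np.random.default_rng(0)
outcomes = rng.integers(0, 2, size=500)
reports = rng.uniform(0.0, 1.0, size=(500, 3))
print(hedge_binary_prediction(reports, outcomes))
```

The paper's contribution, per the abstract, is to replace this style of update with one built on wagering mechanisms, so that truthful reporting is each myopic expert's best strategy while a no-regret guarantee is preserved.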
Related papers
- Incentive-compatible Bandits: Importance Weighting No More [14.344759978208957]
We study the problem of incentive-compatible online learning with bandit feedback.
In this work we propose the first incentive-compatible algorithms that enjoy $O(\sqrt{KT})$ regret bounds.
arXiv Detail & Related papers (2024-05-10T13:57:13Z) - On the price of exact truthfulness in incentive-compatible online learning with bandit feedback: A regret lower bound for WSU-UX [8.861995276194435]
In one view of the classical game of prediction with expert advice with binary outcomes, each expert maintains an adversarially chosen belief.
We consider a recently introduced, strategic variant of this problem with selfish (reputation-seeking) experts.
We show, via an explicit construction of loss sequences, that the algorithm suffers a worst-case regret lower bound of $\Omega(T^{2/3})$.
arXiv Detail & Related papers (2024-04-08T02:41:32Z) - Contextual Bandits and Imitation Learning via Preference-Based Active Queries [17.73844193143454]
We consider the problem of contextual bandits and imitation learning, where the learner lacks direct knowledge of the executed action's reward.
Instead, the learner can actively query an expert at each round to compare two actions and receive noisy preference feedback.
The learner's objective is two-fold: to minimize the regret associated with the executed actions, while simultaneously, minimizing the number of comparison queries made to the expert.
arXiv Detail & Related papers (2023-07-24T16:36:04Z) - Active Ranking of Experts Based on their Performances in Many Tasks [72.96112117037465]
We consider the problem of ranking n experts based on their performances on d tasks.
We make a monotonicity assumption stating that for each pair of experts, one outperforms the other on all tasks.
arXiv Detail & Related papers (2023-06-05T06:55:39Z) - No-Regret Online Prediction with Strategic Experts [16.54912614895861]
We study a generalization of the online binary prediction with expert advice framework where at each round, the learner is allowed to pick $m \geq 1$ experts from a pool of $K$ experts.
We focus on the setting in which experts act strategically and aim to maximize their influence on the algorithm's predictions by potentially misreporting their beliefs.
Our goal is to design algorithms that satisfy the following two requirements: 1) $\textit{Incentive-compatible}$: Incentivize the experts to report their beliefs truthfully, and 2) $\textit{No-regret}$: Achieve...
arXiv Detail & Related papers (2023-05-24T16:43:21Z) - Streaming Algorithms for Learning with Experts: Deterministic Versus Robust [62.98860182111096]
In the online learning with experts problem, an algorithm must make a prediction about an outcome on each of $T$ days (or times).
The goal is to make a prediction with the minimum cost, specifically compared to the best expert in the set.
We show a space lower bound of $\widetilde{\Omega}\left(\frac{nM}{RT}\right)$ for any deterministic algorithm that achieves regret $R$ when the best expert makes $M$ mistakes.
arXiv Detail & Related papers (2023-03-03T04:39:53Z) - Strategizing against Learners in Bayesian Games [74.46970859427907]
We study repeated two-player games where one of the players, the learner, employs a no-regret learning strategy.
We consider general Bayesian games, where the payoffs of both players could depend on the type.
arXiv Detail & Related papers (2022-05-17T18:10:25Z) - No-Regret Learning in Time-Varying Zero-Sum Games [99.86860277006318]
Learning from repeated play in a fixed zero-sum game is a classic problem in game theory and online learning.
We develop a single parameter-free algorithm that simultaneously enjoys favorable guarantees under three performance measures.
Our algorithm is based on a two-layer structure with a meta-algorithm learning over a group of black-box base-learners satisfying a certain property.
arXiv Detail & Related papers (2022-01-30T06:10:04Z) - The Information Geometry of Unsupervised Reinforcement Learning [133.20816939521941]
Unsupervised skill discovery is a class of algorithms that learn a set of policies without access to a reward function.
We show that unsupervised skill discovery algorithms do not learn skills that are optimal for every possible reward function.
arXiv Detail & Related papers (2021-10-06T13:08:36Z) - Of Moments and Matching: Trade-offs and Treatments in Imitation Learning [26.121994149869767]
We provide a unifying view of a large family of previous imitation learning algorithms through the lens of moment matching.
By considering adversarially chosen divergences between learner and expert behavior, we are able to derive bounds on policy performance.
We derive two novel algorithm templates, AdVIL and AdRIL, with strong guarantees, simple implementation, and competitive empirical performance.
arXiv Detail & Related papers (2021-03-04T18:57:11Z) - Double Coverage with Machine-Learned Advice [100.23487145400833]
We study the fundamental online $k$-server problem in a learning-augmented setting.
We show that our algorithm achieves, for any $k$, an almost optimal consistency-robustness tradeoff.
arXiv Detail & Related papers (2021-03-02T11:04:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.