Trading-Off Payments and Accuracy in Online Classification with Paid
Stochastic Experts
- URL: http://arxiv.org/abs/2307.00836v1
- Date: Mon, 3 Jul 2023 08:20:13 GMT
- Authors: Dirk van der Hoeven, Ciara Pike-Burke, Hao Qiu, Nicolo Cesa-Bianchi
- Abstract summary: We investigate online classification with paid experts.
In each round, the learner must decide how much to pay each expert and then make a prediction.
We introduce an online learning algorithm whose total cost after $T$ rounds exceeds that of a predictor which knows the productivity of all experts in advance by at most $\mathcal{O}(K^2(\log T)\sqrt{T})$.
- Score: 14.891975420982513
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We investigate online classification with paid stochastic experts. Here,
before making their prediction, each expert must be paid. The amount that we
pay each expert directly influences the accuracy of their prediction through
some unknown Lipschitz "productivity" function. In each round, the learner must
decide how much to pay each expert and then make a prediction. They incur a
cost equal to a weighted sum of the prediction error and upfront payments for
all experts. We introduce an online learning algorithm whose total cost after
$T$ rounds exceeds that of a predictor which knows the productivity of all
experts in advance by at most $\mathcal{O}(K^2(\log T)\sqrt{T})$ where $K$ is
the number of experts. In order to achieve this result, we combine Lipschitz
bandits and online classification with surrogate losses. These tools allow us
to improve upon the bound of order $T^{2/3}$ one would obtain in the standard
Lipschitz bandit setting. Our algorithm is empirically evaluated on synthetic
data.
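The setting above can be illustrated with a hedged sketch (this is not the paper's algorithm): each round the learner picks a payment level per expert from a finite grid, standing in crudely for a Lipschitz bandit over payments, and each expert's accuracy grows with its payment through an unknown productivity function. The `productivity` function, the epsilon-greedy payment selection, and all parameters below are illustrative assumptions.

```python
import random

def productivity(pay):
    # hypothetical Lipschitz "productivity": accuracy in [0.5, 0.95]
    return 0.5 + 0.45 * min(pay, 1.0)

def run(T=2000, K=3, grid=(0.0, 0.25, 0.5, 0.75, 1.0), eps=0.1, seed=0):
    rng = random.Random(seed)
    counts = [[0] * len(grid) for _ in range(K)]    # pulls per (expert, level)
    values = [[0.0] * len(grid) for _ in range(K)]  # running reward estimates
    total_cost = 0.0
    for _ in range(T):
        y = rng.randint(0, 1)                       # true binary label
        votes, pay_sum, picks = [], 0.0, []
        for k in range(K):
            # epsilon-greedy choice of a payment level for expert k
            if rng.random() < eps:
                i = rng.randrange(len(grid))
            else:
                i = max(range(len(grid)), key=lambda j: values[k][j])
            picks.append(i)
            pay_sum += grid[i]
            correct = rng.random() < productivity(grid[i])
            votes.append(y if correct else 1 - y)
        pred = 1 if sum(votes) * 2 >= K else 0      # majority vote
        # cost = prediction error plus upfront payments for all experts
        total_cost += (pred != y) + pay_sum
        for k, i in enumerate(picks):
            # reward: did expert k agree with the outcome, net of its payment
            r = float(votes[k] == y) - grid[i]
            counts[k][i] += 1
            values[k][i] += (r - values[k][i]) / counts[k][i]
    return total_cost / T                           # average per-round cost
```

A simple discretized bandit like this would only give the $T^{2/3}$-type guarantees mentioned above; the paper's improvement to $\mathcal{O}(K^2(\log T)\sqrt{T})$ comes from combining Lipschitz bandits with surrogate classification losses.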
Related papers
- Mind the Gap: A Causal Perspective on Bias Amplification in Prediction & Decision-Making [58.06306331390586]
We introduce the notion of a margin complement, which measures how much a prediction score $S$ changes due to a thresholding operation.
We show that under suitable causal assumptions, the influences of $X$ on the prediction score $S$ are equal to the influences of $X$ on the true outcome $Y$.
arXiv Detail & Related papers (2024-05-24T11:22:19Z)
- Active Ranking of Experts Based on their Performances in Many Tasks [72.96112117037465]
We consider the problem of ranking n experts based on their performances on d tasks.
We make a monotonicity assumption stating that for each pair of experts, one outperforms the other on all tasks.
arXiv Detail & Related papers (2023-06-05T06:55:39Z)
- No-Regret Online Prediction with Strategic Experts [16.54912614895861]
We study a generalization of the online binary prediction with expert advice framework where at each round, the learner is allowed to pick $m \geq 1$ experts from a pool of $K$ experts.
We focus on the setting in which experts act strategically and aim to maximize their influence on the algorithm's predictions by potentially misreporting their beliefs.
Our goal is to design algorithms that satisfy the following two requirements: 1) $\textit{Incentive-compatible}$: incentivize the experts to report their beliefs truthfully, and 2) $\textit{No-regret}$: achieve sublinear regret.
arXiv Detail & Related papers (2023-05-24T16:43:21Z)
- Streaming Algorithms for Learning with Experts: Deterministic Versus Robust [62.98860182111096]
In the online learning with experts problem, an algorithm must make a prediction about an outcome on each of $T$ days (or times)
The goal is to make a prediction with the minimum cost, specifically compared to the best expert in the set.
We show a space lower bound of $\widetilde{\Omega}\left(\frac{nMR}{T}\right)$ for any deterministic algorithm that achieves regret $R$ when the best expert makes $M$ mistakes.
arXiv Detail & Related papers (2023-03-03T04:39:53Z)
- Constant regret for sequence prediction with limited advice [0.0]
We provide a strategy combining only $p = 2$ experts per round for prediction and observing $m \geq 2$ experts' losses.
If the learner is constrained to observe only one expert's feedback per round, the worst-case regret is the "slow rate" $\Omega(\sqrt{KT})$.
arXiv Detail & Related papers (2022-10-05T13:32:49Z)
- Memory Bounds for the Experts Problem [53.67419690563877]
Online learning with expert advice is a fundamental problem of sequential prediction.
The goal is to process the experts' predictions and make a prediction with the minimum cost.
An algorithm is judged by how well it does compared to the best expert in the set.
arXiv Detail & Related papers (2022-04-21T01:22:18Z)
- Malicious Experts versus the multiplicative weights algorithm in online prediction [85.62472761361107]
We consider a prediction problem with two experts and a forecaster.
We assume that one of the experts is honest and makes a correct prediction with probability $\mu$ at each round.
The other is malicious: it knows the true outcome at each round and makes predictions in order to maximize the loss of the forecaster.
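For context, the multiplicative weights forecaster referenced in this entry can be sketched minimally as follows; the learning rate `eta` and the `weighted_majority` helper are illustrative assumptions, not taken from the paper.

```python
import math

def weighted_majority(expert_preds, outcomes, eta=0.5):
    """expert_preds[t][i] is expert i's 0/1 prediction at round t."""
    n = len(expert_preds[0])
    w = [1.0] * n          # one weight per expert, initially uniform
    mistakes = 0
    for preds, y in zip(expert_preds, outcomes):
        # predict with the weighted majority of the experts
        vote_one = sum(wi for wi, p in zip(w, preds) if p == 1)
        pred = 1 if vote_one >= sum(w) / 2 else 0
        mistakes += int(pred != y)
        # exponentially down-weight every expert that erred this round
        w = [wi * math.exp(-eta * int(p != y)) for wi, p in zip(w, preds)]
    return mistakes
```

With one always-correct and one always-wrong expert, the wrong expert's weight decays geometrically, so the forecaster errs only a bounded number of times; the adversarial analyses in the papers above study how much a malicious expert can still inflate this loss.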
arXiv Detail & Related papers (2020-03-18T20:12:08Z)
- Toward Optimal Adversarial Policies in the Multiplicative Learning System with a Malicious Expert [87.12201611818698]
We consider a learning system that combines experts' advice to predict a sequence of true outcomes.
It is assumed that one of the experts is malicious and aims to impose the maximum loss on the system.
We show that a simple greedy policy of always reporting false predictions is optimal with an approximation ratio of $1+O\left(\sqrt{\frac{\ln N}{N}}\right)$.
For the online setting where the malicious expert can adaptively make its decisions, we show that the optimal online policy can be efficiently computed by solving a dynamic program in $O(N^3)$.
arXiv Detail & Related papers (2020-01-02T18:04:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.