Constant regret for sequence prediction with limited advice
- URL: http://arxiv.org/abs/2210.02256v1
- Date: Wed, 5 Oct 2022 13:32:49 GMT
- Title: Constant regret for sequence prediction with limited advice
- Authors: El Mehdi Saad (LMO), G. Blanchard (LMO, DATASHAPE)
- Abstract summary: We provide a strategy combining only $p = 2$ experts per round for prediction and observing the losses of $m \ge 2$ experts.
If the learner is constrained to observe only one expert's feedback per round, the worst-case regret is the "slow rate" $\Omega(\sqrt{KT})$.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We investigate the problem of cumulative regret minimization for individual
sequence prediction with respect to the best expert in a finite family of size
$K$ under limited access to information. We assume that in each round, the
learner can predict using a convex combination of at most $p$ experts, and can
then observe a posteriori the losses of at most $m$ experts.
We assume that the loss function is range-bounded and exp-concave. In the
standard multi-armed bandits setting, when the learner is allowed to play only
one expert per round and observe only its feedback, known optimal regret bounds
are of the order $O(\sqrt{KT})$. We show that allowing the learner to play one
additional expert per round and observe one additional feedback improves
substantially the guarantees on regret. We provide a strategy combining only
$p = 2$ experts per round for prediction and observing the losses of $m \ge 2$
experts. Its randomized regret (with respect to the internal randomization of
the learner's strategy) is of order $O((K/m) \log(K\delta^{-1}))$ with
probability $1 - \delta$, i.e., it is independent of the horizon $T$
("constant" or "fast rate" regret) if $p \ge 2$ and $m \ge 3$. We prove that
this rate is optimal up to a logarithmic factor in $K$. In the case
$p = m = 2$, we provide an upper bound of order $O(K^2 \log(K\delta^{-1}))$,
with probability $1 - \delta$. Our strategies do not
require any prior knowledge of the horizon $T$ or of the confidence parameter
$\delta$. Finally, we show that if the learner is constrained to observe only
one expert's feedback per round, the worst-case regret is the "slow rate"
$\Omega(\sqrt{KT})$, suggesting that synchronous observation of at least two
experts per round is necessary to achieve constant regret.
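To make the interaction protocol concrete, the sketch below simulates one possible instantiation in Python: each round the learner predicts with a convex combination of $p = 2$ experts drawn from an exponential-weights distribution, then observes the losses of $m = 3$ experts a posteriori. This is only an illustration of the setting, not the authors' algorithm; the squared loss, the pairing and observation rules, the uniform observation-probability proxy, the learning rate, and all other specifics are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

K, T = 10, 2000     # number of experts, horizon
p, m = 2, 3         # experts combined per round, expert losses observed per round
eta = 1.0           # learning rate (illustrative; exp-concavity allows a constant rate)

# Toy data: each expert makes a fixed prediction, outcomes are noisy around expert 0.
expert_preds = np.linspace(0.0, 1.0, K)
outcomes = np.clip(expert_preds[0] + 0.05 * rng.standard_normal(T), 0.0, 1.0)

def sq_loss(pred, y):
    """Squared loss on [0, 1]: bounded and exp-concave."""
    return (pred - y) ** 2

log_w = np.zeros(K)            # log-weights over the K experts
cum_loss_algo = 0.0
cum_loss_experts = np.zeros(K)

for t in range(T):
    w = np.exp(log_w - log_w.max())
    w /= w.sum()

    # Play p = 2 experts: sample them from the current weights and mix them equally
    # (the paper prescribes a specific pairing rule; this is a placeholder).
    played = rng.choice(K, size=p, replace=False, p=w)
    prediction = float(expert_preds[played].mean())
    cum_loss_algo += sq_loss(prediction, outcomes[t])

    # Observe m = 3 losses a posteriori: the played experts plus one extra expert
    # drawn uniformly (again a placeholder observation rule).
    extra = rng.choice(np.setdiff1d(np.arange(K), played), size=m - p, replace=False)
    observed = np.concatenate([played, extra])

    # Importance-weighted exponential-weights update on the observed losses,
    # using a crude uniform m/K proxy for each expert's observation probability.
    for i in observed:
        est_loss = sq_loss(expert_preds[i], outcomes[t]) / (m / K)
        log_w[i] -= eta * est_loss

    cum_loss_experts += sq_loss(expert_preds, outcomes[t])

regret = cum_loss_algo - cum_loss_experts.min()
print(f"cumulative regret vs. best expert: {regret:.3f}")
```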
Related papers
- Active Ranking of Experts Based on their Performances in Many Tasks [72.96112117037465]
We consider the problem of ranking $n$ experts based on their performances on $d$ tasks.
We make a monotonicity assumption stating that for each pair of experts, one outperforms the other on all tasks.
arXiv Detail & Related papers (2023-06-05T06:55:39Z) - Streaming Algorithms for Learning with Experts: Deterministic Versus
Robust [62.98860182111096]
In the online learning with experts problem, an algorithm must make a prediction about an outcome on each of $T$ days (or times).
The goal is to make predictions with minimum total cost, specifically compared to the best expert in the set (a standard exponential-weights sketch of this setting is given after this list).
We show a space lower bound of $\widetilde{\Omega}\left(\frac{nM}{RT}\right)$ for any deterministic algorithm that achieves regret $R$ when the best expert makes $M$ mistakes.
arXiv Detail & Related papers (2023-03-03T04:39:53Z) - Private Online Prediction from Experts: Separations and Faster Rates [74.52487417350221]
Online prediction from experts is a fundamental problem in machine learning and several works have studied this problem under privacy constraints.
We propose and analyze new algorithms for this problem that improve over the regret bounds of the best existing algorithms for non-adaptive adversaries.
arXiv Detail & Related papers (2022-10-24T18:40:19Z) - A Regret-Variance Trade-Off in Online Learning [14.41667013062914]
We show how variance of predictions can be exploited in learning.
In online prediction with corrupted losses, we show that the effect of corruption on the regret can be compensated by a large variance.
We extend our results to the setting of online linear regression.
arXiv Detail & Related papers (2022-06-06T14:50:19Z) - No-Regret Learning with Unbounded Losses: The Case of Logarithmic
Pooling [12.933990572597583]
We focus on the fundamental and practical aggregation method known as logarithmic pooling.
We consider the problem of learning the best set of parameters in an online adversarial setting.
We present an algorithm that learns expert weights in a way that attains $O(\sqrt{T} \log T)$ expected regret.
arXiv Detail & Related papers (2022-02-22T22:27:25Z) - Fast rates for prediction with limited expert advice [0.0]
We investigate the problem of minimizing the excess generalization error with respect to the best expert prediction in a finite family under limited access to information.
We show that if we are allowed to see the advice of only one expert per round for $T$ rounds in the training phase, the worst-case excess risk is $\Omega(1/\sqrt{T})$ with probability lower bounded by a constant.
We design novel algorithms achieving this rate in this setting, and give precise instance-dependent bounds on the number of training rounds and queries needed to achieve a given generalization error precision.
arXiv Detail & Related papers (2021-10-27T14:57:36Z) - Nonstochastic Bandits with Infinitely Many Experts [1.7188280334580197]
We study the problem of nonstochastic bandits with infinitely many experts.
We propose a variant of Exp4.P that, for finitely many experts, enables inference of correct expert rankings.
We then incorporate the variant into a meta-algorithm that works on infinitely many experts.
arXiv Detail & Related papers (2021-02-09T22:42:36Z) - Fast Rates for the Regret of Offline Reinforcement Learning [69.23654172273085]
We study the regret of reinforcement learning from offline data generated by a fixed behavior policy in an infinite-horizon discounted Markov decision process (MDP).
We show that given any estimate for the optimal quality function $Q^*$, the regret of the policy it defines converges at a rate given by the exponentiation of the $Q^*$-estimate's pointwise convergence rate.
arXiv Detail & Related papers (2021-01-31T16:17:56Z) - Malicious Experts versus the multiplicative weights algorithm in online
prediction [85.62472761361107]
We consider a prediction problem with two experts and a forecaster.
We assume that one of the experts is honest and makes a correct prediction with probability $\mu$ at each round.
The other one is malicious, who knows true outcomes at each round and makes predictions in order to maximize the loss of the forecaster.
arXiv Detail & Related papers (2020-03-18T20:12:08Z) - Taking a hint: How to leverage loss predictors in contextual bandits? [63.546913998407405]
We study learning in contextual bandits with the help of loss predictors.
We show that the optimal regret is $\mathcal{O}(\min\{\sqrt{T},\, \sqrt{\mathcal{E}}\, T^{1/3}\})$ when $\mathcal{E}$ is known.
arXiv Detail & Related papers (2020-03-04T07:36:38Z)
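Several of the entries above (e.g. "Streaming Algorithms for Learning with Experts" and "Malicious Experts versus the multiplicative weights algorithm") concern the classic full-information prediction-with-expert-advice setting and the exponential-weights (Hedge) algorithm. As a reference point, the following is a minimal, standard Hedge sketch in Python; the toy losses, learning-rate tuning, and variable names are illustrative and not taken from any of the papers listed above.

```python
import numpy as np

rng = np.random.default_rng(1)

K, T = 5, 1000                     # number of experts, number of rounds
eta = np.sqrt(8 * np.log(K) / T)   # standard Hedge tuning for losses in [0, 1]

# Toy losses: expert 2 is best on average (illustrative data only).
losses = rng.uniform(0.0, 1.0, size=(T, K))
losses[:, 2] *= 0.5

weights = np.ones(K) / K
cum_loss_algo = 0.0

for t in range(T):
    # Predict with the weighted average of the experts (full-information setting).
    cum_loss_algo += weights @ losses[t]
    # Multiplicative-weights / Hedge update on every expert's observed loss.
    weights *= np.exp(-eta * losses[t])
    weights /= weights.sum()

regret = cum_loss_algo - losses.sum(axis=0).min()
print(f"regret vs. best expert: {regret:.2f}  (O(sqrt(T log K)) is the standard guarantee)")
```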