Logarithmic Regret from Sublinear Hints
- URL: http://arxiv.org/abs/2111.05257v1
- Date: Tue, 9 Nov 2021 16:50:18 GMT
- Title: Logarithmic Regret from Sublinear Hints
- Authors: Aditya Bhaskara, Ashok Cutkosky, Ravi Kumar, Manish Purohit
- Abstract summary: We show that an algorithm can obtain $O(\log T)$ regret with just $O(\sqrt{T})$ hints under a natural query model.
We also show that $o(\sqrt{T})$ hints cannot guarantee better than $\Omega(\sqrt{T})$ regret.
- Score: 76.87432703516942
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the online linear optimization problem, where at every step the
algorithm plays a point $x_t$ in the unit ball, and suffers loss $\langle c_t,
x_t\rangle$ for some cost vector $c_t$ that is then revealed to the algorithm.
Recent work showed that if an algorithm receives a hint $h_t$ that has
non-trivial correlation with $c_t$ before it plays $x_t$, then it can achieve a
regret guarantee of $O(\log T)$, improving on the bound of $\Theta(\sqrt{T})$
in the standard setting. In this work, we study the question of whether an
algorithm really requires a hint at every time step. Somewhat surprisingly, we
show that an algorithm can obtain $O(\log T)$ regret with just $O(\sqrt{T})$
hints under a natural query model; in contrast, we also show that $o(\sqrt{T})$
hints cannot guarantee better than $\Omega(\sqrt{T})$ regret. We give two
applications of our result, to the well-studied setting of optimistic regret
bounds and to the problem of online learning with abstention.
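To make the setting concrete, the sketch below simulates this protocol with online gradient descent over the unit ball and a hint queried on roughly $\sqrt{T}$ of the rounds. It is a minimal illustration under stated assumptions (random unit costs, an evenly spaced query schedule, a naive follow-the-hint rule), not the algorithm from the paper.

```python
import numpy as np

# Minimal simulation sketch of the protocol: online linear optimization
# over the unit ball, with a hint queried on roughly sqrt(T) rounds.
# NOT the paper's algorithm: the evenly spaced query schedule, the
# random cost sequence, and the follow-the-hint rule are assumptions.

rng = np.random.default_rng(0)
T, d = 10_000, 5                        # horizon and dimension (assumed)

def project(x):
    """Euclidean projection onto the unit ball."""
    n = np.linalg.norm(x)
    return x if n <= 1.0 else x / n

x = np.zeros(d)                         # current gradient-descent iterate
C = np.zeros(d)                         # running sum of cost vectors
total_loss = 0.0
budget = int(np.sqrt(T))                # O(sqrt(T)) hint queries in total
stride = max(1, T // budget)

for t in range(1, T + 1):
    c = rng.normal(size=d)
    c /= np.linalg.norm(c)              # adversary's cost vector, ||c_t|| = 1

    if t % stride == 0:                 # query a hint on ~sqrt(T) rounds
        h = c                           # perfect hint; the model only requires
                                        # <h_t, c_t> >= alpha * ||c_t||
        play = -h                       # move against the hinted cost
    else:
        play = x                        # otherwise play the descent iterate

    total_loss += float(c @ play)       # suffer <c_t, x_t>; c_t is now revealed
    C += c
    x = project(x - c / np.sqrt(t))     # standard projected gradient step

# Regret against the best fixed point in hindsight, -C/||C||,
# whose cumulative loss is -||C||.
print("regret:", total_loss + np.linalg.norm(C))
```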
Related papers
- Improved Kernel Alignment Regret Bound for Online Kernel Learning [11.201662566974232]
We propose an algorithm whose regret bound and computational complexity are better than previous results.
If the eigenvalues of the kernel matrix decay exponentially, then our algorithm enjoys a regret of $O(\sqrt{\mathcal{A}_T})$ at a computational complexity of $O(\ln^2 T)$.
We extend our algorithm to batch learning and obtain an $O(\frac{1}{T}\sqrt{\mathbb{E}[\mathcal{A}_T]})$ excess risk bound, improving on the previous bound.
arXiv Detail & Related papers (2022-12-26T02:32:20Z)
- Online Learning with Bounded Recall [11.046741824529107]
We study the problem of full-information online learning in the "bounded recall" setting popular in the study of repeated games.
An online learning algorithm $\mathcal{A}$ is $M$-bounded-recall if its output at time $t$ can be written as a function of the $M$ previous rewards.
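As a concrete illustration of that definition (a hypothetical sketch, not the paper's construction), the class below recomputes a Hedge-style distribution from a sliding window of rewards; the fixed-size deque is what makes the prediction a function of only the $M$ most recent rewards.

```python
from collections import deque
import numpy as np

# Illustrative sketch of the M-bounded-recall constraint (hypothetical
# example, not the paper's construction): the prediction is recomputed
# from only the M most recent reward vectors, which the deque enforces.

class BoundedRecallHedge:
    def __init__(self, n_experts, M, eta=0.1):
        self.window = deque(maxlen=M)   # keeps only the last M reward vectors
        self.n, self.eta = n_experts, eta

    def predict(self):
        # A function of the window alone, hence M-bounded-recall.
        totals = np.sum(self.window, axis=0) if self.window else np.zeros(self.n)
        w = np.exp(self.eta * totals)
        return w / w.sum()              # probability distribution over experts

    def observe(self, rewards):
        self.window.append(np.asarray(rewards, dtype=float))

# usage: alg = BoundedRecallHedge(n_experts=3, M=5)
#        p = alg.predict(); alg.observe([0.0, 1.0, 0.0])
```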
arXiv Detail & Related papers (2022-05-28T20:52:52Z)
- Corralling a Larger Band of Bandits: A Case Study on Switching Regret for Linear Bandits [99.86860277006318]
We consider the problem of combining and learning over a set of adversarial algorithms with the goal of adaptively tracking the best one on the fly.
The CORRAL algorithm of Agarwal et al. achieves this goal with a regret overhead of order $\widetilde{O}(\sqrt{MT})$, where $M$ is the number of base algorithms and $T$ is the time horizon.
Motivated by the polynomial dependence on $M$, we propose a new recipe to corral a larger band of bandit algorithms whose regret overhead has only logarithmic dependence on $M$, as long as certain conditions are satisfied.
arXiv Detail & Related papers (2022-02-12T21:55:44Z)
- Contextual Recommendations and Low-Regret Cutting-Plane Algorithms [49.91214213074933]
We consider the following variant of contextual linear bandits motivated by routing applications in navigational engines and recommendation systems.
We design novel cutting-plane algorithms with low "regret": the total distance between the true point $w^*$ and the hyperplanes the separation oracle returns.
arXiv Detail & Related papers (2021-06-09T05:39:05Z)
- Optimal Regret Algorithm for Pseudo-1d Bandit Convex Optimization [51.23789922123412]
We study online learning with bandit feedback (i.e., the learner has access only to a zeroth-order oracle) where the cost/reward functions admit a "pseudo-1d" structure.
We show a lower bound of $\min(\sqrt{dT}, T^{3/4})$ for the regret of any algorithm, where $T$ is the number of rounds.
We propose a new algorithm that combines randomized online gradient descent with a kernelized exponential weights method to exploit the pseudo-1d structure effectively.
arXiv Detail & Related papers (2021-02-15T08:16:51Z)
- Impossible Tuning Made Possible: A New Expert Algorithm and Its Applications [37.6975819766632]
We show that it is possible to achieve regret $O\left(\sqrt{(\ln d)\sum_t \ell_{t,i}^2}\right)$ simultaneously for all experts $i$ in a $T$-round $d$-expert problem.
Our algorithm is based on the Mirror Descent framework with a correction term and a weighted entropy regularizer.
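For orientation, the unadorned version of that framework, Mirror Descent over the simplex with the plain entropy regularizer, is just the exponential weights update; a minimal sketch follows. The correction term and the weighted entropy regularizer that drive the paper's per-expert bound are intentionally omitted.

```python
import numpy as np

# Baseline skeleton only: Mirror Descent over the simplex with the plain
# (unweighted) entropy regularizer is exactly exponential weights. The
# paper's correction term and weighted entropy regularizer, which yield
# the per-expert bound, are deliberately not shown here.

def exponential_weights(losses, eta):
    """losses: (T, d) array of per-expert losses; returns plays p_1..p_T."""
    T, d = losses.shape
    log_w = np.zeros(d)
    plays = np.empty((T, d))
    for t in range(T):
        p = np.exp(log_w - log_w.max())  # stabilized softmax of cumulative losses
        plays[t] = p / p.sum()
        log_w -= eta * losses[t]         # entropic mirror step
    return plays
```

With $\eta \approx \sqrt{(\ln d)/T}$ this skeleton gives the standard $O(\sqrt{T \ln d})$ bound uniformly over experts; the point of the paper is the stronger per-expert adaptive rate above.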
arXiv Detail & Related papers (2021-02-01T18:34:21Z)
- Streaming Complexity of SVMs [110.63976030971106]
We study the space complexity of solving the bias-regularized SVM problem in the streaming model.
We show that for both problems, for dimensions of $\frac{1}{\lambda\epsilon}$, one can obtain streaming algorithms with space polynomially smaller than $\frac{1}{\lambda\epsilon}$.
arXiv Detail & Related papers (2020-07-07T17:10:00Z)
- $Q$-learning with Logarithmic Regret [60.24952657636464]
We prove that an optimistic $Q$-learning algorithm enjoys a $\mathcal{O}\left(\frac{SA \cdot \mathrm{poly}(H)}{\Delta_{\min}} \log(SAT)\right)$ cumulative regret bound, where $S$ is the number of states, $A$ is the number of actions, $H$ is the planning horizon, $T$ is the total number of steps, and $\Delta_{\min}$ is the minimum sub-optimality gap.
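For intuition about what "optimistic $Q$-learning" refers to, here is a generic sketch of the template in the episodic tabular setting: optimistic initialization, the $(H+1)/(H+t)$ learning rate, and a count-based bonus in the style of Jin et al. (2018). The environment interface and the bonus constant `c` are assumptions; this is not the exact algorithm analyzed in the paper.

```python
import numpy as np

# Generic sketch of optimistic tabular Q-learning in the episodic
# setting, in the style of Jin et al. (2018): optimistic initialization
# plus a count-based UCB bonus. The environment interface (env.reset /
# env.step returning integer states and rewards in [0, 1]) and the
# constant c are assumptions, not the paper's exact algorithm.

def optimistic_q_learning(env, S, A, H, K, c=1.0):
    Q = np.full((H, S, A), float(H))     # optimistic init: true Q is at most H
    N = np.zeros((H, S, A), dtype=int)   # visit counts per (step, state, action)
    for _ in range(K):                   # K episodes of length H
        s = env.reset()
        for h in range(H):
            a = int(np.argmax(Q[h, s]))  # act greedily w.r.t. optimistic Q
            s2, r = env.step(a)
            N[h, s, a] += 1
            t = N[h, s, a]
            alpha = (H + 1) / (H + t)    # learning rate from Jin et al.
            bonus = c * np.sqrt(H**3 * np.log(S * A * K) / t)
            v_next = 0.0 if h == H - 1 else min(float(H), Q[h + 1, s2].max())
            Q[h, s, a] = (1 - alpha) * Q[h, s, a] + alpha * (r + v_next + bonus)
            s = s2
    return Q
```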
arXiv Detail & Related papers (2020-06-16T13:01:33Z)
- Adaptive Online Learning with Varying Norms [45.11667443216861]
We provide an online convex optimization algorithm that outputs points $w_t$ in some domain $W$.
We apply this result to obtain new "full-matrix"-style regret bounds.
arXiv Detail & Related papers (2020-02-10T17:22:08Z)