Impossible Tuning Made Possible: A New Expert Algorithm and Its
Applications
- URL: http://arxiv.org/abs/2102.01046v1
- Date: Mon, 1 Feb 2021 18:34:21 GMT
- Title: Impossible Tuning Made Possible: A New Expert Algorithm and Its
Applications
- Authors: Liyu Chen, Haipeng Luo, Chen-Yu Wei
- Abstract summary: We show that it is possible to achieve regret $O\left(\sqrt{(\ln d)\sum_t \ell_{t,i}^2}\right)$ simultaneously for all experts $i$ in a $T$-round $d$-expert problem.
Our algorithm is based on the Mirror Descent framework with a correction term and a weighted entropy regularizer.
- Score: 37.6975819766632
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We resolve the long-standing "impossible tuning" issue for the classic expert
problem and show that it is in fact possible to achieve regret
$O\left(\sqrt{(\ln d)\sum_t \ell_{t,i}^2}\right)$ simultaneously for all expert
$i$ in a $T$-round $d$-expert problem where $\ell_{t,i}$ is the loss for expert
$i$ in round $t$. Our algorithm is based on the Mirror Descent framework with a
correction term and a weighted entropy regularizer. While natural, the
algorithm has not been studied before and requires a careful analysis. We also
generalize the bound to $O\left(\sqrt{(\ln d)\sum_t
(\ell_{t,i}-m_{t,i})^2}\right)$ for any prediction vector $m_t$ that the
learner receives, and recover or improve many existing results by choosing
different $m_t$. Furthermore, we use the same framework to create a master
algorithm that combines a set of base algorithms and learns the best one with
little overhead. The new guarantee of our master allows us to derive many new
results for both the expert problem and more generally Online Linear
Optimization.
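The update described in the abstract can be illustrated with a minimal sketch: a multiplicative-weights iteration with a second-order correction term added to each loss. This is only an illustration of the general idea, not the paper's exact algorithm; the paper's weighted entropy regularizer, per-expert learning-rate tuning, and constants differ, and the function and variable names below are my own.

```python
import numpy as np

def corrected_mw(losses, etas):
    """Multiplicative weights with a second-order correction term.

    A simplified sketch in the spirit of the Mirror Descent framework
    described above; the paper's exact regularizer and tuning differ.
    `losses` is a (T, d) array of expert losses in [0, 1] and `etas`
    is a (d,) array of per-expert learning rates (illustrative choice).
    Returns the algorithm's total loss.
    """
    T, d = losses.shape
    w = np.ones(d) / d               # uniform prior over the d experts
    total_loss = 0.0
    for t in range(T):
        p = w / w.sum()              # play the normalized weight vector
        total_loss += p @ losses[t]
        # The extra eta_i * loss^2 term penalizes high-variance experts,
        # mimicking the role of the correction term in the update.
        w *= np.exp(-etas * (losses[t] + etas * losses[t] ** 2))
    return total_loss
```

With a clearly best expert and $\eta_i = \sqrt{\ln(d)/T}$, the sketch's cumulative loss stays within an $O(\sqrt{T \ln d})$ margin of the best expert's loss, consistent with the kind of guarantee the paper refines.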
Related papers
- Streaming Algorithms for Learning with Experts: Deterministic Versus
Robust [62.98860182111096]
In the online learning with experts problem, an algorithm must make a prediction about an outcome on each of $T$ days (or times).
The goal is to make a prediction with the minimum cost, specifically compared to the best expert in the set.
We show a space lower bound of $\widetilde{\Omega}\left(\frac{nM}{RT}\right)$ for any deterministic algorithm that achieves regret $R$ when the best expert makes $M$ mistakes.
arXiv Detail & Related papers (2023-03-03T04:39:53Z) - Efficient and Optimal Fixed-Time Regret with Two Experts [5.650647159993238]
Prediction with expert advice is a foundational problem in online learning.
In instances with $T$ rounds and $n$ experts, the classical Multiplicative Weights Update method suffers at most $\sqrt{(T/2)\ln n}$ regret when $T$ is known beforehand.
When the number of experts $n$ is small/fixed, algorithms with better regret guarantees exist.
arXiv Detail & Related papers (2022-03-15T01:07:09Z) - Corralling a Larger Band of Bandits: A Case Study on Switching Regret
for Linear Bandits [99.86860277006318]
We consider the problem of combining and learning over a set of adversarial algorithms with the goal of adaptively tracking the best one on the fly.
The CORRAL algorithm of Agarwal et al. achieves this goal with a regret overhead of order $\widetilde{O}(\sqrt{MT})$ where $M$ is the number of base algorithms and $T$ is the time horizon.
Motivated by this issue, we propose a new recipe to corral a larger band of bandit algorithms whose regret overhead has only logarithmic dependence on $M$ as long ...
arXiv Detail & Related papers (2022-02-12T21:55:44Z) - Logarithmic Regret from Sublinear Hints [76.87432703516942]
We show that an algorithm can obtain $O(\log T)$ regret with just $O(\sqrt{T})$ hints under a natural query model.
We also show that $o(\sqrt{T})$ hints cannot guarantee better than $\Omega(\sqrt{T})$ regret.
arXiv Detail & Related papers (2021-11-09T16:50:18Z) - Private Stochastic Convex Optimization: Optimal Rates in $\ell_1$
Geometry [69.24618367447101]
Up to logarithmic factors, the optimal excess population loss of any $(\varepsilon,\delta)$-differentially private algorithm is $\sqrt{\log(d)/n} + \sqrt{d}/(\varepsilon n)$.
We show that when the loss functions satisfy additional smoothness assumptions, the excess loss is upper bounded (up to logarithmic factors) by $\sqrt{\log(d)/n} + (\log(d)/(\varepsilon n))^{2/3}$.
arXiv Detail & Related papers (2021-03-02T06:53:44Z) - Optimal Regret Algorithm for Pseudo-1d Bandit Convex Optimization [51.23789922123412]
We study online learning with bandit feedback (i.e. learner has access to only zeroth-order oracle) where cost/reward functions admit a "pseudo-1d" structure.
We show a lower bound of $\min(\sqrt{dT}, T^{3/4})$ for the regret of any algorithm, where $T$ is the number of rounds.
We propose a new algorithm sbcalg that combines randomized online gradient descent with a kernelized exponential weights method to exploit the pseudo-1d structure effectively.
arXiv Detail & Related papers (2021-02-15T08:16:51Z) - Adaptive Online Learning with Varying Norms [45.11667443216861]
We provide an online convex optimization algorithm that outputs points $w_t$ in some domain $W$.
We apply this result to obtain new "full-matrix"-style regret bounds.
arXiv Detail & Related papers (2020-02-10T17:22:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.