On the Minimax Regret for Online Learning with Feedback Graphs
- URL: http://arxiv.org/abs/2305.15383v2
- Date: Sat, 28 Oct 2023 14:11:51 GMT
- Title: On the Minimax Regret for Online Learning with Feedback Graphs
- Authors: Khaled Eldowa, Emmanuel Esposito, Tommaso Cesari, Nicolò Cesa-Bianchi
- Abstract summary: We improve on the upper and lower bounds for the regret of online learning with strongly observable undirected feedback graphs.
Our improved upper bound $\mathcal{O}\bigl(\sqrt{\alpha T(1+\ln(K/\alpha))}\bigr)$ holds for any $\alpha$ and matches the lower bounds for bandits and experts.
- Score: 5.721380617450645
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we improve on the upper and lower bounds for the regret of
online learning with strongly observable undirected feedback graphs. The best
known upper bound for this problem is $\mathcal{O}\bigl(\sqrt{\alpha T\ln
K}\bigr)$, where $K$ is the number of actions, $\alpha$ is the independence
number of the graph, and $T$ is the time horizon. The $\sqrt{\ln K}$ factor is
known to be necessary when $\alpha = 1$ (the experts case). On the other hand,
when $\alpha = K$ (the bandits case), the minimax rate is known to be
$\Theta\bigl(\sqrt{KT}\bigr)$, and a lower bound $\Omega\bigl(\sqrt{\alpha
T}\bigr)$ is known to hold for any $\alpha$. Our improved upper bound
$\mathcal{O}\bigl(\sqrt{\alpha T(1+\ln(K/\alpha))}\bigr)$ holds for any
$\alpha$ and matches the lower bounds for bandits and experts, while
interpolating intermediate cases. To prove this result, we use FTRL with
$q$-Tsallis entropy for a carefully chosen value of $q \in [1/2, 1)$ that
varies with $\alpha$. The analysis of this algorithm requires a new bound on
the variance term in the regret. We also show how to extend our techniques to
time-varying graphs, without requiring prior knowledge of their independence
numbers. Our upper bound is complemented by an improved
$\Omega\bigl(\sqrt{\alpha T(\ln K)/(\ln\alpha)}\bigr)$ lower bound for all
$\alpha > 1$, whose analysis relies on a novel reduction to multitask learning.
This shows that a logarithmic factor is necessary as soon as $\alpha < K$.
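The algorithmic ingredient named in the abstract is FTRL over the probability simplex with a $q$-Tsallis entropy regularizer, fed with importance-weighted loss estimates that exploit the feedback graph. The Python sketch below is a minimal illustration of that recipe under stated assumptions, not the paper's exact algorithm: the learning rate `eta`, the fixed value of `q`, the toy graph, and the helper names are illustrative choices only (the paper tunes $q \in [1/2, 1)$ as a function of the independence number $\alpha$ and proves a new bound on the variance term).

```python
import numpy as np

def tsallis_ftrl_weights(cum_loss, eta, q, tol=1e-10):
    """FTRL step with a (negative) q-Tsallis entropy regularizer, 0 < q < 1:
        p = argmin_{p in simplex}  eta * <cum_loss, p> - S_q(p).
    The minimizer has the form p_i = (q / ((1-q) * (eta * L_i + lam)))**(1/(1-q)),
    with lam fixed by binary search so that the weights sum to one."""
    L = np.asarray(cum_loss, dtype=float)
    K = len(L)

    def probs(lam):
        return (q / ((1.0 - q) * (eta * L + lam))) ** (1.0 / (1.0 - q))

    lo = -eta * L.min() + 1e-12                          # weights blow up at this end
    hi = q * K ** (1.0 - q) / (1.0 - q) - eta * L.min()  # every weight <= 1/K here
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if probs(mid).sum() > 1.0:
            lo = mid
        else:
            hi = mid
    p = probs(hi)
    return p / p.sum()  # renormalize for numerical safety


def feedback_graph_round(p, losses, neighbors, rng):
    """One round with a strongly observable undirected feedback graph.
    `neighbors[i]` is the set of arms whose loss is revealed when arm i is played;
    for an undirected graph with self-loops it contains i itself."""
    K = len(p)
    arm = rng.choice(K, p=p)
    loss_est = np.zeros(K)
    for j in neighbors[arm]:
        # Probability that arm j's loss is observed: playing any arm adjacent to j.
        obs_prob = sum(p[i] for i in range(K) if j in neighbors[i])
        loss_est[j] = losses[j] / obs_prob  # importance-weighted estimate
    return arm, loss_est


# Tiny usage example on a toy 3-armed graph; q and eta are illustrative constants.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    neighbors = [{0, 1}, {0, 1, 2}, {1, 2}]  # undirected graph with self-loops
    K, T, q, eta = 3, 1000, 0.5, 0.05
    cum_loss_est = np.zeros(K)
    for t in range(T):
        p = tsallis_ftrl_weights(cum_loss_est, eta, q)
        losses = rng.uniform(size=K)
        arm, loss_est = feedback_graph_round(p, losses, neighbors, rng)
        cum_loss_est += loss_est
```

For time-varying graphs the paper additionally avoids prior knowledge of the independence numbers; in this sketch both `q` and `eta` are simply fixed constants chosen for the example.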
Related papers
- Online Learning with Feedback Graphs: The True Shape of Regret [82.00098840619847]
We prove that the minimax regret is proportional to $R^*$ for any graph and time horizon $T$.
Introducing an intricate exploration strategy, we define an algorithm that achieves the minimax optimal regret bound.
arXiv Detail & Related papers (2023-06-05T15:35:00Z) - Optimal SQ Lower Bounds for Learning Halfspaces with Massart Noise [9.378684220920562]
We give the tightest statistical query (SQ) lower bounds for learning halfspaces in the presence of Massart noise.
We show that for arbitrary $\eta \in [0,1/2]$ every SQ algorithm achieving misclassification error better than $\eta$ requires queries of superpolynomial accuracy.
arXiv Detail & Related papers (2022-01-24T17:33:19Z) - Logarithmic Regret from Sublinear Hints [76.87432703516942]
We show that an algorithm can obtain $O(\log T)$ regret with just $O(\sqrt{T})$ hints under a natural query model.
We also show that $o(\sqrt{T})$ hints cannot guarantee better than $\Omega(\sqrt{T})$ regret.
arXiv Detail & Related papers (2021-11-09T16:50:18Z) - Nearly Horizon-Free Offline Reinforcement Learning [97.36751930393245]
We revisit offline reinforcement learning on episodic time-homogeneous Markov Decision Processes with $S$ states, $A$ actions and planning horizon $H$.
We obtain the first set of nearly $H$-free sample complexity bounds for evaluation and planning using the empirical MDPs.
arXiv Detail & Related papers (2021-03-25T18:52:17Z) - Optimal Regret Algorithm for Pseudo-1d Bandit Convex Optimization [51.23789922123412]
We study online learning with bandit feedback (i.e., the learner has access only to a zeroth-order oracle) where cost/reward functions admit a "pseudo-1d" structure.
We show a lower bound of $\min(\sqrt{dT}, T^{3/4})$ for the regret of any algorithm, where $T$ is the number of rounds.
We propose a new algorithm that combines randomized online gradient descent with a kernelized exponential weights method to exploit the pseudo-1d structure effectively.
arXiv Detail & Related papers (2021-02-15T08:16:51Z) - $Q$-learning with Logarithmic Regret [60.24952657636464]
We prove that an optimistic $Q$-learning algorithm enjoys a $\mathcal{O}\left(\frac{SA\cdot\mathrm{poly}(H)}{\Delta_{\min}}\log(SAT)\right)$ cumulative regret bound, where $S$ is the number of states, $A$ is the number of actions, $H$ is the planning horizon, $T$ is the total number of steps, and $\Delta_{\min}$ is the minimum sub-optimality gap.
arXiv Detail & Related papers (2020-06-16T13:01:33Z) - Taking a hint: How to leverage loss predictors in contextual bandits? [63.546913998407405]
We study learning in contextual bandits with the help of loss predictors.
We show that the optimal regret is $\mathcal{O}\bigl(\min\bigl\{\sqrt{T},\, \sqrt{\mathcal{E}}\,T^{\frac{1}{3}}\bigr\}\bigr)$ when $\mathcal{E}$ is known.
arXiv Detail & Related papers (2020-03-04T07:36:38Z) - Logarithmic Regret in Multisecretary and Online Linear Programs with
Continuous Valuations [0.0]
I study how the shadow prices of a linear program that allocates an endowment of $n\beta \in \mathbb{R}^m$ resources to $n$ customers behave as $n \rightarrow \infty$.
I use these results to prove that the expected regret in the online linear program of Li (2019b) is $\Theta(\log n)$, both when the customer variable distribution is known upfront and when it must be learned on the fly.
arXiv Detail & Related papers (2019-12-16T18:43:37Z)