Online Learning with Feedback Graphs: The True Shape of Regret
- URL: http://arxiv.org/abs/2306.02971v1
- Date: Mon, 5 Jun 2023 15:35:00 GMT
- Title: Online Learning with Feedback Graphs: The True Shape of Regret
- Authors: Tomáš Kocák and Alexandra Carpentier
- Abstract summary: We prove that the minimax regret is proportional to $R^*$ for any graph and time horizon $T$.
Introducing an intricate exploration strategy, we define the \mainAlgorithm algorithm that achieves the minimax optimal regret bound.
- Score: 82.00098840619847
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sequential learning with feedback graphs is a natural extension of the
multi-armed bandit problem where the problem is equipped with an underlying
graph structure that provides additional information - playing an action
reveals the losses of all the neighbors of the action. This problem was
introduced by \citet{mannor2011} and received considerable attention in recent
years. It is generally stated in the literature that the minimax regret rate
for this problem is of order $\sqrt{\alpha T}$, where $\alpha$ is the
independence number of the graph, and $T$ is the time horizon. However, this is
proven only when the number of rounds $T$ is larger than $\alpha^3$, which
poses a significant restriction for the usability of this result in large
graphs. In this paper, we define a new quantity $R^*$, called the \emph{problem
complexity}, and prove that the minimax regret is proportional to $R^*$ for any
graph and time horizon $T$. Introducing an intricate exploration strategy, we
define the \mainAlgorithm algorithm that achieves the minimax optimal regret
bound and becomes the first provably optimal algorithm for this setting, even
if $T$ is smaller than $\alpha^3$.
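To make the feedback-graph mechanism described in the abstract concrete (playing an action also reveals the losses of all its neighbors), the sketch below implements a standard Exp3-style learner with importance-weighted loss estimates for the strongly observable undirected case. This is not the paper's \mainAlgorithm; the function name, the path-graph toy example, and the fixed learning rate are illustrative assumptions only.

```python
import numpy as np

def exp3_with_graph_feedback(losses, neighbors, eta, seed=0):
    """Exp3-style learner with graph feedback (illustrative sketch only).

    losses    -- (T, K) array of adversarial losses in [0, 1]
    neighbors -- list of K sets; neighbors[i] contains i and every action
                 whose loss is observed when action i is played
                 (undirected graph with self-loops)
    eta       -- learning rate
    """
    rng = np.random.default_rng(seed)
    T, K = losses.shape
    weights = np.ones(K)
    total_loss = 0.0
    for t in range(T):
        p = weights / weights.sum()
        action = rng.choice(K, p=p)
        total_loss += losses[t, action]
        # q[i] = probability that action i's loss is observed this round,
        # i.e. the probability that the played action lies in N(i).
        q = np.array([p[list(neighbors[i])].sum() for i in range(K)])
        observed = np.array([action in neighbors[i] for i in range(K)])
        # Importance-weighted loss estimates for every observed action.
        loss_hat = np.where(observed, losses[t] / q, 0.0)
        weights *= np.exp(-eta * loss_hat)
    return total_loss


# Toy example: 3 actions on a path graph 0 - 1 - 2, so playing action 1
# reveals every loss, while playing 0 or 2 reveals only two of them.
neighbors = [{0, 1}, {0, 1, 2}, {1, 2}]
losses = np.random.default_rng(1).uniform(size=(1000, 3))
print(exp3_with_graph_feedback(losses, neighbors, eta=0.05))
```

Dividing each estimate by the observation probability $q_i$ is the standard device in graph-feedback algorithms that lets the regret scale with the independence number $\alpha$ rather than the number of actions $K$ when $T$ is sufficiently large.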
Related papers
- Linear Causal Bandits: Unknown Graph and Soft Interventions [18.412177974475526]
Designing causal bandit algorithms depends on two central categories of assumptions.
The problem in its general form, i.e., unknown graph and unknown intervention models, remains open.
This paper addresses this problem and establishes a regret upper bound after $T$ interaction rounds that scales with the number of nodes $N$, the maximum in-degree $d$, and the maximum causal path length $L$.
arXiv Detail & Related papers (2024-11-04T18:50:39Z) - On Interpolating Experts and Multi-Armed Bandits [1.9497955504539408]
We prove tight minimax regret bounds for $\mathbf{m}$-MAB and design an optimal PAC algorithm for its pure exploration version, $\mathbf{m}$-BAI.
As consequences, we obtain tight minimax regret bounds for several families of feedback graphs.
arXiv Detail & Related papers (2023-07-14T10:38:30Z) - On the Minimax Regret for Online Learning with Feedback Graphs [5.721380617450645]
We improve on the upper and lower bounds for the regret of online learning with strongly observable undirected feedback graphs.
Our improved upper bound $\mathcal{O}\bigl(\sqrt{\alpha T(\ln K)/(\ln\alpha)}\bigr)$ holds for any $\alpha$ and matches the lower bounds for bandits and experts.
arXiv Detail & Related papers (2023-05-24T17:40:57Z) - Nearly Optimal Best-of-Both-Worlds Algorithms for Online Learning with
Feedback Graphs [34.37963000493442]
This study considers online learning with general directed feedback graphs.
We present best-of-both-worlds algorithms that achieve nearly tight regret bounds for adversarial environments.
arXiv Detail & Related papers (2022-06-02T05:01:40Z) - A Near-Optimal Best-of-Both-Worlds Algorithm for Online Learning with
Feedback Graphs [21.563733343861713]
We consider online learning with feedback graphs, a sequential decision-making framework where the learner's feedback is determined by a directed graph over the action set.
We present a computationally efficient algorithm for learning in this framework that simultaneously achieves near-optimal regret bounds in both stochastic and adversarial environments.
arXiv Detail & Related papers (2022-06-01T15:14:32Z) - Logarithmic Regret from Sublinear Hints [76.87432703516942]
We show that an algorithm can obtain $O(\log T)$ regret with just $O(\sqrt{T})$ hints under a natural query model.
We also show that $o(\sqrt{T})$ hints cannot guarantee better than $\Omega(\sqrt{T})$ regret.
arXiv Detail & Related papers (2021-11-09T16:50:18Z) - Optimal Regret Algorithm for Pseudo-1d Bandit Convex Optimization [51.23789922123412]
We study online learning with bandit feedback (i.e., the learner has access only to a zeroth-order oracle) where cost/reward functions admit a "pseudo-1d" structure.
We show a lower bound of $\min(\sqrt{dT}, T^{3/4})$ for the regret of any algorithm, where $T$ is the number of rounds.
We propose a new algorithm, sbcalg, that combines randomized online gradient descent with a kernelized exponential weights method to exploit the pseudo-1d structure effectively.
arXiv Detail & Related papers (2021-02-15T08:16:51Z) - Adversarial Linear Contextual Bandits with Graph-Structured Side
Observations [80.95090605985042]
A learning agent repeatedly chooses from a set of $K$ actions after being presented with a $d$-dimensional context vector.
The agent incurs and observes the loss of the chosen action, but also observes the losses of its neighboring actions in the observation structures.
Two efficient algorithms are developed based on \texttt{EXP3}.
arXiv Detail & Related papers (2020-12-10T15:40:07Z) - Impact of Representation Learning in Linear Bandits [83.17684841392754]
We study how representation learning can improve the efficiency of bandit problems.
We present a new algorithm which achieves $\widetilde{O}(T\sqrt{kN} + \sqrt{dkNT})$ regret, where $N$ is the number of rounds we play for each bandit.
arXiv Detail & Related papers (2020-10-13T16:35:30Z) - Near-optimal Regret Bounds for Stochastic Shortest Path [63.029132134792555]
Stochastic shortest path (SSP) is a well-known problem in planning and control, in which an agent has to reach a goal state in minimum total expected cost.
We show that any learning algorithm must have at least $\Omega(B_\star \sqrt{|S||A|K})$ regret in the worst case.
arXiv Detail & Related papers (2020-02-23T09:10:14Z)