Online Learning with Feedback Graphs: The True Shape of Regret
- URL: http://arxiv.org/abs/2306.02971v1
- Date: Mon, 5 Jun 2023 15:35:00 GMT
- Title: Online Learning with Feedback Graphs: The True Shape of Regret
- Authors: Tomáš Kocák and Alexandra Carpentier
- Abstract summary: We prove that the minimax regret is proportional to $R^*$ for any graph and time horizon $T$.
Introducing an intricate exploration strategy, we define the \mainAlgorithm algorithm that achieves the minimax optimal regret bound.
- Score: 82.00098840619847
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sequential learning with feedback graphs is a natural extension of the
multi-armed bandit problem where the problem is equipped with an underlying
graph structure that provides additional information - playing an action
reveals the losses of all the neighbors of the action. This problem was
introduced by \citet{mannor2011} and received considerable attention in recent
years. It is generally stated in the literature that the minimax regret rate
for this problem is of order $\sqrt{\alpha T}$, where $\alpha$ is the
independence number of the graph, and $T$ is the time horizon. However, this is
proven only when the number of rounds $T$ is larger than $\alpha^3$, which
poses a significant restriction for the usability of this result in large
graphs. In this paper, we define a new quantity $R^*$, called the \emph{problem
complexity}, and prove that the minimax regret is proportional to $R^*$ for any
graph and time horizon $T$. Introducing an intricate exploration strategy, we
define the \mainAlgorithm algorithm that achieves the minimax optimal regret
bound and becomes the first provably optimal algorithm for this setting, even
if $T$ is smaller than $\alpha^3$.
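To make the feedback-graph mechanism described in the abstract concrete (playing an action also reveals the losses of all its neighbors), the sketch below implements a standard Exp3-style learner with importance-weighted loss estimates for the strongly observable undirected case. This is not the paper's \mainAlgorithm; the function name, the path-graph toy example, and the fixed learning rate are illustrative assumptions only.

```python
import numpy as np

def exp3_with_graph_feedback(losses, neighbors, eta, seed=0):
    """Exp3-style learner with graph feedback (illustrative sketch only).

    losses    -- (T, K) array of adversarial losses in [0, 1]
    neighbors -- list of K sets; neighbors[i] contains i and every action
                 whose loss is observed when action i is played
                 (undirected graph with self-loops)
    eta       -- learning rate
    """
    rng = np.random.default_rng(seed)
    T, K = losses.shape
    weights = np.ones(K)
    total_loss = 0.0
    for t in range(T):
        p = weights / weights.sum()
        action = rng.choice(K, p=p)
        total_loss += losses[t, action]
        # q[i] = probability that action i's loss is observed this round,
        # i.e. the probability that the played action lies in N(i).
        q = np.array([p[list(neighbors[i])].sum() for i in range(K)])
        observed = np.array([action in neighbors[i] for i in range(K)])
        # Importance-weighted loss estimates for every observed action.
        loss_hat = np.where(observed, losses[t] / q, 0.0)
        weights *= np.exp(-eta * loss_hat)
    return total_loss


# Toy example: 3 actions on a path graph 0 - 1 - 2, so playing action 1
# reveals every loss, while playing 0 or 2 reveals only two of them.
neighbors = [{0, 1}, {0, 1, 2}, {1, 2}]
losses = np.random.default_rng(1).uniform(size=(1000, 3))
print(exp3_with_graph_feedback(losses, neighbors, eta=0.05))
```

Dividing each estimate by the observation probability $q_i$ is the standard device in graph-feedback algorithms that lets the regret scale with the independence number $\alpha$ rather than the number of actions $K$ when $T$ is sufficiently large.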
Related papers
- Linear Causal Bandits: Unknown Graph and Soft Interventions [18.412177974475526]
Designing causal bandit algorithms depends on two central categories of assumptions.
The problem in its general form, i.e., unknown graph and unknown intervention models, remains open.
This paper addresses this problem and establishes a regret upper bound after $T$ interaction rounds that scales with the number of nodes $N$, the maximum in-degree $d$, and the maximum causal path length $L$.
arXiv Detail & Related papers (2024-11-04T18:50:39Z) - On Interpolating Experts and Multi-Armed Bandits [1.9497955504539408]
We prove tight minimax regret bounds for $\mathbf{m}$-MAB and design an optimal PAC algorithm for its pure exploration version, $\mathbf{m}$-BAI.
As consequences, we obtain tight minimax regret bounds for several families of feedback graphs.
arXiv Detail & Related papers (2023-07-14T10:38:30Z) - On the Minimax Regret for Online Learning with Feedback Graphs [5.721380617450645]
We improve on the upper and lower bounds for the regret of online learning with strongly observable undirected feedback graphs.
Our improved upper bound $\mathcal{O}\bigl(\sqrt{\alpha T(\ln K)/(\ln\alpha)}\bigr)$ holds for any $\alpha$ and matches the lower bounds for bandits and experts.
arXiv Detail & Related papers (2023-05-24T17:40:57Z) - Nearly Optimal Best-of-Both-Worlds Algorithms for Online Learning with
Feedback Graphs [34.37963000493442]
This study considers online learning with general directed feedback graphs.
We present best-of-both-worlds algorithms that achieve nearly tight regret bounds for adversarial environments.
arXiv Detail & Related papers (2022-06-02T05:01:40Z) - A Near-Optimal Best-of-Both-Worlds Algorithm for Online Learning with
Feedback Graphs [21.563733343861713]
We consider online learning with feedback graphs, a sequential decision-making framework where the learner's feedback is determined by a directed graph over the action set.
We present a computationally efficient algorithm for learning in this framework that simultaneously achieves near-optimal regret bounds in both stochastic and adversarial environments.
arXiv Detail & Related papers (2022-06-01T15:14:32Z) - Logarithmic Regret from Sublinear Hints [76.87432703516942]
We show that an algorithm can obtain $O(\log T)$ regret with just $O(\sqrt{T})$ hints under a natural query model.
We also show that $o(\sqrt{T})$ hints cannot guarantee better than $\Omega(\sqrt{T})$ regret.
arXiv Detail & Related papers (2021-11-09T16:50:18Z) - Optimal Regret Algorithm for Pseudo-1d Bandit Convex Optimization [51.23789922123412]
We study online learning with bandit feedback (i.e., the learner has access only to a zeroth-order oracle) where cost/reward functions admit a "pseudo-1d" structure.
We show a lower bound of $\min(\sqrt{dT}, T^{3/4})$ for the regret of any algorithm, where $T$ is the number of rounds.
We propose a new algorithm, sbcalg, that combines randomized online gradient descent with a kernelized exponential weights method to exploit the pseudo-1d structure effectively.
arXiv Detail & Related papers (2021-02-15T08:16:51Z) - Adversarial Linear Contextual Bandits with Graph-Structured Side
Observations [80.95090605985042]
A learning agent repeatedly chooses from a set of $K$ actions after being presented with a $d$-dimensional context vector.
The agent incurs and observes the loss of the chosen action, but also observes the losses of its neighboring actions in the observation structures.
Two efficient algorithms are developed based on \texttt{EXP3}.
arXiv Detail & Related papers (2020-12-10T15:40:07Z) - Impact of Representation Learning in Linear Bandits [83.17684841392754]
We study how representation learning can improve the efficiency of bandit problems.
We present a new algorithm which achieves $\widetilde{O}(T\sqrt{kN} + \sqrt{dkNT})$ regret, where $N$ is the number of rounds we play for each bandit.
arXiv Detail & Related papers (2020-10-13T16:35:30Z) - Near-optimal Regret Bounds for Stochastic Shortest Path [63.029132134792555]
Stochastic shortest path (SSP) is a well-known problem in planning and control, in which an agent has to reach a goal state in minimum total expected cost.
We show that any learning algorithm must have at least $\Omega(B_\star \sqrt{|S||A|K})$ regret in the worst case.
arXiv Detail & Related papers (2020-02-23T09:10:14Z)