Improved High-Probability Regret for Adversarial Bandits with
Time-Varying Feedback Graphs
- URL: http://arxiv.org/abs/2210.01376v1
- Date: Tue, 4 Oct 2022 04:36:15 GMT
- Title: Improved High-Probability Regret for Adversarial Bandits with
Time-Varying Feedback Graphs
- Authors: Haipeng Luo, Hanghang Tong, Mengxiao Zhang, Yuheng Zhang
- Abstract summary: We study high-probability regret bounds for adversarial $K$-armed bandits with time-varying feedback graphs over $T$ rounds.
We develop an algorithm that achieves the optimal regret $\widetilde{\mathcal{O}}((\sum_{t=1}^T\alpha_t)^{1/2}+\max_{t\in[T]}\alpha_t)$ with high probability.
We also develop the first algorithm that achieves the optimal high-probability regret bound for weakly observable graphs.
- Score: 62.52390282012508
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study high-probability regret bounds for adversarial $K$-armed bandits
with time-varying feedback graphs over $T$ rounds. For general strongly
observable graphs, we develop an algorithm that achieves the optimal regret
$\widetilde{\mathcal{O}}((\sum_{t=1}^T\alpha_t)^{1/2}+\max_{t\in[T]}\alpha_t)$
with high probability, where $\alpha_t$ is the independence number of the
feedback graph at round $t$. Compared to the best existing result [Neu, 2015]
which only considers graphs with self-loops for all nodes, our result not only
holds more generally, but importantly also removes any $\text{poly}(K)$
dependence that can be prohibitively large for applications such as contextual
bandits. Furthermore, we also develop the first algorithm that achieves the
optimal high-probability regret bound for weakly observable graphs, which even
improves the best expected regret bound of [Alon et al., 2015] by removing the
$\mathcal{O}(\sqrt{KT})$ term with a refined analysis. Our algorithms are based
on the online mirror descent framework, but importantly with an innovative
combination of several techniques. Notably, while earlier works use optimistic
biased loss estimators for achieving high-probability bounds, we find it
important to use a pessimistic one for nodes without self-loop in a strongly
observable graph.
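The abstract describes algorithms built on online mirror descent with importance-weighted loss estimators over a feedback graph. The following is a minimal sketch of that setup using plain exponential weights and uncorrected importance weighting; the function name and interface are illustrative, and the sketch deliberately omits the optimistic/pessimistic bias corrections that the paper's analysis requires for high-probability bounds.

```python
import numpy as np

def exp_weights_graph_feedback(losses, graphs, eta=0.1, rng=None):
    """Exponential weights (OMD with entropic regularizer) under graph feedback.

    losses: (T, K) array of per-round losses in [0, 1].
    graphs: (T, K, K) boolean arrays; graphs[t, j, i] = True means that
            playing arm j reveals the loss of arm i at round t.
    Returns the learner's total incurred loss.

    A sketch only: uses unbiased importance-weighted estimators, without
    the bias corrections needed for high-probability guarantees.
    """
    rng = rng or np.random.default_rng(0)
    T, K = losses.shape
    log_w = np.zeros(K)
    total = 0.0
    for t in range(T):
        p = np.exp(log_w - log_w.max())
        p /= p.sum()
        arm = rng.choice(K, p=p)
        total += losses[t, arm]
        # P_t(i observed) = sum_j p(j) * 1[j -> i in G_t]
        obs_prob = graphs[t].T @ p
        observed = graphs[t, arm]          # arms revealed by the chosen arm
        est = np.where(observed, losses[t] / np.maximum(obs_prob, 1e-12), 0.0)
        log_w -= eta * est                 # multiplicative-weights update
    return total
```

With a full-information graph (all edges present) this reduces to the classic Hedge update, and with only self-loops it reduces to EXP3-style bandit feedback.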
Related papers
- Online Learning with Feedback Graphs: The True Shape of Regret [82.00098840619847]
We prove that the minimax regret is proportional to $R^*$ for any graph and time horizon $T$.
Introducing an intricate exploration strategy, we design an algorithm that achieves the minimax optimal regret bound.
arXiv Detail & Related papers (2023-06-05T15:35:00Z)
- Stochastic Contextual Bandits with Graph-based Contexts [0.0]
We generalize the online graph prediction problem to a version of contextual bandit problems.
Our algorithm relies on the optimal bandit algorithm by Zimmert and Seldin [AISTATS'19, JMLR'21].
arXiv Detail & Related papers (2023-05-02T14:51:35Z)
- Learning on the Edge: Online Learning with Stochastic Feedback Graphs [12.83118601099289]
We study an extension where the directed feedback graph is stochastic.
In each round every edge in the graph is either realized or not with a distinct probability for each edge.
We derive a more efficient algorithm featuring a dependence on weighted versions of the independence and weak domination numbers.
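Several of the bounds above, and the main abstract itself, are stated in terms of the independence number $\alpha_t$ of the feedback graph. As a small illustration (not part of any of these papers' algorithms), here is a brute-force computation of the independence number, feasible only for small graphs:

```python
from itertools import combinations

def independence_number(adj):
    """Brute-force independence number of an undirected graph.

    adj: dict mapping each node to the set of its neighbors.
    Exponential in the number of nodes; a sketch for small graphs only.
    This is the quantity alpha_t appearing in the regret bounds.
    """
    nodes = list(adj)
    for size in range(len(nodes), 0, -1):
        for subset in combinations(nodes, size):
            s = set(subset)
            # an independent set contains no edge between its members
            if all(adj[v].isdisjoint(s - {v}) for v in s):
                return size
    return 0
```

For example, a path on three nodes has independence number 2, while a complete graph has independence number 1.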
arXiv Detail & Related papers (2022-10-09T11:21:08Z)
- Nearly Optimal Best-of-Both-Worlds Algorithms for Online Learning with Feedback Graphs [34.37963000493442]
This study considers online learning with general directed feedback graphs.
We present best-of-both-worlds algorithms that achieve nearly tight regret bounds for adversarial environments.
arXiv Detail & Related papers (2022-06-02T05:01:40Z)
- A Near-Optimal Best-of-Both-Worlds Algorithm for Online Learning with Feedback Graphs [21.563733343861713]
We consider online learning with feedback graphs, a sequential decision-making framework where the learner's feedback is determined by a directed graph over the action set.
We present a computationally efficient algorithm for learning in this framework that simultaneously achieves near-optimal regret bounds in both stochastic and adversarial environments.
arXiv Detail & Related papers (2022-06-01T15:14:32Z)
- Adversarial Linear Contextual Bandits with Graph-Structured Side Observations [80.95090605985042]
A learning agent repeatedly chooses from a set of $K$ actions after being presented with a $d$-dimensional context vector.
The agent incurs and observes the loss of the chosen action, but also observes the losses of its neighboring actions in the observation structures.
Two efficient algorithms are developed based on EXP3.
arXiv Detail & Related papers (2020-12-10T15:40:07Z)
- Hybrid Stochastic-Deterministic Minibatch Proximal Gradient: Less-Than-Single-Pass Optimization with Nearly Optimal Generalization [83.80460802169999]
We show that HSDMPG can attain a generalization bound of $\mathcal{O}\big(1/\sqrt{tn}\big)$, which is at the order of the excess error of a learning model.
arXiv Detail & Related papers (2020-09-18T02:18:44Z)
- Scalable Deep Generative Modeling for Sparse Graphs [105.60961114312686]
Existing deep neural methods require $\Omega(n^2)$ complexity by building up the adjacency matrix.
We develop a novel autoregressive model, named BiGG, that utilizes this sparsity to avoid generating the full adjacency matrix.
During training, this autoregressive model can be parallelized with $O(\log n)$ synchronization stages.
arXiv Detail & Related papers (2020-06-28T04:37:57Z)
- Taking a hint: How to leverage loss predictors in contextual bandits? [63.546913998407405]
We study learning in contextual bandits with the help of loss predictors.
We show that the optimal regret is $\mathcal{O}(\min\{\sqrt{T}, \sqrt{\mathcal{E}}\,T^{\frac{1}{3}}\})$ when $\mathcal{E}$ is known.
arXiv Detail & Related papers (2020-03-04T07:36:38Z)
- A Closer Look at Small-loss Bounds for Bandits with Graph Feedback [39.78074016649885]
We study small-loss bounds for adversarial multi-armed bandits with graph feedback.
We derive the first small-loss bound for general strongly observable graphs.
We also take the first attempt at deriving small-loss bounds for weakly observable graphs.
arXiv Detail & Related papers (2020-02-02T03:48:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.