Improved High-Probability Regret for Adversarial Bandits with
Time-Varying Feedback Graphs
- URL: http://arxiv.org/abs/2210.01376v1
- Date: Tue, 4 Oct 2022 04:36:15 GMT
- Title: Improved High-Probability Regret for Adversarial Bandits with
Time-Varying Feedback Graphs
- Authors: Haipeng Luo, Hanghang Tong, Mengxiao Zhang, Yuheng Zhang
- Abstract summary: We study high-probability regret bounds for adversarial $K$-armed bandits with time-varying feedback graphs over $T$ rounds.
We develop an algorithm that achieves the optimal regret $\widetilde{\mathcal{O}}((\sum_{t=1}^T\alpha_t)^{1/2}+\max_{t\in[T]}\alpha_t)$ with high probability.
We also develop the first algorithm that achieves the optimal high-probability regret bound for weakly observable graphs.
- Score: 62.52390282012508
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study high-probability regret bounds for adversarial $K$-armed bandits
with time-varying feedback graphs over $T$ rounds. For general strongly
observable graphs, we develop an algorithm that achieves the optimal regret
$\widetilde{\mathcal{O}}((\sum_{t=1}^T\alpha_t)^{1/2}+\max_{t\in[T]}\alpha_t)$
with high probability, where $\alpha_t$ is the independence number of the
feedback graph at round $t$. Compared to the best existing result [Neu, 2015]
which only considers graphs with self-loops for all nodes, our result not only
holds more generally, but importantly also removes any $\text{poly}(K)$
dependence that can be prohibitively large for applications such as contextual
bandits. Furthermore, we also develop the first algorithm that achieves the
optimal high-probability regret bound for weakly observable graphs, which even
improves the best expected regret bound of [Alon et al., 2015] by removing the
$\mathcal{O}(\sqrt{KT})$ term with a refined analysis. Our algorithms are based
on the online mirror descent framework, but importantly with an innovative
combination of several techniques. Notably, while earlier works use optimistic
biased loss estimators for achieving high-probability bounds, we find it
important to use a pessimistic one for nodes without self-loop in a strongly
observable graph.
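The abstract describes algorithms built on online mirror descent with importance-weighted loss estimators over a feedback graph. The following is a minimal sketch of that setup using plain exponential weights and uncorrected importance weighting; the function name and interface are illustrative, and the sketch deliberately omits the optimistic/pessimistic bias corrections that the paper's analysis requires for high-probability bounds.

```python
import numpy as np

def exp_weights_graph_feedback(losses, graphs, eta=0.1, rng=None):
    """Exponential weights (OMD with entropic regularizer) under graph feedback.

    losses: (T, K) array of per-round losses in [0, 1].
    graphs: (T, K, K) boolean arrays; graphs[t, j, i] = True means that
            playing arm j reveals the loss of arm i at round t.
    Returns the learner's total incurred loss.

    A sketch only: uses unbiased importance-weighted estimators, without
    the bias corrections needed for high-probability guarantees.
    """
    rng = rng or np.random.default_rng(0)
    T, K = losses.shape
    log_w = np.zeros(K)
    total = 0.0
    for t in range(T):
        p = np.exp(log_w - log_w.max())
        p /= p.sum()
        arm = rng.choice(K, p=p)
        total += losses[t, arm]
        # P_t(i observed) = sum_j p(j) * 1[j -> i in G_t]
        obs_prob = graphs[t].T @ p
        observed = graphs[t, arm]          # arms revealed by the chosen arm
        est = np.where(observed, losses[t] / np.maximum(obs_prob, 1e-12), 0.0)
        log_w -= eta * est                 # multiplicative-weights update
    return total
```

With a full-information graph (all edges present) this reduces to the classic Hedge update, and with only self-loops it reduces to EXP3-style bandit feedback.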
Related papers
- Online Learning with Feedback Graphs: The True Shape of Regret [82.00098840619847]
We prove that the minimax regret is proportional to $R^*$ for any graph and time horizon $T$.
Introducing an intricate exploration strategy, we design an algorithm that achieves the minimax optimal regret bound.
arXiv Detail & Related papers (2023-06-05T15:35:00Z)
- Stochastic Contextual Bandits with Graph-based Contexts [0.0]
We generalize the online graph prediction problem to a version of contextual bandit problems.
Our algorithm relies on the optimal bandit algorithm by Zimmert and Seldin [AISTATS'19, JMLR'21].
arXiv Detail & Related papers (2023-05-02T14:51:35Z)
- Learning on the Edge: Online Learning with Stochastic Feedback Graphs [12.83118601099289]
We study an extension where the directed feedback graph is stochastic.
In each round every edge in the graph is either realized or not with a distinct probability for each edge.
We derive a more efficient algorithm featuring a dependence on weighted versions of the independence and weak domination numbers.
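Several of the bounds above, and the main abstract itself, are stated in terms of the independence number $\alpha_t$ of the feedback graph. As a small illustration (not part of any of these papers' algorithms), here is a brute-force computation of the independence number, feasible only for small graphs:

```python
from itertools import combinations

def independence_number(adj):
    """Brute-force independence number of an undirected graph.

    adj: dict mapping each node to the set of its neighbors.
    Exponential in the number of nodes; a sketch for small graphs only.
    This is the quantity alpha_t appearing in the regret bounds.
    """
    nodes = list(adj)
    for size in range(len(nodes), 0, -1):
        for subset in combinations(nodes, size):
            s = set(subset)
            # an independent set contains no edge between its members
            if all(adj[v].isdisjoint(s - {v}) for v in s):
                return size
    return 0
```

For example, a path on three nodes has independence number 2, while a complete graph has independence number 1.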
arXiv Detail & Related papers (2022-10-09T11:21:08Z)
- Nearly Optimal Best-of-Both-Worlds Algorithms for Online Learning with Feedback Graphs [34.37963000493442]
This study considers online learning with general directed feedback graphs.
We present best-of-both-worlds algorithms that achieve nearly tight regret bounds for adversarial environments.
arXiv Detail & Related papers (2022-06-02T05:01:40Z)
- A Near-Optimal Best-of-Both-Worlds Algorithm for Online Learning with Feedback Graphs [21.563733343861713]
We consider online learning with feedback graphs, a sequential decision-making framework where the learner's feedback is determined by a directed graph over the action set.
We present a computationally efficient algorithm for learning in this framework that simultaneously achieves near-optimal regret bounds in both stochastic and adversarial environments.
arXiv Detail & Related papers (2022-06-01T15:14:32Z)
- Adversarial Linear Contextual Bandits with Graph-Structured Side Observations [80.95090605985042]
A learning agent repeatedly chooses from a set of $K$ actions after being presented with a $d$-dimensional context vector.
The agent incurs and observes the loss of the chosen action, but also observes the losses of its neighboring actions in the observation structures.
Two efficient algorithms are developed based on EXP3.
arXiv Detail & Related papers (2020-12-10T15:40:07Z)
- Hybrid Stochastic-Deterministic Minibatch Proximal Gradient: Less-Than-Single-Pass Optimization with Nearly Optimal Generalization [83.80460802169999]
We show that HSDMPG can attain a generalization bound of $\mathcal{O}\big(1/\sqrt{tn}\big)$, which is at the order of the excess error of a learning model.
arXiv Detail & Related papers (2020-09-18T02:18:44Z)
- Scalable Deep Generative Modeling for Sparse Graphs [105.60961114312686]
Existing deep neural methods require $\Omega(n^2)$ complexity by building up the adjacency matrix.
We develop a novel autoregressive model, named BiGG, that utilizes this sparsity to avoid generating the full adjacency matrix.
During training, this autoregressive model can be parallelized with $O(\log n)$ synchronization stages.
arXiv Detail & Related papers (2020-06-28T04:37:57Z)
- Taking a hint: How to leverage loss predictors in contextual bandits? [63.546913998407405]
We study learning in contextual bandits with the help of loss predictors.
We show that the optimal regret is $\mathcal{O}(\min\{\sqrt{T}, \sqrt{\mathcal{E}}\,T^{\frac{1}{3}}\})$ when $\mathcal{E}$ is known.
arXiv Detail & Related papers (2020-03-04T07:36:38Z)
- A Closer Look at Small-loss Bounds for Bandits with Graph Feedback [39.78074016649885]
We study small-loss bounds for adversarial multi-armed bandits with graph feedback.
We derive the first small-loss bound for general strongly observable graphs.
We also take the first attempt at deriving small-loss bounds for weakly observable graphs.
arXiv Detail & Related papers (2020-02-02T03:48:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.