Simultaneously Learning Stochastic and Adversarial Bandits with General
Graph Feedback
- URL: http://arxiv.org/abs/2206.07908v1
- Date: Thu, 16 Jun 2022 04:21:27 GMT
- Title: Simultaneously Learning Stochastic and Adversarial Bandits with General
Graph Feedback
- Authors: Fang Kong, Yichi Zhou, Shuai Li
- Abstract summary: We introduce a new trade-off mechanism for exploration and exploitation of general feedback graphs.
We prove the proposed algorithm simultaneously achieves $\mathrm{poly}\log T$ regret in the stochastic setting and minimax-optimal $\tilde{O}(T^{2/3})$ regret in the adversarial setting.
This is the first best-of-both-worlds result for general feedback graphs.
- Score: 15.429356827868514
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The problem of online learning with graph feedback has been extensively
studied in the literature due to its generality and potential to model various
learning tasks. Existing works mainly study the adversarial and stochastic
feedback separately. If the prior knowledge of the feedback mechanism is
unavailable or wrong, such specially designed algorithms could suffer great
loss. To avoid this problem, \citet{erez2021towards} try to optimize for both
environments. However, they assume the feedback graphs are undirected and each
vertex has a self-loop, which compromises the generality of the framework and
may not be satisfied in applications. With a general feedback graph, the
observation of an arm may not be available when this arm is pulled, which makes
the exploration more expensive and the algorithms more challenging to perform
optimally in both environments. In this work, we overcome this difficulty by a
new trade-off mechanism with a carefully-designed proportion for exploration
and exploitation. We prove the proposed algorithm simultaneously achieves
$\mathrm{poly} \log T$ regret in the stochastic setting and minimax-optimal
regret of $\tilde{O}(T^{2/3})$ in the adversarial setting where $T$ is the
horizon and $\tilde{O}$ hides parameters independent of $T$ as well as
logarithmic terms. To our knowledge, this is the first best-of-both-worlds
result for general feedback graphs.
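The trade-off described above can be illustrated with a minimal sketch (not the paper's actual algorithm): an exponential-weights exploitation distribution mixed with explicit uniform exploration over a dominating set of the feedback graph, so that every arm is observed with nonzero probability even without self-loops. All function and parameter names below are illustrative assumptions.

```python
import numpy as np

def exp3_graph(adjacency, loss_fn, T, eta=0.05, gamma=0.2):
    """Sketch of exponential weights with extra uniform exploration
    over a dominating set of a general directed feedback graph.
    adjacency[i][j] == 1 means pulling arm i reveals the loss of arm j."""
    K = len(adjacency)
    # Greedily build a dominating set: arms whose observations cover all arms.
    covered, dom = set(), []
    while len(covered) < K:
        best = max(range(K),
                   key=lambda i: len(set(np.nonzero(adjacency[i])[0]) - covered))
        dom.append(best)
        covered |= set(np.nonzero(adjacency[best])[0])
    weights = np.ones(K)
    total_loss = 0.0
    for t in range(T):
        q = weights / weights.sum()
        p = (1 - gamma) * q            # exploitation component
        for i in dom:
            p[i] += gamma / len(dom)   # forced exploration over the dominating set
        arm = np.random.choice(K, p=p)
        losses = loss_fn(t)            # full loss vector; only observed entries used
        total_loss += losses[arm]
        for i in np.nonzero(adjacency[arm])[0]:
            # Importance-weighted loss estimate: divide by the probability
            # that arm i's loss is observed this round.
            obs_prob = sum(p[j] for j in range(K) if adjacency[j][i])
            weights[i] *= np.exp(-eta * losses[i] / obs_prob)
    return total_loss
```

The mixing coefficient `gamma` plays the role of the carefully-designed exploration proportion; the paper's contribution lies in how that proportion is tuned to be optimal in both environments simultaneously, which this generic sketch does not capture.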
Related papers
- Efficient Contextual Bandits with Uninformed Feedback Graphs [48.77120088347271]
Bandits with feedback graphs are powerful online learning models that interpolate between the full information and classic bandit problems.
We show that it is critical to learn the graphs using log loss instead of squared loss to obtain favorable regret guarantees.
arXiv Detail & Related papers (2024-02-12T23:50:47Z) - Online Learning with Feedback Graphs: The True Shape of Regret [82.00098840619847]
We prove that the minimax regret is proportional to $R^*$ for any graph and time horizon $T$.
Introducing an intricate exploration strategy, we define an algorithm that achieves the minimax optimal regret bound.
arXiv Detail & Related papers (2023-06-05T15:35:00Z) - Learning on the Edge: Online Learning with Stochastic Feedback Graphs [12.83118601099289]
We study an extension where the directed feedback graph is stochastic.
In each round every edge in the graph is either realized or not with a distinct probability for each edge.
We derive a more efficient algorithm featuring a dependence on weighted versions of the independence and weak domination numbers.
arXiv Detail & Related papers (2022-10-09T11:21:08Z) - Improved High-Probability Regret for Adversarial Bandits with
Time-Varying Feedback Graphs [62.52390282012508]
We study high-probability regret bounds for adversarial $K$-armed bandits with time-varying feedback graphs over $T$ rounds.
We develop an algorithm that achieves the optimal regret $\widetilde{\mathcal{O}}\big((\sum_{t=1}^{T}\alpha_t)^{1/2}+\max_{t\in[T]}\alpha_t\big)$ with high probability.
We also develop the first algorithm that achieves the optimal high-probability regret bound for weakly observable graphs.
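The bound above depends on the independence number $\alpha_t$ of each round's feedback graph. For small graphs this quantity can be computed by brute force; the sketch below assumes an undirected graph and is purely illustrative.

```python
from itertools import combinations

def independence_number(adjacency):
    """Size of the largest independent set, by exhaustive search
    (exponential time, so suitable only for small graphs).
    adjacency[i][j] == 1 means there is an edge between i and j."""
    n = len(adjacency)
    for size in range(n, 0, -1):
        for subset in combinations(range(n), size):
            # A subset is independent if no two of its vertices share an edge.
            if all(adjacency[i][j] == 0
                   for i in subset for j in subset if i != j):
                return size
    return 0
```

For a triangle the independence number is 1, while for a path on three vertices it is 2, so the regret bound above degrades gracefully as the feedback graphs become sparser.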
arXiv Detail & Related papers (2022-10-04T04:36:15Z) - Multi-armed Bandit Learning on a Graph [0.0]
We study an extension of MAB called the graph bandit, where an agent travels over a graph to maximize the reward collected from different nodes.
We design a learning algorithm, G-UCB, that balances long-term exploration-exploitation using the principle of optimism.
Our proposed algorithm achieves $O(\sqrt{|S|T\log T}+D|S|\log T)$ learning regret, where $|S|$ is the number of nodes and $D$ is the diameter of the graph.
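The optimism principle behind G-UCB can be illustrated with the standard UCB index: the empirical mean reward plus a confidence bonus that shrinks as an arm accumulates pulls. This is a generic sketch, not the paper's exact index, which additionally accounts for the cost of travelling between nodes of the graph.

```python
import math

def ucb_index(mean_reward, pulls, t, c=2.0):
    """Standard optimism-in-the-face-of-uncertainty index.
    Returns infinity for unpulled arms so each is tried at least once."""
    if pulls == 0:
        return float("inf")
    return mean_reward + math.sqrt(c * math.log(t) / pulls)
```

At each step the agent moves toward the node with the highest index, so under-explored nodes (large bonus) and high-reward nodes (large mean) both attract visits.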
arXiv Detail & Related papers (2022-09-20T02:31:42Z) - Nearly Optimal Best-of-Both-Worlds Algorithms for Online Learning with
Feedback Graphs [34.37963000493442]
This study considers online learning with general directed feedback graphs.
We present best-of-both-worlds algorithms that achieve nearly tight regret bounds for adversarial environments.
arXiv Detail & Related papers (2022-06-02T05:01:40Z) - A Near-Optimal Best-of-Both-Worlds Algorithm for Online Learning with
Feedback Graphs [21.563733343861713]
We consider online learning with feedback graphs, a sequential decision-making framework where the learner's feedback is determined by a directed graph over the action set.
We present a computationally efficient algorithm for learning in this framework that simultaneously achieves near-optimal regret bounds in both stochastic and adversarial environments.
arXiv Detail & Related papers (2022-06-01T15:14:32Z) - Learnable Graph Matching: Incorporating Graph Partitioning with Deep
Feature Learning for Multiple Object Tracking [58.30147362745852]
Data association across frames is at the core of Multiple Object Tracking (MOT) task.
Existing methods mostly ignore the context information among tracklets and intra-frame detections.
We propose a novel learnable graph matching method to address these issues.
arXiv Detail & Related papers (2021-03-30T08:58:45Z) - Adversarial Linear Contextual Bandits with Graph-Structured Side
Observations [80.95090605985042]
A learning agent repeatedly chooses from a set of $K$ actions after being presented with a $d$-dimensional context vector.
The agent incurs and observes the loss of the chosen action, but also observes the losses of its neighboring actions in the observation structures.
Two efficient algorithms are developed based on EXP3.
arXiv Detail & Related papers (2020-12-10T15:40:07Z) - Online Dense Subgraph Discovery via Blurred-Graph Feedback [87.9850024070244]
We introduce a novel learning problem for dense subgraph discovery.
We first propose an algorithm that obtains a nearly-optimal solution with high probability.
We then design a more scalable algorithm with a theoretical guarantee.
arXiv Detail & Related papers (2020-06-24T11:37:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.