A Near-Optimal Best-of-Both-Worlds Algorithm for Online Learning with
Feedback Graphs
- URL: http://arxiv.org/abs/2206.00557v1
- Date: Wed, 1 Jun 2022 15:14:32 GMT
- Title: A Near-Optimal Best-of-Both-Worlds Algorithm for Online Learning with
Feedback Graphs
- Authors: Chloé Rouyer, Dirk van der Hoeven, Nicolò Cesa-Bianchi and Yevgeny Seldin
- Abstract summary: We consider online learning with feedback graphs, a sequential decision-making framework where the learner's feedback is determined by a directed graph over the action set.
We present a computationally efficient algorithm for learning in this framework that simultaneously achieves near-optimal regret bounds in both stochastic and adversarial environments.
- Score: 21.563733343861713
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider online learning with feedback graphs, a sequential
decision-making framework where the learner's feedback is determined by a
directed graph over the action set. We present a computationally efficient
algorithm for learning in this framework that simultaneously achieves
near-optimal regret bounds in both stochastic and adversarial environments. The
bound against oblivious adversaries is $\tilde{O} (\sqrt{\alpha T})$, where $T$
is the time horizon and $\alpha$ is the independence number of the feedback
graph. The bound against stochastic environments is $O\big( (\ln T)^2
\max_{S\in \mathcal I(G)} \sum_{i \in S} \Delta_i^{-1}\big)$ where $\mathcal
I(G)$ is the family of all independent sets in a suitably defined undirected
version of the graph and $\Delta_i$ are the suboptimality gaps. The algorithm
combines ideas from the EXP3++ algorithm for stochastic and adversarial bandits
and the EXP3.G algorithm for feedback graphs with a novel exploration scheme.
The scheme, which exploits the structure of the graph to reduce exploration, is
key to obtaining best-of-both-worlds guarantees with feedback graphs. We also
extend our algorithm and results to a setting where the feedback graphs are
allowed to change over time.
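To make the setting above concrete, here is a minimal, hedged sketch of an exponential-weights learner with graph feedback: playing an action also reveals the losses of its out-neighbors, which are turned into importance-weighted estimates. This is an illustration in the spirit of EXP3-style methods, not the paper's EXP3++/EXP3.G hybrid or its exploration scheme; the demo graph, losses, and learning rate are assumptions made up for the example.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

K, T = 4, 20_000
eta = np.sqrt(np.log(K) / T)          # illustrative fixed learning rate (an assumption)

# Directed feedback graph with self-loops: G[i, j] = 1 means playing i reveals the loss of j.
G = np.array([[1, 1, 0, 0],
              [0, 1, 1, 0],
              [0, 0, 1, 1],
              [1, 0, 0, 1]])

means = np.array([0.2, 0.4, 0.6, 0.8])   # stationary Bernoulli losses for the demo
cum_est = np.zeros(K)                     # cumulative importance-weighted loss estimates
plays = np.zeros(K, dtype=int)

for _ in range(T):
    w = np.exp(-eta * (cum_est - cum_est.min()))   # stabilized exponential weights
    p = w / w.sum()
    i = rng.choice(K, p=p)
    plays[i] += 1

    losses = rng.binomial(1, means).astype(float)
    obs_prob = G.T @ p                  # P[loss of j is observed this round]
    seen = G[i] == 1                    # actions revealed by playing i
    cum_est[seen] += losses[seen] / obs_prob[seen]   # importance weighting

print("play frequencies:", plays / T)   # mass should concentrate on action 0

# Independence number of the undirected version of G (self-loops ignored), by brute force:
# this is the alpha appearing in the adversarial sqrt(alpha * T) bound.
und = ((G + G.T) > 0) & ~np.eye(K, dtype=bool)
alpha = max(len(S) for r in range(K + 1)
            for S in itertools.combinations(range(K), r)
            if not any(und[a, b] for a, b in itertools.combinations(S, 2)))
print("independence number alpha =", alpha)
```

For this demo graph (a directed 4-cycle plus self-loops) the independence number is $\alpha = 2$, so the adversarial bound above would scale as $\tilde{O}(\sqrt{2T})$; the stochastic bound would likewise be driven by the independent set maximizing $\sum_{i \in S} \Delta_i^{-1}$.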
Related papers
- Graph Unfolding and Sampling for Transitory Video Summarization via Gershgorin Disc Alignment [48.137527345353625]
User-generated videos (UGVs) uploaded from mobile phones to social media sites like YouTube and TikTok are short and non-repetitive.
We summarize a transitory UGV into several discs in linear time via fast graph sampling based on Gershgorin disc alignment (GDA).
We show that our algorithm achieves comparable or better video summarization performance compared to state-of-the-art methods, at a substantially reduced complexity.
arXiv Detail & Related papers (2024-08-03T20:08:02Z) - Online Learning with Feedback Graphs: The True Shape of Regret [82.00098840619847]
We prove that the minimax regret is proportional to $R^*$ for any graph and time horizon $T$.
Introducing an intricate exploration strategy, we define an algorithm that achieves the minimax optimal regret bound.
arXiv Detail & Related papers (2023-06-05T15:35:00Z) - Learning on the Edge: Online Learning with Stochastic Feedback Graphs [12.83118601099289]
We study an extension where the directed feedback graph is stochastic.
In each round every edge in the graph is either realized or not with a distinct probability for each edge.
We derive a more efficient algorithm featuring a dependence on weighted versions of the independence and weak domination numbers.
arXiv Detail & Related papers (2022-10-09T11:21:08Z) - Improved High-Probability Regret for Adversarial Bandits with
Time-Varying Feedback Graphs [62.52390282012508]
We study high-probability regret bounds for adversarial $K$-armed bandits with time-varying feedback graphs over $T$ rounds.
We develop an algorithm that achieves the optimal regret $\widetilde{\mathcal{O}}\big((\sum_{t=1}^{T}\alpha_t)^{1/2}+\max_{t\in[T]}\alpha_t\big)$ with high probability.
We also develop the first algorithm that achieves the optimal high-probability regret bound for weakly observable graphs.
arXiv Detail & Related papers (2022-10-04T04:36:15Z) - Simultaneously Learning Stochastic and Adversarial Bandits with General
Graph Feedback [15.429356827868514]
We introduce a new trade-off mechanism for exploration and exploitation of general feedback graphs.
We prove that the proposed algorithm simultaneously achieves $\mathrm{poly}(\log T)$ regret in the stochastic setting and minimax-optimal regret in the adversarial setting.
This is the first best-of-both-worlds result for general feedback graphs.
arXiv Detail & Related papers (2022-06-16T04:21:27Z) - Nearly Optimal Best-of-Both-Worlds Algorithms for Online Learning with
Feedback Graphs [34.37963000493442]
This study considers online learning with general directed feedback graphs.
We present best-of-both-worlds algorithms that achieve nearly tight regret bounds for adversarial environments.
arXiv Detail & Related papers (2022-06-02T05:01:40Z) - Contextual Recommendations and Low-Regret Cutting-Plane Algorithms [49.91214213074933]
We consider the following variant of contextual linear bandits motivated by routing applications in navigational engines and recommendation systems.
We design novel cutting-plane algorithms with low "regret" -- the total distance between the true point $w^*$ and the hyperplanes the separation oracle returns.
arXiv Detail & Related papers (2021-06-09T05:39:05Z) - Learnable Graph Matching: Incorporating Graph Partitioning with Deep
Feature Learning for Multiple Object Tracking [58.30147362745852]
Data association across frames is at the core of Multiple Object Tracking (MOT) task.
Existing methods mostly ignore the context information among tracklets and intra-frame detections.
We propose a novel learnable graph matching method to address these issues.
arXiv Detail & Related papers (2021-03-30T08:58:45Z) - Random Graph Matching with Improved Noise Robustness [2.294014185517203]
We propose a new algorithm for graph matching under probabilistic models.
Our algorithm recovers the underlying matching with high probability when $\alpha \le 1 / (\log\log n)^{C}$.
This improves the condition $\alpha \le 1 / (\log n)^{C}$ achieved in previous work.
arXiv Detail & Related papers (2021-01-28T02:39:27Z) - Adversarial Linear Contextual Bandits with Graph-Structured Side
Observations [80.95090605985042]
A learning agent repeatedly chooses from a set of $K$ actions after being presented with a $d$-dimensional context vector.
The agent incurs and observes the loss of the chosen action, but also observes the losses of its neighboring actions in the observation structures.
Two efficient algorithms are developed based on EXP3.
arXiv Detail & Related papers (2020-12-10T15:40:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.