Learning on the Edge: Online Learning with Stochastic Feedback Graphs
- URL: http://arxiv.org/abs/2210.04229v1
- Date: Sun, 9 Oct 2022 11:21:08 GMT
- Title: Learning on the Edge: Online Learning with Stochastic Feedback Graphs
- Authors: Emmanuel Esposito, Federico Fusco, Dirk van der Hoeven, Nicolò Cesa-Bianchi
- Abstract summary: We study an extension where the directed feedback graph is stochastic.
In each round every edge in the graph is either realized or not with a distinct probability for each edge.
We derive a more efficient algorithm featuring a dependence on weighted versions of the independence and weak domination numbers.
- Score: 12.83118601099289
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The framework of feedback graphs is a generalization of sequential
decision-making with bandit or full information feedback. In this work, we
study an extension where the directed feedback graph is stochastic, following a
distribution similar to the classical Erdős-Rényi model. Specifically, in
each round every edge in the graph is either realized or not with a distinct
probability for each edge. We prove nearly optimal regret bounds of order
$\min\bigl\{\min_{\varepsilon} \sqrt{(\alpha_\varepsilon/\varepsilon) T},\,
\min_{\varepsilon} (\delta_\varepsilon/\varepsilon)^{1/3} T^{2/3}\bigr\}$
(ignoring logarithmic factors), where $\alpha_{\varepsilon}$ and
$\delta_{\varepsilon}$ are graph-theoretic quantities measured on the support
of the stochastic feedback graph $\mathcal{G}$ with edge probabilities
thresholded at $\varepsilon$. Our result, which holds without any preliminary
knowledge about $\mathcal{G}$, requires the learner to observe only the
realized out-neighborhood of the chosen action. When the learner is allowed to
observe the realization of the entire graph (but only the losses in the
out-neighborhood of the chosen action), we derive a more efficient algorithm
featuring a dependence on weighted versions of the independence and weak
domination numbers that exhibits improved bounds for some special cases.
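To make the interaction protocol concrete, here is a minimal Python sketch (not taken from the paper; the probability matrix and all names are illustrative) of one round with a stochastic feedback graph: every directed edge $(i, j)$ is realized independently with its own probability, the learner plays an action, and only the losses of the realized out-neighbors of that action are revealed. The helper `thresholded_support` mirrors the $\varepsilon$-thresholding on which the quantities $\alpha_\varepsilon$ and $\delta_\varepsilon$ are measured.

```python
import numpy as np

rng = np.random.default_rng(0)

K = 5                                    # number of actions / nodes
p = rng.uniform(0.0, 1.0, size=(K, K))   # hypothetical edge probabilities

def realize_graph(p, rng):
    """One Erdos-Renyi-style draw: edge (i, j) is present with probability p[i, j]."""
    return rng.random(p.shape) < p

def observed_feedback(G, losses, action):
    """The learner sees only the losses in the realized out-neighborhood of `action`."""
    return {j: losses[j] for j in range(len(losses)) if G[action, j]}

def thresholded_support(p, eps):
    """Support of the stochastic graph with edges of probability below eps dropped."""
    return p >= eps

# One round of the protocol.
losses = rng.uniform(0.0, 1.0, size=K)   # this round's loss vector
G = realize_graph(p, rng)
action = int(rng.integers(K))            # a real learner would choose adaptively
print(observed_feedback(G, losses, action))
print(thresholded_support(p, eps=0.5))
```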
Related papers
- Online Learning with Feedback Graphs: The True Shape of Regret [82.00098840619847]
We prove that the minimax regret is proportional to $R^*$ for any graph and time horizon $T$.
Introducing an intricate exploration strategy, we define an algorithm that achieves the minimax optimal regret bound.
arXiv Detail & Related papers (2023-06-05T15:35:00Z) - Improved High-Probability Regret for Adversarial Bandits with
Time-Varying Feedback Graphs [62.52390282012508]
We study high-probability regret bounds for adversarial $K$-armed bandits with time-varying feedback graphs over $T$ rounds.
We develop an algorithm that achieves the optimal regret $\widetilde{\mathcal{O}}\bigl((\sum_{t=1}^{T}\alpha_t)^{1/2}+\max_{t\in[T]}\alpha_t\bigr)$ with high probability.
We also develop the first algorithm that achieves the optimal high-probability regret bound for weakly observable graphs.
arXiv Detail & Related papers (2022-10-04T04:36:15Z) - Efficient Signed Graph Sampling via Balancing & Gershgorin Disc Perfect
Alignment [51.74913666829224]
We show that for datasets with strong inherent anti-correlations, a suitable graph contains both positive and negative edge weights.
We propose a linear-time signed graph sampling method centered on the concept of balanced signed graphs.
Experimental results show that our signed graph sampling method outperformed existing fast sampling schemes noticeably on various datasets.
arXiv Detail & Related papers (2022-08-18T09:19:01Z) - Learning a Single Neuron with Adversarial Label Noise via Gradient
Descent [50.659479930171585]
We study a function of the form $\mathbf{x}\mapsto\sigma(\mathbf{w}\cdot\mathbf{x})$ for monotone activations.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w})=C\cdot\mathrm{OPT}+\epsilon$ with high probability (see the single-neuron gradient-descent sketch after this list).
arXiv Detail & Related papers (2022-06-17T17:55:43Z) - Simultaneously Learning Stochastic and Adversarial Bandits with General
Graph Feedback [15.429356827868514]
We introduce a new trade-off mechanism for exploration and exploitation of general feedback graphs.
We prove the proposed algorithm simultaneously achieves $\mathrm{poly}\log T$ regret in the stochastic setting and minimax-optimal regret in the adversarial setting.
This is the first best-of-both-worlds result for general feedback graphs.
arXiv Detail & Related papers (2022-06-16T04:21:27Z) - Nearly Optimal Best-of-Both-Worlds Algorithms for Online Learning with
Feedback Graphs [34.37963000493442]
This study considers online learning with general directed feedback graphs.
We present best-of-both-worlds algorithms that achieve nearly tight regret bounds for adversarial environments.
arXiv Detail & Related papers (2022-06-02T05:01:40Z) - A Near-Optimal Best-of-Both-Worlds Algorithm for Online Learning with
Feedback Graphs [21.563733343861713]
We consider online learning with feedback graphs, a sequential decision-making framework where the learner's feedback is determined by a directed graph over the action set.
We present a computationally efficient algorithm for learning in this framework that simultaneously achieves near-optimal regret bounds in both stochastic and adversarial environments.
arXiv Detail & Related papers (2022-06-01T15:14:32Z) - Improved Reconstruction of Random Geometric Graphs [3.930410971186142]
We consider the classic model of random geometric graphs where $n$ points are scattered uniformly in a square of area $n$.
Our method estimates Euclidean distances using a hybrid of graph distances and short-range estimates based on the number of common neighbors (see the common-neighbor sketch after this list).
arXiv Detail & Related papers (2021-07-29T20:37:28Z) - Adversarial Linear Contextual Bandits with Graph-Structured Side
Observations [80.95090605985042]
A learning agent repeatedly chooses from a set of $K$ actions after being presented with a $d$-dimensional context vector.
The agent incurs and observes the loss of the chosen action, but also observes the losses of its neighboring actions in the observation structures.
Two efficient algorithms are developed based on EXP3 (see the EXP3 sketch after this list).
arXiv Detail & Related papers (2020-12-10T15:40:07Z) - Agnostic Learning of a Single Neuron with Gradient Descent [92.7662890047311]
We consider the problem of learning the best-fitting single neuron as measured by the expected square loss.
For the ReLU activation, our population risk guarantee is $O(\mathsf{OPT}^{1/2})+\epsilon$.
arXiv Detail & Related papers (2020-05-29T07:20:35Z)
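For the two single-neuron entries above, here is a minimal sketch of learning $\mathbf{x}\mapsto\sigma(\mathbf{w}\cdot\mathbf{x})$ by gradient descent on the square loss. It uses the ReLU activation and benign Gaussian label noise rather than the adversarial noise those papers actually handle; step size, step count, and data dimensions are illustrative.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def neuron_gd(X, y, steps=500, lr=0.1, seed=0):
    """Gradient descent on the empirical square loss of a single ReLU neuron."""
    n, d = X.shape
    w = np.random.default_rng(seed).normal(size=d) / np.sqrt(d)
    for _ in range(steps):
        z = X @ w
        # (Sub)gradient of (1/2n) * sum((relu(z) - y)^2) with respect to w.
        w -= lr * (X.T @ ((relu(z) - y) * (z > 0))) / n
    return w

rng = np.random.default_rng(1)
n, d = 2000, 10
X = rng.normal(size=(n, d))
w_star = rng.normal(size=d)
y = relu(X @ w_star) + 0.1 * rng.normal(size=n)   # benign (not adversarial) noise
w_hat = neuron_gd(X, y)
print(np.mean((relu(X @ w_hat) - y) ** 2))        # should approach the noise level
```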
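For the random geometric graph entry, a sketch of the short-range, common-neighbor idea: at density one, the expected number of common neighbors of two vertices at distance $d$ equals the area of the lens where two radius-$r$ disks overlap, which can be inverted numerically to estimate $d$. This covers only the short-range half of the paper's hybrid estimator, and the setup below (parameters, vertex choice) is assumed, not taken from the paper.

```python
import numpy as np

def lens_area(d, r):
    """Area of the intersection of two radius-r disks whose centers are d apart."""
    if d >= 2 * r:
        return 0.0
    return 2 * r**2 * np.arccos(d / (2 * r)) - (d / 2) * np.sqrt(4 * r**2 - d**2)

def estimate_distance(common, r, density=1.0):
    """Invert E[#common neighbors] = density * lens_area(d, r) by bisection."""
    target = common / density
    lo, hi = 0.0, 2 * r
    for _ in range(60):
        mid = (lo + hi) / 2
        if lens_area(mid, r) > target:   # lens shrinks as distance grows
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

rng = np.random.default_rng(0)
n, r = 20_000, 3.0
pts = rng.uniform(0.0, np.sqrt(n), size=(n, 2))   # square of area n, density ~1
d0 = np.linalg.norm(pts - pts[0], axis=1)
j = int(np.argsort(d0)[10])                        # pick a nearby vertex
dj = np.linalg.norm(pts - pts[j], axis=1)
mask = (d0 < r) & (dj < r)                         # common neighbors of 0 and j
mask[0] = mask[j] = False                          # exclude the two endpoints
print(d0[j], estimate_distance(int(mask.sum()), r))
```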
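For the contextual bandits entry, the two algorithms build on EXP3; below is a minimal sketch of vanilla EXP3 (exponential weights over importance-weighted loss estimates) for plain bandit feedback, without the graph-structured or contextual extensions developed in that paper. The tuning of $\eta$ is the standard one for bandit feedback, and the toy loss function is illustrative.

```python
import numpy as np

def exp3(K, T, loss_fn, eta, rng):
    """Vanilla EXP3: exponential weights over importance-weighted loss estimates."""
    weights = np.ones(K)
    total = 0.0
    for t in range(T):
        probs = weights / weights.sum()
        a = int(rng.choice(K, p=probs))
        loss = loss_fn(t, a)              # in [0, 1]; only this value is revealed
        total += loss
        est = np.zeros(K)
        est[a] = loss / probs[a]          # unbiased estimate of the full loss vector
        weights *= np.exp(-eta * est)
        weights /= weights.max()          # rescale to avoid numerical underflow
    return total

rng = np.random.default_rng(0)
K, T = 10, 20_000
eta = np.sqrt(2 * np.log(K) / (K * T))    # standard tuning for bandit feedback
good = 3                                  # one arm has smaller expected loss
loss_fn = lambda t, a: rng.uniform(0.2, 0.6) if a == good else rng.uniform(0.4, 1.0)
print(exp3(K, T, loss_fn, eta, rng))      # total loss should stay near 0.4 * T
```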