Nearly Optimal Best-of-Both-Worlds Algorithms for Online Learning with
Feedback Graphs
- URL: http://arxiv.org/abs/2206.00873v1
- Date: Thu, 2 Jun 2022 05:01:40 GMT
- Title: Nearly Optimal Best-of-Both-Worlds Algorithms for Online Learning with
Feedback Graphs
- Authors: Shinji Ito, Taira Tsuchiya, Junya Honda
- Abstract summary: This study considers online learning with general directed feedback graphs.
We present best-of-both-worlds algorithms that achieve nearly tight regret bounds for adversarial environments.
- Score: 34.37963000493442
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This study considers online learning with general directed feedback graphs.
For this problem, we present best-of-both-worlds algorithms that achieve nearly
tight regret bounds for adversarial environments as well as poly-logarithmic
regret bounds for stochastic environments. As Alon et al. [2015] have shown,
tight regret bounds depend on the structure of the feedback graph:
\textit{strongly observable} graphs yield minimax regret of $\tilde{\Theta}(
\alpha^{1/2} T^{1/2} )$, while \textit{weakly observable} graphs induce minimax
regret of $\tilde{\Theta}( \delta^{1/3} T^{2/3} )$, where $\alpha$ and
$\delta$, respectively, represent the independence number of the graph and the
domination number of a certain portion of the graph. Our proposed algorithm for
strongly observable graphs has a regret bound of $\tilde{O}( \alpha^{1/2}
T^{1/2} ) $ for adversarial environments, as well as of $ {O} ( \frac{\alpha
(\ln T)^3 }{\Delta_{\min}} ) $ for stochastic environments, where
$\Delta_{\min}$ expresses the minimum suboptimality gap. This result resolves
an open question raised by Erez and Koren [2021]. We also provide an algorithm
for weakly observable graphs that achieves a regret bound of $\tilde{O}(
\delta^{1/3}T^{2/3} )$ for adversarial environments and poly-logarithmic regret
for stochastic environments. The proposed algorithms are based on the
follow-the-perturbed-leader approach combined with newly designed update rules
for learning rates.
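To make the approach named in the abstract concrete, below is a minimal, self-contained sketch of follow-the-perturbed-leader with feedback-graph observations. It is an illustration only, not the paper's algorithm: the exponential perturbation, the simple $1/\sqrt{t}$ learning rate, the Monte Carlo estimate of observation probabilities, and the small example graph are all simplifying assumptions standing in for the paper's newly designed learning-rate update rules.

```python
import numpy as np

rng = np.random.default_rng(0)

K, T = 5, 10_000
# Feedback graph as an adjacency matrix: G[i, j] = 1 means that playing arm i
# also reveals the loss of arm j.  Self-loops on every arm make this example
# strongly observable.
G = np.eye(K, dtype=int)
G[0, :] = 1  # arm 0 additionally observes every other arm

cum_loss_est = np.zeros(K)                   # cumulative importance-weighted loss estimates
true_means = rng.uniform(0.2, 0.8, size=K)   # a stochastic environment, for the demo only

for t in range(1, T + 1):
    eta = 1.0 / np.sqrt(t)                   # simple decreasing learning rate (assumption)
    z = rng.exponential(size=K)              # FTPL perturbation
    arm = int(np.argmin(cum_loss_est - z / eta))

    losses = rng.binomial(1, true_means)     # this round's losses (only some are observed)

    # Estimate the probability that each arm's loss is revealed this round by
    # resampling the perturbation (a crude stand-in for geometric resampling).
    zs = rng.exponential(size=(128, K))
    draws = np.argmin(cum_loss_est[None, :] - zs / eta, axis=1)
    p_play = np.bincount(draws, minlength=K) / 128.0
    p_observe = G.T @ p_play                 # P[the played arm has an edge into j]

    revealed = G[arm].astype(bool)
    cum_loss_est[revealed] += losses[revealed] / np.maximum(p_observe[revealed], 1e-3)
```

With self-loops on every arm the example graph is strongly observable, the regime where the abstract's $\tilde{O}(\alpha^{1/2} T^{1/2})$ bound applies; the paper's contribution lies in the learning-rate update rules that additionally yield the $O(\frac{\alpha (\ln T)^3}{\Delta_{\min}})$ stochastic bound, which this sketch does not implement.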
Related papers
- Online Learning with Feedback Graphs: The True Shape of Regret [82.00098840619847]
We prove that the minimax regret is proportional to a graph-dependent quantity $R^*$ for any graph and time horizon $T$.
Introducing an intricate exploration strategy, we define an algorithm that achieves the minimax optimal regret bound.
arXiv Detail & Related papers (2023-06-05T15:35:00Z) - Learning on the Edge: Online Learning with Stochastic Feedback Graphs [12.83118601099289]
We study an extension where the directed feedback graph is stochastic.
In each round every edge in the graph is either realized or not with a distinct probability for each edge.
We derive a more efficient algorithm featuring a dependence on weighted versions of the independence and weak domination numbers.
arXiv Detail & Related papers (2022-10-09T11:21:08Z) - Improved High-Probability Regret for Adversarial Bandits with
Time-Varying Feedback Graphs [62.52390282012508]
We study high-probability regret bounds for adversarial $K$-armed bandits with time-varying feedback graphs over $T$ rounds.
We develop an algorithm that achieves the optimal regret $\widetilde{\mathcal{O}}\bigl( (\sum_{t=1}^{T} \alpha_t)^{1/2} + \max_{t\in[T]} \alpha_t \bigr)$ with high probability.
We also develop the first algorithm that achieves the optimal high-probability regret bound for weakly observable graphs.
arXiv Detail & Related papers (2022-10-04T04:36:15Z) - Simultaneously Learning Stochastic and Adversarial Bandits with General
Graph Feedback [15.429356827868514]
We introduce a new trade-off mechanism for exploration and exploitation of general feedback graphs.
We prove that the proposed algorithm simultaneously achieves $\mathrm{poly}\log T$ regret in the stochastic setting and minimax-optimal regret in the adversarial setting.
This is the first best-of-both-worlds result for general feedback graphs.
arXiv Detail & Related papers (2022-06-16T04:21:27Z) - A Near-Optimal Best-of-Both-Worlds Algorithm for Online Learning with
Feedback Graphs [21.563733343861713]
We consider online learning with feedback graphs, a sequential decision-making framework where the learner's feedback is determined by a directed graph over the action set.
We present a computationally efficient algorithm for learning in this framework that simultaneously achieves near-optimal regret bounds in both stochastic and adversarial environments.
arXiv Detail & Related papers (2022-06-01T15:14:32Z) - Adversarial Linear Contextual Bandits with Graph-Structured Side
Observations [80.95090605985042]
A learning agent repeatedly chooses from a set of $K$ actions after being presented with a $d$-dimensional context vector.
The agent incurs and observes the loss of the chosen action, but also observes the losses of its neighboring actions in the observation structures.
Two efficient algorithms are developed based on $\texttt{EXP3}$.
arXiv Detail & Related papers (2020-12-10T15:40:07Z) - $Q$-learning with Logarithmic Regret [60.24952657636464]
We prove that an optimistic variant of $Q$-learning enjoys a $\mathcal{O}\left(\frac{SA\cdot \mathrm{poly}(H)}{\Delta_{\min}}\log(SAT)\right)$ cumulative regret bound, where $S$ is the number of states, $A$ is the number of actions, $H$ is the planning horizon, $T$ is the total number of steps, and $\Delta_{\min}$ is the minimum sub-optimality gap.
arXiv Detail & Related papers (2020-06-16T13:01:33Z) - Taking a hint: How to leverage loss predictors in contextual bandits? [63.546913998407405]
We study learning in contextual bandits with the help of loss predictors.
We show that the optimal regret is $\mathcal{O}(\min\{\sqrt{T}, \sqrt{\mathcal{E}}\,T^{\frac{1}{3}}\})$ when $\mathcal{E}$ is known.
arXiv Detail & Related papers (2020-03-04T07:36:38Z) - Near-Optimal Algorithms for Minimax Optimization [115.21519161773287]
The paper presents the first algorithm with $\tilde{O}(\sqrt{\kappa_{\mathbf{x}}\kappa_{\mathbf{y}}})$ gradient complexity, matching the lower bound up to logarithmic factors.
The paper also presents algorithms that match or outperform all existing methods in these settings in terms of complexity.
arXiv Detail & Related papers (2020-02-05T16:49:09Z)