Best-of-Both-Worlds Algorithms for Partial Monitoring
- URL: http://arxiv.org/abs/2207.14550v1
- Date: Fri, 29 Jul 2022 08:40:05 GMT
- Title: Best-of-Both-Worlds Algorithms for Partial Monitoring
- Authors: Taira Tsuchiya, Shinji Ito, Junya Honda
- Abstract summary: We show that for non-degenerate globally observable games, the regret in the stochastic regime is bounded by $O(\max\{c_{\mathcal{G}}^2 / k,\, c_{\mathcal{G}}\} \log(T) \log(k_{\Pi} T) / \Delta_{\min}^2)$.
Our algorithms are based on the follow-the-regularized-leader framework inspired by algorithms in the field of online learning with feedback graphs.
- Score: 34.37963000493442
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper considers the partial monitoring problem with $k$-actions and
$d$-outcomes and provides the first best-of-both-worlds algorithms, whose
regrets are bounded poly-logarithmically in the stochastic regime and
near-optimally in the adversarial regime. To be more specific, we show that for
non-degenerate locally observable games, the regret in the stochastic regime is
bounded by $O(k^3 m^2 \log(T) \log(k_{\Pi} T) / \Delta_{\min})$ and in
the adversarial regime by $O(k^{2/3} m \sqrt{T \log(T) \log k_{\Pi}})$, where
$T$ is the number of rounds, $m$ is the maximum number of distinct observations
per action, $\Delta_{\min}$ is the minimum optimality gap, and $k_{\Pi}$ is the
number of Pareto optimal actions. Moreover, we show that for non-degenerate
globally observable games, the regret in the stochastic regime is bounded by
$O(\max\{c_{\mathcal{G}}^2 / k,\, c_{\mathcal{G}}\} \log(T) \log(k_{\Pi} T) /
\Delta_{\min}^2)$ and in the adversarial regime by $O((\max\{c_{\mathcal{G}}^2
/ k,\, c_{\mathcal{G}}\} \log(T) \log(k_{\Pi} T))^{1/3} T^{2/3})$, where
$c_{\mathcal{G}}$ is a game-dependent constant. Our algorithms are based on the
follow-the-regularized-leader framework that takes into account the nature of
the partial monitoring problem, inspired by algorithms in the field of online
learning with feedback graphs.
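The abstract gives no pseudocode, so the following is only a rough, hedged sketch of the follow-the-regularized-leader (FTRL) template the paper builds on: FTRL with a negative-entropy regularizer over actions, driven by a generic importance-weighted loss estimator. The learning rate `eta`, the Bernoulli loss model, and the bandit-style estimator are illustrative assumptions; the paper's actual algorithms use loss estimators and adaptive learning rates tailored to the partial-monitoring signal structure.

```python
import numpy as np

# Minimal FTRL sketch (NOT the paper's exact method): negative-entropy
# regularization reduces the per-round optimization to a softmax over the
# negated cumulative loss estimates.

def ftrl_negative_entropy(cum_loss_estimates, eta):
    """argmin_p <p, L> + (1/eta) * sum_i p_i log p_i, i.e. softmax(-eta * L)."""
    z = -eta * cum_loss_estimates
    z -= z.max()                      # numerical stability
    w = np.exp(z)
    return w / w.sum()

rng = np.random.default_rng(0)
k, T, eta = 5, 10_000, 0.05           # hypothetical problem size and learning rate
true_losses = rng.uniform(0.2, 0.8, size=k)   # assumed stochastic Bernoulli means
cum_est = np.zeros(k)

for t in range(T):
    p = ftrl_negative_entropy(cum_est, eta)
    a = rng.choice(k, p=p)                            # sample an action
    loss = float(rng.random() < true_losses[a])       # observe loss of played action
    # Importance-weighted estimator (a bandit-style stand-in for the paper's
    # partial-monitoring estimator built from the game's signal structure).
    est = np.zeros(k)
    est[a] = loss / p[a]
    cum_est += est

p_final = ftrl_negative_entropy(cum_est, eta)
print("final FTRL distribution:", np.round(p_final, 3))
print("estimated best action:", int(np.argmax(p_final)),
      "| true best action:", int(np.argmin(true_losses)))
```

With these assumptions the distribution concentrates on the action with the smallest mean loss; the paper's analysis replaces the fixed `eta` with adaptive learning rates to obtain the best-of-both-worlds guarantees stated above.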
Related papers
- Online Newton Method for Bandit Convex Optimisation [28.66596225688161]
We introduce a computationally efficient algorithm for zeroth-order bandit convex optimisation.
We prove that in the adversarial setting its regret is at most $d^{3.5} \sqrt{n}\, \mathrm{polylog}(n, d)$ with high probability, where $d$ is the dimension and $n$ is the time horizon.
In the stochastic setting the bound improves to $M d^{2} \sqrt{n}\, \mathrm{polylog}(n, d)$, where $M \in [d^{-1/2}, d^{-1/4}]$ is
arXiv Detail & Related papers (2024-06-10T17:44:11Z) - Batched Stochastic Bandit for Nondegenerate Functions [8.015503209312786]
This paper studies batched bandit learning problems for nondegenerate functions.
We introduce an algorithm that solves the batched bandit problem for nondegenerate functions near-optimally.
arXiv Detail & Related papers (2024-05-09T12:50:16Z) - Best-of-Both-Worlds Algorithms for Linear Contextual Bandits [11.94312915280916]
We study best-of-both-worlds algorithms for $K$-armed linear contextual bandits.
Our algorithms deliver near-optimal regret bounds in both the stochastic and adversarial regimes.
arXiv Detail & Related papers (2023-12-24T08:27:30Z) - Dueling Optimization with a Monotone Adversary [35.850072415395395]
We study the problem of dueling optimization with a monotone adversary, which is a generalization of (noiseless) dueling convex optimization.
The goal is to design an online algorithm to find a minimizer $\mathbf{x}^*$ for a function $f \colon \mathcal{X} \to \mathbb{R}$, where $\mathcal{X} \subseteq \mathbb{R}^d$.
arXiv Detail & Related papers (2023-11-18T23:55:59Z) - Near-Optimal Regret Bounds for Multi-batch Reinforcement Learning [54.806166861456035]
We study the episodic reinforcement learning (RL) problem modeled by finite-horizon Markov Decision Processes (MDPs) with constraint on the number of batches.
We design a computationally efficient algorithm to achieve near-optimal regret of $\tilde{O}(\sqrt{SAH^3 K \ln(1/\delta)})$ in $K$ episodes, where $\tilde{O}(\cdot)$ hides logarithmic terms of $(S, A, H, K)$.
Our technical contributions are two-fold: 1) a near-optimal design scheme to explore
arXiv Detail & Related papers (2022-10-15T09:22:22Z) - Learning a Single Neuron with Adversarial Label Noise via Gradient Descent [50.659479930171585]
We study a function of the form $\mathbf{x} \mapsto \sigma(\mathbf{w} \cdot \mathbf{x})$ for monotone activations.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w}) \le C\,\mathrm{opt} + \epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z) - Optimal Regret Algorithm for Pseudo-1d Bandit Convex Optimization [51.23789922123412]
We study online learning with bandit feedback (i.e. learner has access to only zeroth-order oracle) where cost/reward functions admit a "pseudo-1d" structure.
We show a lower bound of $\min(\sqrt{dT}, T^{3/4})$ for the regret of any algorithm, where $T$ is the number of rounds.
We propose a new algorithm that combines randomized online gradient descent with a kernelized exponential weights method to exploit the pseudo-1d structure effectively.
arXiv Detail & Related papers (2021-02-15T08:16:51Z) - Thresholded Lasso Bandit [70.17389393497125]
Thresholded Lasso bandit is an algorithm that estimates the vector defining the reward function as well as its sparse support.
We establish non-asymptotic regret upper bounds scaling as $\mathcal{O}(\log d + \sqrt{T})$ in general, and as $\mathcal{O}(\log d + \log T)$ under the so-called margin condition.
arXiv Detail & Related papers (2020-10-22T19:14:37Z) - Taking a hint: How to leverage loss predictors in contextual bandits? [63.546913998407405]
We study learning in contextual bandits with the help of loss predictors.
We show that the optimal regret is $\mathcal{O}(\min\{\sqrt{T},\, \sqrt{\mathcal{E}}\, T^{\frac{1}{3}}\})$ when $\mathcal{E}$ is known.
arXiv Detail & Related papers (2020-03-04T07:36:38Z) - Tight Regret Bounds for Noisy Optimization of a Brownian Motion [118.65407541895851]
We consider the problem of Bayesian optimization of a one-dimensional Brownian motion in which the $T$ adaptively chosen observations are corrupted by Gaussian noise.
We show that the smallest possible expected cumulative regret and the smallest possible expected simple regret scale as $\Omega(\sigma\sqrt{T \cdot \log T})$.
arXiv Detail & Related papers (2020-01-25T14:44:53Z)