One Arrow, Two Kills: An Unified Framework for Achieving Optimal Regret
Guarantees in Sleeping Bandits
- URL: http://arxiv.org/abs/2210.14998v1
- Date: Wed, 26 Oct 2022 19:40:06 GMT
- Title: One Arrow, Two Kills: An Unified Framework for Achieving Optimal Regret
Guarantees in Sleeping Bandits
- Authors: Pierre Gaillard, Aadirupa Saha, Soham Dan
- Abstract summary: We address the problem of `Internal Regret' in Sleeping Bandits in the fully adversarial setup.
We propose an algorithm that yields sublinear regret in that measure, even for a completely adversarial sequence of losses and availabilities.
- Score: 29.896865106960423
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We address the problem of \emph{`Internal Regret'} in \emph{Sleeping Bandits}
in the fully adversarial setup, as well as draw connections between different
existing notions of sleeping regrets in the multiarmed bandits (MAB) literature
and consequently analyze the implications: Our first contribution is to propose
the new notion of \emph{Internal Regret} for sleeping MAB. We then propose an
algorithm that yields sublinear regret in that measure, even for a completely
adversarial sequence of losses and availabilities. We further show that a low
sleeping internal regret always implies a low external regret, as well as a
low policy regret for i.i.d. sequences of losses. The main contribution of this
work lies precisely in unifying the different existing notions of regret in
sleeping bandits and understanding the implications of one for another. Finally, we
also extend our results to the setting of \emph{Dueling Bandits} (DB)--a
preference feedback variant of MAB, and propose a reduction-to-MAB idea to
design a low-regret algorithm for sleeping dueling bandits with stochastic
preferences and adversarial availabilities. The efficacy of our algorithms is
justified through empirical evaluations.
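For context, here is a hedged sketch of the regret notions referenced above; the notation ($\ell_t$ for losses, $A_t$ for the set of available arms, $p_t$ for the learner's play distribution) is assumed for illustration, and the paper's exact sleeping definitions may differ.

```latex
% Hedged sketch, not the paper's exact definitions.
% \ell_t = loss vector, A_t \subseteq [K] = available arms, p_t = learner's play distribution.
\begin{align*}
  % (Expected) external regret: learner vs. the best fixed arm in hindsight.
  R_{\mathrm{ext}}(T) &= \sum_{t=1}^{T} \langle p_t, \ell_t \rangle
                        - \min_{k \in [K]} \sum_{t=1}^{T} \ell_t(k), \\
  % Internal (swap) regret: learner vs. the best single modification ``play j whenever you played i''.
  R_{\mathrm{int}}(T) &= \max_{i \neq j} \sum_{t=1}^{T} p_t(i)\,\bigl(\ell_t(i) - \ell_t(j)\bigr), \\
  % One natural sleeping analogue: the swap only applies on rounds where both arms are available.
  R^{\mathrm{sleep}}_{\mathrm{int}}(T) &\approx \max_{i \neq j} \sum_{t:\, i,\,j \in A_t} p_t(i)\,\bigl(\ell_t(i) - \ell_t(j)\bigr).
\end{align*}
```

In the classical always-available case, $R_{\mathrm{ext}}(T) \le K \cdot R_{\mathrm{int}}(T)$, since the best fixed comparator decomposes into at most $K$ per-arm swaps; this is the rough intuition behind the claim that low internal regret implies low external regret.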
Related papers
- Best Arm Identification with Minimal Regret [55.831935724659175]
The best arm identification (BAI) problem with minimal regret elegantly amalgamates regret minimization and BAI.
The agent's goal is to identify the best arm with a prescribed confidence level.
The Double KL-UCB algorithm achieves optimality as the confidence level tends to zero.
arXiv Detail & Related papers (2024-09-27T16:46:02Z)
- Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback [58.66941279460248]
Learning from human feedback plays an important role in aligning generative models, such as large language models (LLMs).
We study a model within this problem domain--contextual dueling bandits with adversarial feedback, where the true preference label can be flipped by an adversary.
We propose a robust contextual dueling bandit algorithm based on uncertainty-weighted maximum likelihood estimation.
arXiv Detail & Related papers (2024-04-16T17:59:55Z)
- Multi-Armed Bandits with Abstention [62.749500564313834]
We introduce a novel extension of the canonical multi-armed bandit problem that incorporates an additional strategic element: abstention.
In this enhanced framework, the agent is not only tasked with selecting an arm at each time step, but also has the option to abstain from accepting the instantaneous reward before observing it.
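A minimal Python sketch of the abstention protocol described above (the environment, arm-selection rule, and abstention rule are placeholders, not the paper's algorithm):

```python
import random

def bandit_with_abstention(K=5, T=100, abstain_value=0.0, seed=0):
    """Hedged sketch: a K-armed bandit where the agent may abstain each round.

    The uniform arm choice and fixed 10% abstention rate are illustrative
    placeholders; the cited paper's actual policies are not reproduced here.
    """
    rng = random.Random(seed)
    means = [rng.random() for _ in range(K)]      # unknown Bernoulli arm means (simulated environment)
    total = 0.0
    for _ in range(T):
        arm = rng.randrange(K)                    # 1) select an arm
        abstain = rng.random() < 0.1              # 2) decide to abstain BEFORE observing the reward
        if abstain:
            total += abstain_value                # abstaining pays a known fixed value; the reward stays hidden
        else:
            total += float(rng.random() < means[arm])  # 3) otherwise observe and accept the realized reward
    return total
```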
arXiv Detail & Related papers (2024-02-23T06:27:12Z)
- Proportional Response: Contextual Bandits for Simple and Cumulative Regret Minimization [29.579719765255927]
We propose a new family of efficient bandit algorithms for the contextual bandit setting.
Our algorithms work with any function class, are robust to model misspecification, and can be used in continuous arm settings.
arXiv Detail & Related papers (2023-07-05T08:34:54Z)
- Borda Regret Minimization for Generalized Linear Dueling Bandits [65.09919504862496]
We study the Borda regret minimization problem for dueling bandits, which aims to identify the item with the highest Borda score.
We propose a rich class of generalized linear dueling bandit models, which cover many existing models.
Our algorithm achieves an $\tilde{O}(d^{2/3} T^{2/3})$ regret, which is also optimal.
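For reference, one common way to formalize the Borda objective mentioned above is sketched below; the paper's generalized linear models may parameterize these quantities differently, so treat this as an assumption rather than the paper's definition.

```latex
% Hedged sketch of the Borda score and Borda dueling regret.
% \Pr(i \succ j) = probability item i is preferred to item j; (x_t, y_t) = pair dueled at round t.
\begin{align*}
  B(i) &= \frac{1}{K-1} \sum_{j \neq i} \Pr(i \succ j)
      && \text{(Borda score of item } i\text{)}, \\
  R_{\mathrm{Borda}}(T) &= \sum_{t=1}^{T} \bigl( 2\,B(i^{*}) - B(x_t) - B(y_t) \bigr)
      && \text{with } i^{*} = \arg\max_{i} B(i).
\end{align*}
```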
arXiv Detail & Related papers (2023-03-15T17:59:27Z)
- Contextual Bandits with Smooth Regret: Efficient Learning in Continuous Action Spaces [14.366265951396587]
We design efficient general-purpose contextual bandit algorithms for large -- or even continuous -- action spaces.
We propose a smooth regret notion for contextual bandits, which dominates previously proposed alternatives.
Our algorithms can be used to recover the previous minimax/Pareto optimal guarantees under the standard regret.
arXiv Detail & Related papers (2022-07-12T21:27:09Z)
- Complete Policy Regret Bounds for Tallying Bandits [51.039677652803675]
Policy regret is a well established notion of measuring the performance of an online learning algorithm against an adaptive adversary.
We study restrictions on the adversary that enable efficient minimization of the \emph{complete policy regret}.
We provide an algorithm that achieves, w.h.p., a complete policy regret guarantee of $\tilde{\mathcal{O}}(mK\sqrt{T})$, where the $\tilde{\mathcal{O}}$ notation hides only logarithmic factors.
arXiv Detail & Related papers (2022-04-24T03:10:27Z)
- Versatile Dueling Bandits: Best-of-both-World Analyses for Online Learning from Preferences [28.79598714109439]
We study the problem of $K$-armed dueling bandits for both stochastic and adversarial environments.
We first propose a novel reduction from any (general) dueling bandits to multi-armed bandits.
Our algorithm is also the first to achieve an optimal $O\bigl(\sum_{i=1}^{K} \frac{\log T}{\Delta_i}\bigr)$ regret bound against the Condorcet-winner benchmark.
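Under standard dueling bandit conventions (assumed here; the paper may state them slightly differently), the benchmark and the gaps $\Delta_i$ in that bound can be read as follows.

```latex
% Standard conventions, stated as an assumption about the notation in the bound above.
\begin{align*}
  \text{Condorcet winner } i^{*}:&\quad \Pr(i^{*} \succ j) > \tfrac{1}{2} \ \text{ for all } j \neq i^{*}, \\
  \text{gap of arm } i \neq i^{*}:&\quad \Delta_i = \Pr(i^{*} \succ i) - \tfrac{1}{2}.
\end{align*}
```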
arXiv Detail & Related papers (2022-02-14T13:37:23Z)
- Experimental Design for Regret Minimization in Linear Bandits [19.8309784360219]
We propose a novel experimental design-based algorithm to minimize regret in online stochastic linear and combinatorial bandits.
We provide state-of-the-art finite time regret guarantees and show that our algorithm can be applied in both the bandit and semi-bandit feedback regime.
arXiv Detail & Related papers (2020-11-01T17:59:19Z)
- Adaptive Discretization for Adversarial Lipschitz Bandits [85.39106976861702]
Lipschitz bandits is a prominent version of multi-armed bandits that studies large, structured action spaces.
A central theme here is the adaptive discretization of the action space, which gradually "zooms in" on the more promising regions.
We provide the first algorithm for adaptive discretization in the adversarial version, and derive instance-dependent regret bounds.
arXiv Detail & Related papers (2020-06-22T16:06:25Z)
- The Unreasonable Effectiveness of Greedy Algorithms in Multi-Armed Bandit with Many Arms [10.662105162882526]
We investigate a $k$-armed bandit problem in the \emph{many-armed} regime.
Our findings suggest a new form of \emph{free exploration} beneficial to greedy algorithms in the many-armed context.
arXiv Detail & Related papers (2020-02-24T08:59:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.