Data Dependent Regret Guarantees Against General Comparators for Full or Bandit Feedback
- URL: http://arxiv.org/abs/2303.06526v1
- Date: Sun, 12 Mar 2023 00:18:46 GMT
- Title: Data Dependent Regret Guarantees Against General Comparators for Full or Bandit Feedback
- Authors: Kaan Gokcesu, Hakan Gokcesu
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the adversarial online learning problem and create a completely
online algorithmic framework with data-dependent regret guarantees in both
full expert feedback and bandit feedback settings. We study the expected
performance of our algorithm against general comparators, which makes it
applicable to a wide variety of problem scenarios. Our algorithm works from a
universal prediction perspective, and the performance measure used is the
expected regret against arbitrary comparator sequences, i.e., the difference
between our losses and a competing loss sequence. The competition class can be
designed to include fixed arm selections, switching bandits, contextual
bandits, periodic bandits, or any other competition of interest. The sequences
in the competition class are generally determined by the specific application
at hand and should be designed accordingly. Our algorithm neither uses nor
needs any preliminary information about the loss sequences and is completely
online. Its performance bounds are data-dependent: any affine transform of the
losses has no effect on the normalized regret.
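As an illustration of the regret definition above (not the paper's algorithm), the following sketch computes the expected regret of a randomized play sequence against an arbitrary comparator sequence, normalized by the loss range. The play distributions and the range-normalization are illustrative assumptions; the point is the arithmetic fact that an affine transform a*L + b (a > 0) rescales both the regret and the loss range by a, leaving the normalized regret unchanged:

```python
import numpy as np

rng = np.random.default_rng(0)
T, K = 200, 4
losses = rng.random((T, K))               # arbitrary adversarial loss matrix
plays = rng.random((T, K))
plays /= plays.sum(axis=1, keepdims=True) # some randomized play distributions
comparator = rng.integers(0, K, size=T)   # arbitrary comparator arm sequence

def normalized_regret(L, plays, comp):
    """Expected regret vs. the comparator sequence, normalized by the loss range."""
    ours = np.einsum("tk,tk->t", plays, L).sum()   # our expected cumulative loss
    theirs = L[np.arange(len(L)), comp].sum()      # comparator's cumulative loss
    return (ours - theirs) / (L.max() - L.min())   # range-normalized regret

r = normalized_regret(losses, plays, comparator)
# Affine transform: regret and loss range both scale by 3.7, so the
# normalized regret is unchanged.
r_affine = normalized_regret(3.7 * losses - 1.2, plays, comparator)
```

This only demonstrates the invariance of the performance measure itself; achieving such invariance with a fully online algorithm is the paper's contribution.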
Related papers
- Neural Dueling Bandits
We use a neural network to estimate the reward function using preference feedback for the previously selected arms.
We then extend our theoretical results to contextual bandit problems with binary feedback, which is in itself a non-trivial contribution.
arXiv Detail & Related papers (2024-07-24T09:23:22Z) - Optimal Tracking in Prediction with Expert Advice
We study the prediction with expert advice setting, where the aim is to produce a decision by combining the decisions generated by a set of experts.
We achieve the min-max optimal dynamic regret under the prediction with expert advice setting.
Our algorithms are the first to produce such universally optimal, adaptive and truly online guarantees with no prior knowledge.
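For context, the prediction-with-expert-advice setting can be sketched with the classic exponential-weights (Hedge) baseline; this is the standard textbook algorithm, not the min-max optimal tracking algorithm of the cited paper:

```python
import numpy as np

def hedge(expert_losses, eta=0.1):
    """Exponential-weights aggregation over experts (full feedback).

    expert_losses: (T, N) array, loss in [0, 1] of each expert at each round.
    Returns the learner's expected loss per round.
    """
    T, N = expert_losses.shape
    log_w = np.zeros(N)                       # uniform prior over experts
    out = np.empty(T)
    for t, l in enumerate(expert_losses):
        p = np.exp(log_w - log_w.max())
        p /= p.sum()                          # current mixture over experts
        out[t] = p @ l                        # expected loss this round
        log_w -= eta * l                      # downweight poorly performing experts
    return out

rng = np.random.default_rng(1)
L = rng.random((500, 8))
ours = hedge(L).sum()
best = L.sum(axis=0).min()                    # best fixed expert in hindsight
# Hedge's regret vs. the best fixed expert is at most ln(N)/eta + eta*T/8.
```

Against a *fixed* expert this suffices; tracking a switching comparator sequence, as in the paper above, requires additional machinery.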
arXiv Detail & Related papers (2022-08-07T12:29:54Z) - Second Order Regret Bounds Against Generalized Expert Sequences under Partial Bandit Feedback
We study the problem of expert advice under partial bandit feedback setting and create a sequential minimax optimal algorithm.
Our algorithm works with a more general partial monitoring setting, where, in contrast to the classical bandit feedback, the losses can be revealed in an adversarial manner.
arXiv Detail & Related papers (2022-04-13T22:48:12Z) - Contextual Model Aggregation for Fast and Robust Federated Learning in Edge Computing
Federated learning is a prime candidate for distributed machine learning at the network edge.
Existing algorithms face issues with slow convergence and/or a lack of robustness in performance.
We propose a contextual aggregation scheme that achieves the optimal context-dependent bound on loss reduction.
arXiv Detail & Related papers (2022-03-23T21:42:31Z) - Generalized Translation and Scale Invariant Online Algorithm for Adversarial Multi-Armed Bandits
We study the adversarial multi-armed bandit problem and create a completely online algorithmic framework that is invariant under arbitrary translations and scales of the arm losses.
Our algorithm works from a universal prediction perspective and the performance measure used is the expected regret against arbitrary arm selection sequences.
arXiv Detail & Related papers (2021-09-19T20:13:59Z) - Efficient First-Order Contextual Bandits: Prediction, Allocation, and Triangular Discrimination
A recurring theme in statistical learning, online learning, and beyond is that faster convergence rates are possible for problems with low noise.
First-order guarantees are relatively well understood in statistical and online learning.
We show that the logarithmic loss and an information-theoretic quantity called the triangular discrimination play a fundamental role in obtaining first-order guarantees.
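The triangular discrimination mentioned above has a simple closed form, Δ(P, Q) = Σ_a (P(a) − Q(a))² / (P(a) + Q(a)). A minimal implementation, purely illustrative of the quantity itself:

```python
import numpy as np

def triangular_discrimination(p, q):
    """Triangular discrimination between two distributions on a finite set.

    Delta(P, Q) = sum_a (p_a - q_a)^2 / (p_a + q_a): a symmetric f-divergence,
    bounded in [0, 2], and zero exactly when P == Q.
    """
    p, q = np.asarray(p, float), np.asarray(q, float)
    s = p + q
    mask = s > 0                       # terms with p_a = q_a = 0 contribute 0
    return np.sum((p[mask] - q[mask]) ** 2 / s[mask])

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])
d = triangular_discrimination(p, q)    # symmetric: same value with p, q swapped
```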
arXiv Detail & Related papers (2021-07-05T19:20:34Z) - Double Coverage with Machine-Learned Advice
We study the fundamental online $k$-server problem in a learning-augmented setting.
We show that our algorithm achieves, for any $k$, an almost optimal consistency-robustness tradeoff.
arXiv Detail & Related papers (2021-03-02T11:04:33Z) - Adapting to misspecification in contextual bandits with offline regression oracles
We propose a family of contextual bandit algorithms that adapt to misspecification error by reverting to a good safe policy.
Our algorithm requires only an offline regression oracle to ensure regret guarantees that gracefully degrade in terms of a measure of the average misspecification level.
arXiv Detail & Related papers (2021-02-26T00:15:04Z) - Efficient Pure Exploration for Combinatorial Bandits with Semi-Bandit Feedback
Combinatorial bandits generalize multi-armed bandits, where the agent chooses sets of arms and observes a noisy reward for each arm contained in the chosen set.
We focus on the pure-exploration problem of identifying the best arm with fixed confidence, as well as a more general setting, where the structure of the answer set differs from that of the action set.
Based on a projection-free online learning algorithm for finite polytopes, our method is the first computationally efficient algorithm that is asymptotically optimal and has competitive empirical performance.
arXiv Detail & Related papers (2021-01-21T10:35:09Z) - A Generalized Online Algorithm for Translation and Scale Invariant Prediction with Expert Advice
We study the expected regret of our algorithm against a generic competition class in the sequential prediction by expert advice problem.
Our regret bounds are stable under arbitrary scalings and translations of the losses.
arXiv Detail & Related papers (2020-09-09T15:45:28Z) - Beyond UCB: Optimal and Efficient Contextual Bandits with Regression Oracles
We provide the first universal and optimal reduction from contextual bandits to online regression.
Our algorithm requires no distributional assumptions beyond realizability, and works even when contexts are chosen adversarially.
arXiv Detail & Related papers (2020-02-12T11:33:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.