Generalized Translation and Scale Invariant Online Algorithm for
Adversarial Multi-Armed Bandits
- URL: http://arxiv.org/abs/2109.09212v1
- Date: Sun, 19 Sep 2021 20:13:59 GMT
- Title: Generalized Translation and Scale Invariant Online Algorithm for
Adversarial Multi-Armed Bandits
- Authors: Kaan Gokcesu, Hakan Gokcesu
- Abstract summary: We study the adversarial multi-armed bandit problem and create a completely online algorithmic framework that is invariant under arbitrary translations and scales of the arm losses.
Our algorithm works from a universal prediction perspective and the performance measure used is the expected regret against arbitrary arm selection sequences.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the adversarial multi-armed bandit problem and create a completely
online algorithmic framework that is invariant under arbitrary translations and
scales of the arm losses. We study the expected performance of our algorithm
against a generic competition class, which makes it applicable for a wide
variety of problem scenarios. Our algorithm works from a universal prediction
perspective and the performance measure used is the expected regret against
arbitrary arm selection sequences, which is the difference between our losses
and a competing loss sequence. The competition class can be designed to include
fixed arm selections, switching bandits, contextual bandits, or any other
competition of interest. The sequences in the competition class are generally
determined by the specific application at hand and should be designed
accordingly. Our algorithm neither uses nor needs any preliminary information
about the loss sequences and is completely online. Its performance bounds are
second-order bounds in terms of the sum of the squared losses, where any affine
transform of the losses has no effect on the normalized regret.
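As an illustrative sketch only (not the paper's actual algorithm), the affine-invariance idea can be demonstrated with a simple exponential-weights bandit whose loss estimates are normalized by the running loss range: for any transform a*loss + b with a > 0, the normalized estimate (loss - lo)/(hi - lo) is unchanged, so the arm choices are identical. The function name, learning rate `eta`, and normalization scheme below are assumptions for the demonstration.

```python
import math
import random

def exp_weights_normalized(losses, eta=0.5, seed=0):
    """Exponential-weights bandit on a (rounds x arms) loss matrix.

    Each observed loss is normalized by the running min/max range
    before the importance-weighted update, so arm choices are
    invariant under any affine transform a*loss + b with a > 0.
    Hypothetical illustration, not the paper's algorithm.
    """
    rng = random.Random(seed)
    k = len(losses[0])
    weights = [1.0] * k
    lo, hi = float("inf"), float("-inf")
    picks = []
    for round_losses in losses:
        total = sum(weights)
        probs = [w / total for w in weights]
        # sample an arm from the current distribution
        r, arm, acc = rng.random(), k - 1, 0.0
        for i, p in enumerate(probs):
            acc += p
            if r <= acc:
                arm = i
                break
        picks.append(arm)
        observed = round_losses[arm]
        lo, hi = min(lo, observed), max(hi, observed)
        scale = (hi - lo) or 1.0  # range is 0 on the first round
        # range-normalized, importance-weighted loss estimate
        est = ((observed - lo) / scale) / probs[arm]
        weights[arm] *= math.exp(-eta * est)
    return picks
```

Running the sketch on a loss matrix and on an affine transform of it (same seed) yields the same sequence of arm choices, since every normalized loss estimate, and hence every weight update and random draw, is identical in the two runs.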
Related papers
- Distributed Online Bandit Nonconvex Optimization with One-Point Residual Feedback via Dynamic Regret [10.700891331004799]
This paper considers the distributed online bandit optimization problem with nonconvex loss functions over a time-varying digraph.
Each player selects an action, and an adversary then assigns an arbitrary nonconvex loss function to that player.
The expected regret of our algorithms is comparable to existing algorithms that use two-point deviation.
arXiv Detail & Related papers (2024-09-24T02:37:33Z)
- Data Dependent Regret Guarantees Against General Comparators for Full or Bandit Feedback [0.0]
We study the adversarial online learning problem and create a completely online algorithmic framework that has data dependent regret guarantees.
Our algorithm works from a universal prediction perspective and the performance measure used is the expected regret against arbitrary comparator sequences.
arXiv Detail & Related papers (2023-03-12T00:18:46Z)
- Variance-Dependent Regret Bounds for Linear Bandits and Reinforcement Learning: Adaptivity and Computational Efficiency [90.40062452292091]
We present the first computationally efficient algorithm for linear bandits with heteroscedastic noise.
Our algorithm is adaptive to the unknown variance of noise and achieves an $\tilde{O}(d\sqrt{\sum_{k=1}^{K}\sigma_k^2} + d)$ regret.
We also propose a variance-adaptive algorithm for linear mixture Markov decision processes (MDPs) in reinforcement learning.
arXiv Detail & Related papers (2023-02-21T00:17:24Z)
- Regret Minimization and Convergence to Equilibria in General-sum Markov Games [57.568118148036376]
We present the first algorithm for learning in general-sum Markov games that provides sublinear regret guarantees when executed by all agents.
Our algorithm is decentralized, computationally efficient, and does not require any communication between agents.
arXiv Detail & Related papers (2022-07-28T16:27:59Z)
- Second Order Regret Bounds Against Generalized Expert Sequences under Partial Bandit Feedback [0.0]
We study the problem of expert advice under partial bandit feedback setting and create a sequential minimax optimal algorithm.
Our algorithm works with a more general partial monitoring setting, where, in contrast to the classical bandit feedback, the losses can be revealed in an adversarial manner.
arXiv Detail & Related papers (2022-04-13T22:48:12Z)
- Contextual Model Aggregation for Fast and Robust Federated Learning in Edge Computing [88.76112371510999]
Federated learning is a prime candidate for distributed machine learning at the network edge.
Existing algorithms face issues with slow convergence and/or robustness of performance.
We propose a contextual aggregation scheme that achieves the optimal context-dependent bound on loss reduction.
arXiv Detail & Related papers (2022-03-23T21:42:31Z)
- Bayesian decision-making under misspecified priors with applications to meta-learning [64.38020203019013]
Thompson sampling and other sequential decision-making algorithms are popular approaches to tackle explore/exploit trade-offs in contextual bandits.
We show that performance degrades gracefully with misspecified priors.
arXiv Detail & Related papers (2021-07-03T23:17:26Z)
- Efficient Pure Exploration for Combinatorial Bandits with Semi-Bandit Feedback [51.21673420940346]
Combinatorial bandits generalize multi-armed bandits, where the agent chooses sets of arms and observes a noisy reward for each arm contained in the chosen set.
We focus on the pure-exploration problem of identifying the best arm with fixed confidence, as well as a more general setting, where the structure of the answer set differs from the one of the action set.
Based on a projection-free online learning algorithm for finite polytopes, our method is the first computationally efficient algorithm which is asymptotically optimal and has competitive empirical performance.
arXiv Detail & Related papers (2021-01-21T10:35:09Z) - Adaptive Algorithms for Multi-armed Bandit with Composite and Anonymous
Feedback [32.62857394584907]
We study the multi-armed bandit (MAB) problem with composite and anonymous feedback.
We propose adaptive algorithms for both the adversarial and non-adversarial cases.
arXiv Detail & Related papers (2020-12-13T12:25:41Z) - A Generalized Online Algorithm for Translation and Scale Invariant
Prediction with Expert Advice [0.0]
We study the expected regret of our algorithm against a generic competition class in the sequential prediction by expert advice problem.
Our regret bounds are stable under arbitrary scalings and translations of the losses.
arXiv Detail & Related papers (2020-09-09T15:45:28Z) - Beyond UCB: Optimal and Efficient Contextual Bandits with Regression
Oracles [112.89548995091182]
We provide the first universal and optimal reduction from contextual bandits to online regression.
Our algorithm requires no distributional assumptions beyond realizability, and works even when contexts are chosen adversarially.
arXiv Detail & Related papers (2020-02-12T11:33:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.