Last Switch Dependent Bandits with Monotone Payoff Functions
- URL: http://arxiv.org/abs/2306.00338v1
- Date: Thu, 1 Jun 2023 04:38:32 GMT
- Title: Last Switch Dependent Bandits with Monotone Payoff Functions
- Authors: Ayoub Foussoul, Vineet Goyal, Orestis Papadigenopoulos, Assaf Zeevi
- Abstract summary: We take a step towards understanding the approximability of planning LSD bandits, namely, the (NP-hard) problem of computing an optimal arm-pulling strategy.
In particular, we design the first efficient constant approximation algorithm for the problem and show that, under a natural monotonicity assumption on payoffs, its approximation guarantee almost matches the state-of-the-art.
We develop new tools and insights for this class of problems, including a novel higher-dimensional relaxation and the technique of mirroring the evolution of virtual states.
- Score: 8.860629791560198
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In a recent work, Laforgue et al. introduce the model of last switch
dependent (LSD) bandits, in an attempt to capture nonstationary phenomena
induced by the interaction between the player and the environment. Examples
include satiation, where consecutive plays of the same action lead to decreased
performance, or deprivation, where the payoff of an action increases after an
interval of inactivity. In this work, we take a step towards understanding the
approximability of planning LSD bandits, namely, the (NP-hard) problem of
computing an optimal arm-pulling strategy under complete knowledge of the
model. In particular, we design the first efficient constant approximation
algorithm for the problem and show that, under a natural monotonicity
assumption on the payoffs, its approximation guarantee (almost) matches the
state-of-the-art for the special and well-studied class of recharging bandits
(also known as delay-dependent). In this attempt, we develop new tools and
insights for this class of problems, including a novel higher-dimensional
relaxation and the technique of mirroring the evolution of virtual states. We
believe that these novel elements could potentially be used for approaching
richer classes of action-induced nonstationary bandits (e.g., special instances
of restless bandits). In the case where the model parameters are initially
unknown, we develop an online learning adaptation of our algorithm for which we
provide sublinear regret guarantees against its full-information counterpart.
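To make the LSD model concrete, here is a minimal Python sketch of an environment whose arms' expected payoffs depend monotonically on the time since each arm was last involved in a switch: consecutive plays model satiation (payoff non-increasing in the run length) and idle rounds model deprivation (payoff non-decreasing in the rest time). The class name `LSDBanditEnv`, the signed-counter state convention, and the concrete payoff shapes are illustrative assumptions, not the paper's formal definition or algorithm.

```python
import random

class LSDBanditEnv:
    """Illustrative last-switch-dependent (LSD) bandit environment.

    Each arm keeps a signed counter tau: positive while the arm keeps being
    played (consecutive plays -> satiation), negative while it rests
    (idle rounds -> deprivation).  The expected payoff of an arm is a
    monotone function of that counter.  This is a sketch, not the paper's
    formal model.
    """

    def __init__(self, payoff_fns, noise=0.05):
        self.payoff_fns = payoff_fns          # payoff_fns[i](tau) -> mean payoff
        self.tau = [0] * len(payoff_fns)      # signed time since last switch
        self.last_arm = None
        self.noise = noise

    def pull(self, arm):
        for i in range(len(self.tau)):
            if i == arm:
                # continue the run, or reset to 1 right after switching to this arm
                self.tau[i] = self.tau[i] + 1 if self.last_arm == arm else 1
            else:
                # continue resting, or reset to -1 right after switching away
                self.tau[i] = self.tau[i] - 1 if self.last_arm != i else -1
        self.last_arm = arm
        mean = self.payoff_fns[arm](self.tau[arm])
        return mean + random.gauss(0.0, self.noise)


# Hypothetical monotone payoff: decays with consecutive plays (satiation),
# recharges with rest (deprivation).
def example_payoff(tau):
    if tau > 0:
        return max(0.2, 0.8 - 0.1 * (tau - 1))
    return min(0.9, 0.5 + 0.05 * (-tau))


env = LSDBanditEnv([example_payoff, example_payoff])
for t in range(6):
    arm = t % 2                               # naive alternating policy
    print(f"round {t}: arm {arm}, reward {env.pull(arm):.3f}")
```

Planning in this setting means choosing the pull sequence, under full knowledge of the payoff functions, so as to maximize total expected reward; the naive alternating policy above is only there to exercise the environment.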
Related papers
- Don't Miss Out on Novelty: Importance of Novel Features for Deep Anomaly
Detection [64.21963650519312]
Anomaly Detection (AD) is a critical task that involves identifying observations that do not conform to a learned model of normality.
We propose a novel approach to AD that uses explainability to capture novel features as unexplained observations in the input space.
Our approach establishes a new state-of-the-art across multiple benchmarks, handling diverse anomaly types.
arXiv Detail & Related papers (2023-10-01T21:24:05Z)
- Towards Robust Continual Learning with Bayesian Adaptive Moment Regularization [51.34904967046097]
Continual learning seeks to overcome the challenge of catastrophic forgetting, where a model forgets previously learnt information.
We introduce a novel prior-based method that better constrains parameter growth, reducing catastrophic forgetting.
Results show that BAdam achieves state-of-the-art performance for prior-based methods on challenging single-headed class-incremental experiments.
arXiv Detail & Related papers (2023-09-15T17:10:51Z)
- Meta-Learning Adversarial Bandit Algorithms [55.72892209124227]
We study online meta-learning with bandit feedback.
We learn to tune a generalization of online mirror descent (OMD) with self-concordant barrier regularizers.
arXiv Detail & Related papers (2023-07-05T13:52:10Z)
- Stochastic Rising Bandits [40.32303434592863]
We study a particular case of the rested and restless bandits in which the arms' expected payoff is monotonically non-decreasing.
This characteristic allows designing specifically crafted algorithms that exploit the regularity of the payoffs to provide tight regret bounds.
We empirically compare our algorithms with state-of-the-art methods for non-stationary MABs over several synthetically generated tasks and an online model selection problem for a real-world dataset.
arXiv Detail & Related papers (2022-12-07T17:30:45Z)
- Non-Stationary Bandits under Recharging Payoffs: Improved Planning with Sublinear Regret [34.44347218903429]
The multi-armed bandit setting has been recently studied in the non-stationary regime.
The mean payoff of each action is a non-decreasing function of the number of rounds passed since it was last played.
We show how our algorithm can be transformed into a bandit algorithm with sublinear regret.
arXiv Detail & Related papers (2022-05-29T23:55:36Z)
- FOSTER: Feature Boosting and Compression for Class-Incremental Learning [52.603520403933985]
Deep neural networks suffer from catastrophic forgetting when learning new categories.
We propose a novel two-stage learning paradigm FOSTER, empowering the model to learn new categories adaptively.
arXiv Detail & Related papers (2022-04-10T11:38:33Z)
- Break your Bandit Routine with LSD Rewards: a Last Switch Dependent Analysis of Satiation and Seasonality [6.146046338698175]
We introduce a novel non-stationary bandit problem, where the expected reward of an arm is fully determined by the time elapsed since the arm last took part in a switch of actions.
Our model generalizes previous notions of delay-dependent rewards, and also relaxes most assumptions on the reward function.
We propose an algorithm and prove a bound on its regret with respect to the optimal non-stationary policy.
arXiv Detail & Related papers (2021-10-22T14:53:13Z)
- Minimum-Delay Adaptation in Non-Stationary Reinforcement Learning via Online High-Confidence Change-Point Detection [7.685002911021767]
We introduce an algorithm that efficiently learns policies in non-stationary environments.
It analyzes a possibly infinite stream of data and computes, in real-time, high-confidence change-point detection statistics.
We show that this algorithm minimizes the delay until unforeseen changes to a context are detected, thereby allowing for rapid responses.
arXiv Detail & Related papers (2021-05-20T01:57:52Z)
- Non-Stationary Latent Bandits [68.21614490603758]
We propose a practical approach for fast personalization to non-stationary users.
The key idea is to frame this problem as a latent bandit, where prototypical models of user behavior are learned offline and the latent state of the user is inferred online.
We propose Thompson sampling algorithms for regret minimization in non-stationary latent bandits, analyze them, and evaluate them on a real-world dataset.
arXiv Detail & Related papers (2020-12-01T10:31:57Z)
- Latent Bandits Revisited [55.88616813182679]
A latent bandit problem is one in which the learning agent knows the arm reward distributions conditioned on an unknown discrete latent state.
We propose general algorithms for this setting, based on both upper confidence bounds (UCBs) and Thompson sampling.
We provide a unified theoretical analysis of our algorithms, which have lower regret than classic bandit policies when the number of latent states is smaller than the number of actions.
arXiv Detail & Related papers (2020-06-15T19:24:02Z)