A Definition of Non-Stationary Bandits
- URL: http://arxiv.org/abs/2302.12202v2
- Date: Fri, 28 Jul 2023 07:50:22 GMT
- Title: A Definition of Non-Stationary Bandits
- Authors: Yueyang Liu, Xu Kuang, Benjamin Van Roy
- Abstract summary: We identify a formal definition of non-stationarity that can consistently distinguish non-stationary bandits from stationary ones.
We show that this definition can ambiguously classify the same bandit as both stationary and non-stationary.
We introduce a formal definition of non-stationary bandits that resolves these issues.
- Score: 12.643821787548154
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the subject of non-stationary bandit learning having attracted much
recent attention, we have yet to identify a formal definition of
non-stationarity that can consistently distinguish non-stationary bandits from
stationary ones. Prior work has characterized non-stationary bandits as bandits
for which the reward distribution changes over time. We demonstrate that this
definition can ambiguously classify the same bandit as both stationary and
non-stationary; this ambiguity arises from the existing definition's dependence
on the latent sequence of reward distributions. Moreover, the definition has
given rise to two widely used notions of regret: the dynamic regret and the
weak regret. These notions are not indicative of qualitative agent performance
in some bandits. Additionally, this definition of non-stationary bandits has
led to the design of agents that explore excessively. We introduce a formal
definition of non-stationary bandits that resolves these issues. Our new
definition provides a unified approach, applicable seamlessly to both Bayesian
and frequentist formulations of bandits. Furthermore, our definition ensures
consistent classification of two bandits offering agents indistinguishable
experiences, categorizing them as either both stationary or both
non-stationary. This advancement provides a more robust framework for
non-stationary bandit learning.
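For reference, the two regret notions the abstract criticizes are usually formalized as follows (standard formulations from the non-stationary bandit literature, in our own notation, where $\mu_t(a)$ is the mean reward of arm $a$ at time $t$ and $A_t$ is the arm pulled):

```latex
% Dynamic regret: benchmark against the best arm at every round.
\mathrm{Regret}^{\mathrm{dyn}}_T \;=\; \sum_{t=1}^{T} \Bigl( \max_{a \in \mathcal{A}} \mu_t(a) \;-\; \mu_t(A_t) \Bigr)

% Weak regret: benchmark against the single best fixed arm in hindsight.
\mathrm{Regret}^{\mathrm{weak}}_T \;=\; \max_{a \in \mathcal{A}} \sum_{t=1}^{T} \mu_t(a) \;-\; \sum_{t=1}^{T} \mu_t(A_t)
```

The dynamic benchmark tracks every change in the reward distributions, while the weak benchmark ignores all of them; the abstract's point is that neither is indicative of qualitative agent performance in some bandits.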
Related papers
- Bridging Rested and Restless Bandits with Graph-Triggering: Rising and Rotting [67.1631453378926]
Graph-Triggered Bandits is a framework to generalize rested and restless bandits.
In this work, we focus on two specific types of monotonic bandits: rising, where the expected reward of an arm grows as the number of triggers increases, and rotting, where the opposite behavior occurs.
arXiv Detail & Related papers (2024-09-09T18:23:07Z)
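As a toy illustration of the rising and rotting behaviors in the entry above (our own sketch, not code from the paper; the saturating and decaying curves and all parameters are hypothetical):

```python
import numpy as np

def rising_mean(n_triggers, cap=0.9, rate=0.1):
    """Hypothetical rising arm: expected reward grows with the trigger count."""
    return cap * (1.0 - np.exp(-rate * n_triggers))

def rotting_mean(n_triggers, start=0.9, rate=0.1):
    """Hypothetical rotting arm: expected reward decays with the trigger count."""
    return start * np.exp(-rate * n_triggers)

# The two monotone behaviors move in opposite directions as triggers accumulate.
for n in (1, 10, 50):
    print(f"after {n:2d} triggers: rising={rising_mean(n):.3f}, rotting={rotting_mean(n):.3f}")
```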
- Causal Bandits: The Pareto Optimal Frontier of Adaptivity, a Reduction to Linear Bandits, and Limitations around Unknown Marginals [28.94461817548213]
We prove upper and matching lower bounds on the possible trade-offs in the performance of learning in conditionally benign and arbitrary environments.
We are the first to obtain instance-dependent bounds for causal bandits, by reducing the problem to the linear bandit setting.
arXiv Detail & Related papers (2024-07-01T04:12:15Z)
- Imprecise Multi-Armed Bandits [0.0]
We introduce a novel multi-armed bandit framework, where each arm is associated with a fixed unknown credal set over the space of outcomes.
We then define a notion of regret corresponding to the lower prevision defined by these credal sets.
arXiv Detail & Related papers (2024-05-09T10:58:40Z)
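A minimal sketch of the setting in the entry above, under our own simplifying assumption that each credal set is an interval of plausible mean rewards; the lower prevision of an arm is then the lower endpoint, and regret is measured against the best lower prevision:

```python
# Hypothetical arms with interval-valued (credal) mean rewards: (lower, upper).
arms = {"a": (0.3, 0.7), "b": (0.5, 0.6), "c": (0.1, 0.9)}

def lower_prevision(interval):
    # Worst-case expected reward consistent with the credal set.
    return interval[0]

best = max(arms, key=lambda name: lower_prevision(arms[name]))
print("best arm by lower prevision:", best)  # "b"
for name, interval in arms.items():
    gap = lower_prevision(arms[best]) - lower_prevision(interval)
    print(f"arm {name}: lower-prevision regret per pull = {gap:.2f}")
```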
- Reproducible Bandits [95.8830340560603]
A policy in the bandit environment is called reproducible if it pulls, with high probability, the exact same sequence of arms in two different executions.
We show that reproducible policies not only exist but also achieve almost the same optimal (non-reproducible) regret bounds in terms of the time horizon.
Our results show that even though randomization is crucial for the exploration-exploitation trade-off, an optimal balance can still be achieved while pulling the exact same arms in two different executions.
arXiv Detail & Related papers (2022-10-04T20:36:45Z)
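To illustrate only the definition in the entry above (not the paper's algorithm), here is a deliberately naive policy that is exactly reproducible because its arm choices depend on an internal seed and never on observed rewards; the paper's contribution is showing that near-optimal policies can also be made reproducible:

```python
import numpy as np

def reproducible_arm_sequence(T=10, n_arms=3, policy_seed=42):
    # All randomness comes from a fixed internal seed, so two executions
    # pull the exact same sequence of arms regardless of the rewards seen.
    rng = np.random.default_rng(policy_seed)
    return [int(rng.integers(n_arms)) for _ in range(T)]

run1 = reproducible_arm_sequence()  # first execution
run2 = reproducible_arm_sequence()  # second, independent execution
print(run1 == run2)  # True: identical arm sequences
```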
- Worst-case Performance of Greedy Policies in Bandits with Imperfect Context Observations [1.370633147306388]
This work considers Greedy reinforcement learning policies that take actions as if the current estimates of the parameter and of the unobserved contexts coincide with the corresponding true values.
We establish that the non-asymptotic worst-case regret grows logarithmically with the time horizon and the failure probability, while it scales linearly with the number of arms.
arXiv Detail & Related papers (2022-04-10T21:27:56Z)
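A sketch of the greedy rule in the entry above, in a linear-reward flavor of our own choosing: the agent acts as if its parameter estimates and the noisily observed context were the true values. All names and the noise model are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical estimates theta_hat[a] of each arm's reward parameters.
theta_hat = np.array([[0.2, 0.8],
                      [0.6, 0.1],
                      [0.4, 0.4]])

def greedy_action(observed_context, theta_estimates):
    # Plug-in rule: treat the estimates and the observed (imperfect)
    # context as if they coincided with the true values.
    return int(np.argmax(theta_estimates @ observed_context))

x_true = np.array([1.0, 0.5])
x_observed = x_true + rng.normal(0.0, 0.3, size=2)  # imperfect observation
print("greedy arm:", greedy_action(x_observed, theta_hat))
```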
- Robust Allocations with Diversity Constraints [65.3799850959513]
We show that the Nash Welfare rule that maximizes the product of agent values is uniquely positioned to be robust when diversity constraints are introduced.
We also show that the guarantees achieved by Nash Welfare are nearly optimal within a widely studied class of allocation rules.
arXiv Detail & Related papers (2021-09-30T11:09:31Z)
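A brute-force sketch (ours) of the Nash Welfare rule referenced above: among all allocations of items to agents, pick the one maximizing the product of agent values. A diversity constraint would enter as a feasibility filter on the candidate allocations; the values below are hypothetical:

```python
from itertools import product
from math import prod

# values[i][g]: agent i's value for item g (hypothetical numbers).
values = [[3, 1, 2],
          [1, 4, 1]]
n_agents, n_items = len(values), len(values[0])

def agent_value(alloc, i):
    return sum(values[i][g] for g in range(n_items) if alloc[g] == i)

# alloc[g] = index of the agent receiving item g.
best = max(product(range(n_agents), repeat=n_items),
           key=lambda alloc: prod(agent_value(alloc, i) for i in range(n_agents)))
print("Nash Welfare allocation:", best)
```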
- Learning from History for Byzantine Robust Optimization [52.68913869776858]
Byzantine robustness has received significant attention recently given its importance for distributed learning.
We show that most existing robust aggregation rules may not converge even in the absence of any Byzantine attackers.
arXiv Detail & Related papers (2020-12-18T16:22:32Z)
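For intuition about the entry above, a classic robust-aggregation baseline, the coordinate-wise median, shown here instead of the paper's history-based rule; all numbers are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Nine honest workers report gradients near (1, 1); one Byzantine worker
# reports an arbitrary vector to derail the average.
honest = rng.normal(loc=1.0, scale=0.1, size=(9, 2))
byzantine = np.array([[1000.0, -1000.0]])
reports = np.vstack([honest, byzantine])

print("mean:  ", reports.mean(axis=0))        # dragged far from (1, 1)
print("median:", np.median(reports, axis=0))  # stays near (1, 1)
```

Note that the paper's claim is stronger: many such rules can fail to converge even without attackers, which motivates its history-based aggregation.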
- Unifying Clustered and Non-stationary Bandits [50.12992652938055]
Non-stationary bandits and online clustering of bandits lift the restrictive assumptions in contextual bandits.
We propose a test of homogeneity that seamlessly addresses change detection for non-stationary bandits and cluster identification for online clustering of bandits.
Rigorous regret analysis and extensive empirical evaluations demonstrate the value of our proposed solution.
arXiv Detail & Related papers (2020-09-05T04:58:06Z)
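A sketch of the flavor of test in the entry above, using a plain two-sample z-test on reward samples as a stand-in for the paper's statistic; rejecting homogeneity signals a change point (non-stationarity) or a cluster split. The threshold and data are hypothetical:

```python
import numpy as np

def homogeneous(sample_a, sample_b, z_threshold=2.58):
    # Two-sample z-test: are the two reward samples plausibly drawn from
    # distributions with the same mean?
    se = np.sqrt(sample_a.var(ddof=1) / len(sample_a)
                 + sample_b.var(ddof=1) / len(sample_b))
    z = abs(sample_a.mean() - sample_b.mean()) / se
    return z < z_threshold

rng = np.random.default_rng(0)
before = rng.normal(0.5, 0.1, size=200)
print(homogeneous(before, rng.normal(0.5, 0.1, size=200)))  # True: homogeneous
print(homogeneous(before, rng.normal(0.8, 0.1, size=200)))  # False: change detected
```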
- Robustness Guarantees for Mode Estimation with an Application to Bandits [131.21717367564963]
We introduce a theory for multi-armed bandits where the values are the modes of the reward distributions instead of the means.
We show in simulations that our algorithms are robust to perturbation of the arms by adversarial noise sequences.
arXiv Detail & Related papers (2020-03-05T21:29:27Z)
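Finally, a toy sketch (ours) of the mode-based objective in the last entry: with discrete rewards, ranking arms by empirical mode can disagree with ranking by empirical mean, which is the sense in which the mode defines a different bandit problem:

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)

def pull(arm):
    # Hypothetical discrete-reward arms: arm 0 has the higher mean,
    # arm 1 has the higher mode.
    if arm == 0:
        return rng.choice([0, 10], p=[0.6, 0.4])  # mean 4.0, mode 0
    return rng.choice([1, 2], p=[0.3, 0.7])       # mean 1.7, mode 2

samples = {a: [pull(a) for _ in range(500)] for a in (0, 1)}
modes = {a: Counter(s).most_common(1)[0][0] for a, s in samples.items()}
means = {a: float(np.mean(s)) for a, s in samples.items()}
print("empirical modes:", modes)  # arm 1 wins
print("empirical means:", means)  # arm 0 wins
```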
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences of its use.