The Countable-armed Bandit with Vanishing Arms
- URL: http://arxiv.org/abs/2110.12118v1
- Date: Sat, 23 Oct 2021 02:47:55 GMT
- Title: The Countable-armed Bandit with Vanishing Arms
- Authors: Anand Kalvit and Assaf Zeevi
- Abstract summary: We consider a bandit problem with countably many arms partitioned into finitely many "types"
A "non-stationary" distribution governs the relative abundance of each arm-type in the population of arms, aka the "arm-reservoir"
- Score: 8.099977107670918
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider a bandit problem with countably many arms, partitioned into
finitely many "types," each characterized by a unique mean reward. A
"non-stationary" distribution governs the relative abundance of each arm-type
in the population of arms, aka the "arm-reservoir." This non-stationarity is
attributable to a probabilistic leakage of "optimal" arms from the reservoir
over time, which we refer to as the "vanishing arms" phenomenon; this induces a
time-varying (potentially "endogenous," policy-dependent) distribution over the
reservoir. The objective is minimization of the expected cumulative regret. We
characterize necessary and sufficient conditions for achievability of
sub-linear regret in terms of a critical vanishing rate of optimal arms. We
also discuss two reservoir distribution-oblivious algorithms that are
long-run-average optimal whenever sub-linear regret is statistically
achievable. Numerical experiments highlight a distinctive characteristic of
this problem related to ex ante knowledge of the "gap" parameter (the
difference between the top two mean rewards): in contrast to the stationary
bandit formulation, regret in our setting may suffer substantial inflation
under adaptive exploration-based (gap-oblivious) algorithms such as UCB
vis-à-vis their non-adaptive forced exploration-based (gap-aware) counterparts
like ETC.
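
The contrast noted in the abstract between gap-oblivious adaptive exploration (UCB) and gap-aware forced exploration (ETC) under a leaking reservoir can be illustrated with a small simulation. The sketch below is not the authors' experimental setup: it assumes two Bernoulli arm types with means 0.7 and 0.5, a reservoir whose probability of returning an optimal-type arm decays geometrically in time (one possible vanishing-rate model), and textbook UCB/ETC subroutines; all parameter names, the decay schedule, and the re-query rule are illustrative assumptions.

```python
# Toy simulation of a countable-armed bandit with "vanishing" optimal arms.
# Assumptions (illustrative, not from the paper): two Bernoulli arm types,
# a reservoir whose chance of returning an optimal-type arm decays as
# p0 * rho**t, and textbook UCB / explore-then-commit (ETC) subroutines.
import numpy as np

MU = {"optimal": 0.7, "suboptimal": 0.5}   # mean reward of each arm type
GAP = MU["optimal"] - MU["suboptimal"]     # "gap" parameter (used by ETC only)

def query_reservoir(t, rng, p0=0.5, rho=0.999):
    """Draw a fresh arm; optimal arms become rarer as t grows (toy vanishing model)."""
    return "optimal" if rng.random() < p0 * rho ** t else "suboptimal"

def pull(arm_type, rng):
    """One Bernoulli reward from an arm of the given type."""
    return float(rng.random() < MU[arm_type])

def run_ucb(horizon, seed=0):
    """Gap-oblivious UCB1 on two arms queried from the reservoir at t = 0.
    (A reservoir-aware variant would also need a rule for discarding arms;
    calibrating such a rule without knowing the gap is the hard part.)"""
    rng = np.random.default_rng(seed)
    arms = [query_reservoir(0, rng), query_reservoir(0, rng)]
    counts, sums, regret = np.zeros(2), np.zeros(2), 0.0
    for t in range(horizon):
        if t < 2:
            i = t                                            # pull each arm once
        else:
            i = int(np.argmax(sums / counts + np.sqrt(2 * np.log(t + 1) / counts)))
        sums[i] += pull(arms[i], rng)
        counts[i] += 1
        regret += MU["optimal"] - MU[arms[i]]
    return regret

def run_etc(horizon, seed=0):
    """Gap-aware forced exploration: sample a fresh arm ~ log(horizon)/GAP^2 times,
    commit to it if its empirical mean clears the midpoint of the two type means,
    otherwise discard it and query the reservoir again."""
    rng = np.random.default_rng(seed)
    m = int(np.ceil(4 * np.log(horizon) / GAP ** 2))
    regret, t = 0.0, 0
    while t < horizon:
        arm = query_reservoir(t, rng)
        n = min(m, horizon - t)
        mean = np.mean([pull(arm, rng) for _ in range(n)])
        regret += n * (MU["optimal"] - MU[arm])
        t += n
        if mean >= MU["suboptimal"] + GAP / 2:               # looks optimal: commit
            regret += (horizon - t) * (MU["optimal"] - MU[arm])
            break
    return regret

if __name__ == "__main__":
    for policy in (run_ucb, run_etc):
        avg = np.mean([policy(horizon=20_000, seed=s) for s in range(20)])
        print(f"{policy.__name__}: mean cumulative regret ~ {avg:.1f}")
```

In this toy model the gap-aware ETC policy can discard an arm and re-query the reservoir as soon as the arm looks suboptimal at resolution GAP/2, whereas the gap-oblivious UCB policy has no comparable discarding rule; this is only meant to illustrate, not reproduce, the regret comparison discussed in the abstract.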
Related papers
- Best Arm Identification with Minimal Regret [55.831935724659175]
The problem of best arm identification with minimal regret elegantly amalgamates regret minimization and best arm identification (BAI).
The agent's goal is to identify the best arm with a prescribed confidence level.
The Double KL-UCB algorithm achieves optimality as the confidence level tends to zero.
arXiv Detail & Related papers (2024-09-27T16:46:02Z) - Bridging Rested and Restless Bandits with Graph-Triggering: Rising and Rotting [67.1631453378926]
Graph-Triggered Bandits is a framework to generalize rested and restless bandits.
In this work, we focus on two specific types of monotonic bandits: rising, where the expected reward of an arm grows as the number of triggers increases, and rotting, where the opposite behavior occurs.
arXiv Detail & Related papers (2024-09-09T18:23:07Z) - Optimal Best Arm Identification with Fixed Confidence in Restless Bandits [66.700654953613]
We study best arm identification in a restless multi-armed bandit setting with finitely many arms.
The discrete-time data generated by each arm forms a homogeneous Markov chain taking values in a common, finite state space.
It is demonstrated that tracking the long-term behavior of a certain Markov decision process and its state-action visitation proportions are the key ingredients in analyzing the converse and achievability bounds.
arXiv Detail & Related papers (2023-10-20T10:04:05Z) - Constrained Pure Exploration Multi-Armed Bandits with a Fixed Budget [4.226118870861363]
We consider a constrained, pure exploration, multi-armed bandit formulation under a fixed budget.
We propose an algorithm called Constrained-SR based on the Successive Rejects framework.
We show that the associated decay rate is nearly optimal relative to an information theoretic lower bound in certain special cases.
arXiv Detail & Related papers (2022-11-27T08:58:16Z) - Beyond the Best: Estimating Distribution Functionals in Infinite-Armed
Bandits [40.71199236098642]
In the infinite-armed bandit problem, each arm's average reward is sampled from an unknown distribution.
We consider a general class of distribution functionals beyond the maximum, and propose unified meta algorithms for both the offline and online settings.
arXiv Detail & Related papers (2022-11-01T18:20:10Z) - Reproducible Bandits [95.8830340560603]
A policy in the bandit environment is called reproducible if it pulls, with high probability, the exact same sequence of arms in two different executions.
We show not only that reproducible policies exist, but also that they achieve almost the same optimal (non-reproducible) regret bounds in terms of the time horizon.
Our results show that even though randomization is crucial for the exploration-exploitation trade-off, an optimal balance can still be achieved while pulling the exact same arms in two different rounds of executions.
arXiv Detail & Related papers (2022-10-04T20:36:45Z) - A PDE-Based Analysis of the Symmetric Two-Armed Bernoulli Bandit [1.2183405753834562]
This work addresses a version of the two-armed Bernoulli bandit problem where the sum of the means of the arms is one.
We obtain the leading order terms of the minmax optimal regret and pseudoregret for this problem by associating each of them with a solution of a linear heat equation.
arXiv Detail & Related papers (2022-02-11T17:03:18Z) - From Finite to Countable-Armed Bandits [8.099977107670918]
We consider a bandit problem with countably many arms that belong to a finite set of types.
There is a fixed distribution over types which sets the proportion of each type in the population of arms.
We propose a fully adaptive online learning algorithm that achieves O(log n) distribution-dependent expected cumulative regret after any number of plays n.
arXiv Detail & Related papers (2021-05-22T13:09:50Z) - Combinatorial Blocking Bandits with Stochastic Delays [33.65025386998747]
Recent work has considered natural variations of the multi-armed bandit problem, where the reward of each arm is a special function of the time passed since its last pulling.
In this work, we extend the above model in two directions: (i) We consider the general setting where more than one arm can be played at each round, subject to feasibility constraints.
We provide a tight analysis of the approximation of a natural greedy subset that always plays the maximum expected reward feasible among the available (non-blocked) arms.
When the arms' expected rewards are unknown, we adapt the above algorithm into a bandit, based on
arXiv Detail & Related papers (2021-05-22T02:46:04Z) - Adaptive Discretization for Adversarial Lipschitz Bandits [85.39106976861702]
Lipschitz bandits is a prominent version of multi-armed bandits that studies large, structured action spaces.
A central theme here is the adaptive discretization of the action space, which gradually "zooms in" on the more promising regions.
We provide the first algorithm for adaptive discretization in the adversarial version, and derive instance-dependent regret bounds.
arXiv Detail & Related papers (2020-06-22T16:06:25Z) - Robustness Guarantees for Mode Estimation with an Application to Bandits [131.21717367564963]
We introduce a theory for multi-armed bandits where the values are the modes of the reward distributions instead of the means.
We show in simulations that our algorithms are robust to perturbation of the arms by adversarial noise sequences.
arXiv Detail & Related papers (2020-03-05T21:29:27Z)