Nested bandits
- URL: http://arxiv.org/abs/2206.09348v1
- Date: Sun, 19 Jun 2022 08:08:38 GMT
- Title: Nested bandits
- Authors: Matthieu Martin and Panayotis Mertikopoulos and Thibaud Rahier and
Houssam Zenati
- Abstract summary: In many online decision processes, the optimizing agent is called to choose between large numbers of alternatives with many inherent similarities.
We propose a nested exponential weights (NEW) algorithm that performs a layered exploration of the learner's set of alternatives.
In so doing, we obtain a series of tight bounds for the learner's regret showing that online learning problems with a high degree of similarity between alternatives can be resolved efficiently.
- Score: 34.793913964978486
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In many online decision processes, the optimizing agent is called to choose
between large numbers of alternatives with many inherent similarities; in turn,
these similarities imply closely correlated losses that may confound standard
discrete choice models and bandit algorithms. We study this question in the
context of nested bandits, a class of adversarial multi-armed bandit problems
where the learner seeks to minimize their regret in the presence of a large
number of distinct alternatives with a hierarchy of embedded
(non-combinatorial) similarities. In this setting, optimal algorithms based on
the exponential weights blueprint (like Hedge, EXP3, and their variants) may
incur significant regret because they tend to spend excessive amounts of time
exploring irrelevant alternatives with similar, suboptimal costs. To account
for this, we propose a nested exponential weights (NEW) algorithm that performs
a layered exploration of the learner's set of alternatives based on a nested,
step-by-step selection method. In so doing, we obtain a series of tight bounds
for the learner's regret showing that online learning problems with a high
degree of similarity between alternatives can be resolved efficiently, without
a red bus / blue bus paradox occurring.
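For intuition only, the sketch below illustrates the kind of nested, step-by-step selection described in the abstract: arms are grouped by similarity, an exponential-weights draw first picks a group and then an arm inside it, and an importance-weighted loss estimate updates both levels. The class name, two-level grouping, learning rate, and update rule are illustrative assumptions, not the paper's exact NEW algorithm.

```python
# Minimal, illustrative sketch of layered exponential-weights selection over a
# two-level hierarchy of similar alternatives. This is NOT the authors' exact
# NEW algorithm; the grouping, step size, and loss-estimation scheme below are
# simplifying assumptions for exposition only.
import math
import random

class TwoLevelExpWeights:
    def __init__(self, groups, eta=0.1):
        # groups: dict mapping a group label to the list of arm labels it contains
        self.eta = eta
        self.group_w = {g: 0.0 for g in groups}                       # log-weights per group
        self.arm_w = {g: {a: 0.0 for a in arms} for g, arms in groups.items()}

    @staticmethod
    def _sample(logw):
        # sample a key proportionally to exp(log-weight), with normalization
        m = max(logw.values())
        probs = {k: math.exp(v - m) for k, v in logw.items()}
        z = sum(probs.values())
        r, acc = random.random() * z, 0.0
        for k, p in probs.items():
            acc += p
            if acc >= r:
                return k, p / z
        return k, p / z

    def select(self):
        # step-by-step selection: first pick a group, then an arm within it
        g, pg = self._sample(self.group_w)
        a, pa = self._sample(self.arm_w[g])
        return g, a, pg * pa                                          # joint selection probability

    def update(self, g, a, prob, loss):
        # importance-weighted loss estimate, charged to the chosen arm and its group
        est = loss / max(prob, 1e-12)
        self.arm_w[g][a] -= self.eta * est
        self.group_w[g] -= self.eta * est

# Example round under these assumptions: select, observe a loss in [0, 1], update.
algo = TwoLevelExpWeights({"bus": ["red bus", "blue bus"], "train": ["train"]})
g, a, p = algo.select()
algo.update(g, a, p, loss=0.7)
```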
Related papers
- Variance-Aware Regret Bounds for Stochastic Contextual Dueling Bandits [53.281230333364505]
This paper studies the problem of contextual dueling bandits, where the binary comparison of dueling arms is generated from a generalized linear model (GLM).
We propose a new SupLinUCB-type algorithm that enjoys computational efficiency and a variance-aware regret bound $\tilde{O}\big(d\sqrt{\sum_{t=1}^{T}\sigma_t^2} + d\big)$.
Our regret bound naturally aligns with the intuitive expectation: in scenarios where the comparison is deterministic, the algorithm only suffers an $\tilde{O}(d)$ regret.
arXiv Detail & Related papers (2023-10-02T08:15:52Z)
- Discrete Choice Multi-Armed Bandits [0.0]
This paper establishes a connection between a category of discrete choice models and the realms of online learning and multiarmed bandit algorithms.
We furnish sublinear regret bounds for a comprehensive family of algorithms, encompassing the Exp3 algorithm as a particular case.
We introduce a novel family of adversarial multiarmed bandit algorithms, drawing inspiration from the generalized nested logit models.
arXiv Detail & Related papers (2023-10-01T03:41:04Z)
- An Optimal Algorithm for the Real-Valued Combinatorial Pure Exploration of Multi-Armed Bandit [65.268245109828]
We study the real-valued combinatorial pure exploration of the multi-armed bandit (R-CPE-MAB) problem.
Existing methods in the R-CPE-MAB can be seen as a special case of the so-called transductive linear bandits.
We propose an algorithm named the gap-based exploration (CombGapE) algorithm, whose sample complexity matches the lower bound.
arXiv Detail & Related papers (2023-06-15T15:37:31Z)
- Variance-Dependent Regret Bounds for Linear Bandits and Reinforcement Learning: Adaptivity and Computational Efficiency [90.40062452292091]
We present the first computationally efficient algorithm for linear bandits with heteroscedastic noise.
Our algorithm is adaptive to the unknown variance of noise and achieves an $\tilde{O}(d\sqrt{\sum_{k=1}^{K}\sigma_k^2} + d)$ regret.
We also propose a variance-adaptive algorithm for linear mixture Markov decision processes (MDPs) in reinforcement learning.
arXiv Detail & Related papers (2023-02-21T00:17:24Z)
- Stochastic Rising Bandits [40.32303434592863]
We study a particular case of the rested and restless bandits in which the arms' expected payoff is monotonically non-decreasing.
This characteristic allows designing specifically crafted algorithms that exploit the regularity of the payoffs to provide tight regret bounds.
We empirically compare our algorithms with state-of-the-art methods for non-stationary MABs over several synthetically generated tasks and an online model selection problem for a real-world dataset.
arXiv Detail & Related papers (2022-12-07T17:30:45Z)
- Upper Confidence Bounds for Combining Stochastic Bandits [52.10197476419621]
We provide a simple method to combine bandit algorithms.
Our approach is based on a "meta-UCB" procedure that treats each of $N$ individual bandit algorithms as arms in a higher-level $N$-armed bandit problem (an illustrative sketch of this idea appears after this list).
arXiv Detail & Related papers (2020-12-24T05:36:29Z)
- Regret Bound Balancing and Elimination for Model Selection in Bandits and RL [34.15978290083909]
We propose a simple model selection approach for algorithms in bandit and reinforcement learning problems.
We prove that the total regret of this approach is bounded by the best valid candidate regret bound times a multiplicative factor.
Unlike recent efforts in model selection for linear bandits, our approach is versatile enough to also cover cases where the context information is generated by an adversarial environment.
arXiv Detail & Related papers (2020-12-24T00:53:42Z)
- Model Selection in Contextual Stochastic Bandit Problems [51.94632035240787]
We develop a meta-algorithm that selects between base algorithms.
We show through a lower bound that even when one of the base algorithms has $O(\sqrt{T})$ regret, in general it is impossible to get better than $\Omega(\sqrt{T})$ regret.
arXiv Detail & Related papers (2020-03-03T18:46:34Z)
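As referenced in the entry on "Upper Confidence Bounds for Combining Stochastic Bandits" above, a higher-level UCB rule can choose among base bandit algorithms. The sketch below is a hedged illustration of that idea only; the act()/feed() interface, the confidence bonus, and rewards in [0, 1] are assumptions, not the paper's exact meta-UCB procedure.

```python
# Hedged sketch of a UCB wrapper over N base bandit algorithms, each treated as
# a meta-arm. Interface and bonus constant are assumptions for exposition.
import math

class MetaUCB:
    def __init__(self, base_algorithms):
        self.base = base_algorithms            # each base exposes act() -> arm and feed(arm, reward)
        self.counts = [0] * len(base_algorithms)
        self.means = [0.0] * len(base_algorithms)
        self.t = 0

    def select_base(self):
        # play every base algorithm once, then pick the highest UCB index
        self.t += 1
        for i, n in enumerate(self.counts):
            if n == 0:
                return i
        ucb = [m + math.sqrt(2.0 * math.log(self.t) / n)
               for m, n in zip(self.means, self.counts)]
        return max(range(len(ucb)), key=ucb.__getitem__)

    def step(self, environment):
        # one meta-round: pick a base learner, let it act, and feed back the reward
        i = self.select_base()
        arm = self.base[i].act()               # delegate the arm choice to base i
        reward = environment(arm)              # environment: callable arm -> reward in [0, 1]
        self.base[i].feed(arm, reward)         # let base i learn from the outcome
        self.counts[i] += 1
        self.means[i] += (reward - self.means[i]) / self.counts[i]
        return arm, reward
```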
This list is automatically generated from the titles and abstracts of the papers on this site.