Discrete Choice Multi-Armed Bandits
- URL: http://arxiv.org/abs/2310.00562v1
- Date: Sun, 1 Oct 2023 03:41:04 GMT
- Title: Discrete Choice Multi-Armed Bandits
- Authors: Emerson Melo and David Müller
- Abstract summary: This paper establishes a connection between a category of discrete choice models and the realms of online learning and multiarmed bandit algorithms.
We furnish sublinear regret bounds for a comprehensive family of algorithms, encompassing the Exp3 algorithm as a particular case.
We introduce a novel family of adversarial multiarmed bandit algorithms, drawing inspiration from the generalized nested logit models.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper establishes a connection between a category of discrete choice
models and the realms of online learning and multiarmed bandit algorithms. Our
contributions can be summarized in two key aspects. Firstly, we furnish
sublinear regret bounds for a comprehensive family of algorithms, encompassing
the Exp3 algorithm as a particular case. Secondly, we introduce a novel family
of adversarial multiarmed bandit algorithms, drawing inspiration from the
generalized nested logit models initially introduced by Wen and Koppelman (2001). These
algorithms offer users the flexibility to fine-tune the model extensively, as
they can be implemented efficiently due to their closed-form sampling
distribution probabilities. To demonstrate the practical implementation of our
algorithms, we present numerical experiments, focusing on the stochastic bandit
case.
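The Exp3 special case admits a compact implementation: its sampling rule is the multinomial-logit (softmax) choice probability, the simplest member of the discrete-choice family the paper studies. Below is a minimal sketch of one common loss-based Exp3 variant, assuming losses in [0, 1] and a hypothetical `get_loss` feedback callback; it is illustrative, not the authors' reference code.

```python
import numpy as np

def exp3(K, T, get_loss, eta=0.1):
    """Exp3 as a logit discrete-choice bandit: sample each arm from the
    softmax of cumulative importance-weighted loss estimates."""
    rng = np.random.default_rng(0)
    L_hat = np.zeros(K)  # cumulative importance-weighted loss estimates
    for t in range(T):
        # closed-form logit (softmax) sampling probabilities;
        # subtracting the min is only for numerical stability
        w = np.exp(-eta * (L_hat - L_hat.min()))
        p = w / w.sum()
        arm = rng.choice(K, p=p)
        loss = get_loss(t, arm)       # bandit feedback: played arm only
        L_hat[arm] += loss / p[arm]   # unbiased loss estimate
    return p
```

The GNL-based algorithms of the paper keep this template but replace the softmax with generalized nested logit choice probabilities, which the abstract notes are likewise available in closed form.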
Related papers
- Combinatorial Stochastic-Greedy Bandit
We propose a novel Stochastic-Greedy Bandit (SGB) algorithm for multi-armed bandit problems when no information other than the joint reward of the selected set of $n$ arms at each time $t \in [T]$ is observed.
SGB adopts an optimized explore-then-commit approach and is specifically designed for scenarios with a large set of base arms.
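To illustrate why a stochastic-greedy design scales to large base-arm sets, here is a generic stochastic-greedy subset-selection sketch in the style of stochastic-greedy submodular maximization, with a hypothetical marginal-gain estimate `gain`; it is not the paper's exact SGB routine.

```python
import math
import random

def stochastic_greedy_select(base_arms, n, gain, eps=0.1):
    """Pick an n-subset greedily, but evaluate the marginal-gain estimate
    gain(S, a) only on a small random candidate sample per slot, instead
    of scanning every base arm."""
    S = []
    sample_size = math.ceil(len(base_arms) / n * math.log(1 / eps))
    for _ in range(n):
        remaining = [a for a in base_arms if a not in S]
        candidates = random.sample(remaining, min(sample_size, len(remaining)))
        S.append(max(candidates, key=lambda a: gain(S, a)))
    return S
```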
arXiv Detail & Related papers (2023-12-13T11:08:25Z)
- GBOSE: Generalized Bandit Orthogonalized Semiparametric Estimation
We propose a new algorithm with a semi-parametric reward model that achieves a state-of-the-art upper bound on regret.
Our work expands the scope of a representative algorithm with the same state-of-the-art regret bound and a similar reward model by building upon the same action-filtering procedures.
We derive the stated upper bound on regret and present simulation results affirming our method's superiority over prevalent semi-parametric bandit algorithms in cases involving more than two arms.
arXiv Detail & Related papers (2023-01-20T19:39:10Z)
- Stochastic Rising Bandits
We study a particular case of the rested and restless bandits in which the arms' expected payoff is monotonically non-decreasing.
This characteristic allows designing specifically crafted algorithms that exploit the regularity of the payoffs to provide tight regret bounds.
We empirically compare our algorithms with state-of-the-art methods for non-stationary MABs over several synthetically generated tasks and an online model selection problem for a real-world dataset.
arXiv Detail & Related papers (2022-12-07T17:30:45Z)
- Nested bandits
In many online decision processes, the optimizing agent is called to choose between large numbers of alternatives with many inherent similarities.
We propose a nested exponential weights (NEW) algorithm that performs a layered exploration of the learner's set of alternatives.
In so doing, we obtain a series of tight bounds for the learner's regret showing that online learning problems with a high degree of similarity between alternatives can be resolved efficiently.
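A simplified reading of this layered exploration, assuming arms are pre-grouped into similarity classes (a hypothetical structure; the actual NEW update is more refined): sample a class from exponential weights over log-sum-exp-aggregated scores, then an arm within the class.

```python
import numpy as np

def nested_sample(scores, classes, eta=0.1, rng=None):
    """Two-level exponential-weights sampling: `classes` is a list of
    integer index arrays partitioning the arms, `scores` is a NumPy array
    of cumulative arm scores (higher is better)."""
    rng = rng or np.random.default_rng()
    # aggregate each class by log-sum-exp -- the nested-logit "inclusive value"
    agg = np.array([np.log(np.sum(np.exp(eta * scores[c]))) / eta
                    for c in classes])
    pc = np.exp(eta * (agg - agg.max()))
    pc /= pc.sum()                      # class-level choice probabilities
    c = classes[rng.choice(len(classes), p=pc)]
    pa = np.exp(eta * (scores[c] - scores[c].max()))
    pa /= pa.sum()                      # within-class choice probabilities
    return int(c[rng.choice(len(c), p=pa)])
```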
arXiv Detail & Related papers (2022-06-19T08:08:38Z)
- On Kernelized Multi-Armed Bandits with Constraints
We study a bandit problem with a general unknown reward function and a general unknown constraint function.
We propose a general framework for both algorithm design and performance analysis.
We demonstrate the superior performance of our proposed algorithms via numerical experiments.
arXiv Detail & Related papers (2022-03-29T14:02:03Z)
- Universal and data-adaptive algorithms for model selection in linear contextual bandits
We consider the simplest non-trivial instance of model-selection: distinguishing a simple multi-armed bandit problem from a linear contextual bandit problem.
We introduce new algorithms that explore in a data-adaptive manner and provide guarantees of the form $\mathcal{O}(d^{\alpha} T^{1-\alpha})$.
Our approach extends to model selection among nested linear contextual bandits under some additional assumptions.
arXiv Detail & Related papers (2021-11-08T18:05:35Z)
- A Unifying Theory of Thompson Sampling for Continuous Risk-Averse Bandits
This paper unifies the analysis of risk-averse Thompson sampling algorithms for the multi-armed bandit problem.
Using the contraction principle in the theory of large deviations, we prove novel concentration bounds for continuous risk functionals.
We show that a wide class of risk functionals as well as "nice" functions of them satisfy the continuity condition.
arXiv Detail & Related papers (2021-08-25T17:09:01Z)
- Optimal Gradient-based Algorithms for Non-concave Bandit Optimization
This work considers a large family of bandit problems where the unknown underlying reward function is non-concave.
Our algorithms are based on a unified zeroth-order optimization paradigm that applies in great generality.
We show that the standard optimistic algorithms are sub-optimal by dimension factors.
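The core primitive of such zeroth-order schemes is estimating a gradient from bandit (function-value) feedback alone. A generic two-point estimator sketch, not the paper's specific construction:

```python
import numpy as np

def two_point_gradient(f, x, delta=1e-2, rng=None):
    """Estimate the gradient of f at x from two function evaluations along
    a random unit direction -- the standard two-point zeroth-order trick."""
    rng = rng or np.random.default_rng()
    u = rng.standard_normal(x.shape)
    u /= np.linalg.norm(u)            # random unit direction
    d = x.size
    # finite difference along u, scaled by dimension to be (near-)unbiased
    return d * (f(x + delta * u) - f(x - delta * u)) / (2 * delta) * u
```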
arXiv Detail & Related papers (2021-07-09T16:04:24Z)
- Efficient Pure Exploration for Combinatorial Bandits with Semi-Bandit Feedback
Combinatorial bandits generalize multi-armed bandits, where the agent chooses sets of arms and observes a noisy reward for each arm contained in the chosen set.
We focus on the pure-exploration problem of identifying the best arm with fixed confidence, as well as a more general setting, where the structure of the answer set differs from the one of the action set.
Based on a projection-free online learning algorithm for finite polytopes, our method is the first computationally efficient algorithm that is asymptotically optimal and has competitive empirical performance.
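"Projection-free" here means a linear minimization oracle over the polytope replaces costly Euclidean projections, as in Frank-Wolfe methods. A generic sketch of one such step (the paper's actual algorithm is more involved):

```python
def frank_wolfe_step(x, grad, lmo, t):
    """One projection-free update: the linear minimization oracle `lmo`
    returns the polytope vertex minimizing <grad, s>, and we move toward
    it with the classic 2/(t+2) step size."""
    s = lmo(grad)
    gamma = 2.0 / (t + 2.0)
    return [(1.0 - gamma) * xi + gamma * si for xi, si in zip(x, s)]
```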
arXiv Detail & Related papers (2021-01-21T10:35:09Z)
- Learning Gaussian Graphical Models via Multiplicative Weights
We adapt an algorithm of Klivans and Meka based on the method of multiplicative weight updates.
The algorithm enjoys a sample complexity bound that is qualitatively similar to others in the literature.
It has a low runtime $O(mp^2)$ in the case of $m$ samples and $p$ nodes, and can trivially be implemented in an online manner.
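A generic multiplicative-weights loop in the Klivans-Meka spirit (a sketch with a hypothetical per-round loss callback, not the paper's exact estimator): each sample costs $O(p)$ per node, so $m$ samples across $p$ per-node runs give the $O(mp^2)$ total.

```python
import numpy as np

def multiplicative_weights(losses_at, p, m, eta=0.1):
    """Maintain a distribution over p candidate predictors and shrink each
    weight multiplicatively by its per-sample loss; O(p) work per sample,
    and trivially usable online since samples are processed one at a time."""
    w = np.ones(p)
    for t in range(m):
        probs = w / w.sum()
        losses = losses_at(t, probs)   # per-candidate losses in [0, 1]
        w *= np.exp(-eta * losses)     # multiplicative update
    return w / w.sum()
```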
arXiv Detail & Related papers (2020-02-20T10:50:58Z)