Fully Gap-Dependent Bounds for Multinomial Logit Bandit
- URL: http://arxiv.org/abs/2011.09998v1
- Date: Thu, 19 Nov 2020 17:52:12 GMT
- Title: Fully Gap-Dependent Bounds for Multinomial Logit Bandit
- Authors: Jiaqi Yang
- Abstract summary: We study the multinomial logit (MNL) bandit problem, where at each time step, the seller offers an assortment of size at most $K$ from a pool of $N$ items.
We present (i) an algorithm that identifies the optimal assortment $S*$ within $widetildeO(sum_i = 1N Delta_i-2)$ time steps with high probability, and (ii) an algorithm that incurs $O(sum_i notin S* KDelta_i
- Score: 5.132017939561661
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the multinomial logit (MNL) bandit problem, where at each time step,
the seller offers an assortment of size at most $K$ from a pool of $N$ items,
and the buyer purchases an item from the assortment according to a MNL choice
model. The objective is to learn the model parameters and maximize the expected
revenue. We present (i) an algorithm that identifies the optimal assortment
$S^*$ within $\widetilde{O}(\sum_{i = 1}^N \Delta_i^{-2})$ time steps with high
probability, and (ii) an algorithm that incurs $O(\sum_{i \notin S^*}
K\Delta_i^{-1} \log T)$ regret in $T$ time steps. To our knowledge, our
algorithms are the first to achieve gap-dependent bounds that fully depends on
the suboptimality gaps of all items. Our technical contributions include an
algorithmic framework that relates the MNL-bandit problem to a variant of the
top-$K$ arm identification problem in multi-armed bandits, a generalized
epoch-based offering procedure, and a layer-based adaptive estimation
procedure.
Related papers
- Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic
Shortest Path [80.60592344361073]
We study the Shortest Path (SSP) problem with a linear mixture transition kernel.
An agent repeatedly interacts with a environment and seeks to reach certain goal state while minimizing the cumulative cost.
Existing works often assume a strictly positive lower bound of the iteration cost function or an upper bound of the expected length for the optimal policy.
arXiv Detail & Related papers (2024-02-14T07:52:00Z) - Combinatorial Stochastic-Greedy Bandit [79.1700188160944]
We propose a novelgreedy bandit (SGB) algorithm for multi-armed bandit problems when no extra information other than the joint reward of the selected set of $n$ arms at each time $tin [T]$ is observed.
SGB adopts an optimized-explore-then-commit approach and is specifically designed for scenarios with a large set of base arms.
arXiv Detail & Related papers (2023-12-13T11:08:25Z) - An Oblivious Stochastic Composite Optimization Algorithm for Eigenvalue
Optimization Problems [76.2042837251496]
We introduce two oblivious mirror descent algorithms based on a complementary composite setting.
Remarkably, both algorithms work without prior knowledge of the Lipschitz constant or smoothness of the objective function.
We show how to extend our framework to scale and demonstrate the efficiency and robustness of our methods on large scale semidefinite programs.
arXiv Detail & Related papers (2023-06-30T08:34:29Z) - Best Policy Identification in Linear MDPs [70.57916977441262]
We investigate the problem of best identification in discounted linear Markov+Delta Decision in the fixed confidence setting under a generative model.
The lower bound as the solution of an intricate non- optimization program can be used as the starting point to devise such algorithms.
arXiv Detail & Related papers (2022-08-11T04:12:50Z) - Multinomial Logit Contextual Bandits: Provable Optimality and
Practicality [15.533842336139063]
We consider a sequential assortment selection problem where the user choice is given by a multinomial logit (MNL) choice model whose parameters are unknown.
We propose upper confidence bound based algorithms for this MNL contextual bandit.
We show that a simple variant of the algorithm achieves the optimal regret for a broad class of important applications.
arXiv Detail & Related papers (2021-03-25T15:42:25Z) - Top-$k$ eXtreme Contextual Bandits with Arm Hierarchy [71.17938026619068]
We study the top-$k$ extreme contextual bandits problem, where the total number of arms can be enormous.
We first propose an algorithm for the non-extreme realizable setting, utilizing the Inverse Gap Weighting strategy.
We show that our algorithm has a regret guarantee of $O(ksqrt(A-k+1)T log (|mathcalF|T))$.
arXiv Detail & Related papers (2021-02-15T19:10:52Z) - A Multi-Arm Bandit Approach To Subset Selection Under Constraints [14.543433698096667]
We explore the class of problems where a central planner needs to select a subset of agents, each with different quality and cost.
When the agents' quality is known, we formulate our problem as an integer linear program (ILP)
We propose a deterministic algorithm, namely dpss, that provides an exact solution to our ILP.
arXiv Detail & Related papers (2021-02-09T13:48:57Z) - Practical and Parallelizable Algorithms for Non-Monotone Submodular
Maximization with Size Constraint [20.104148319012854]
We present and parallelizable for a submodular function, not necessarily a monotone, with respect to a size constraint.
We improve the best approximation factor achieved by an algorithm that has optimal adaptivity and nearly optimal complexity query to $0.193 - varepsilon$.
arXiv Detail & Related papers (2020-09-03T22:43:55Z) - Revisiting Modified Greedy Algorithm for Monotone Submodular
Maximization with a Knapsack Constraint [75.85952446237599]
We show that a modified greedy algorithm can achieve an approximation factor of $0.305$.
We derive a data-dependent upper bound on the optimum.
It can also be used to significantly improve the efficiency of such algorithms as branch and bound.
arXiv Detail & Related papers (2020-08-12T15:40:21Z) - Optimal $\delta$-Correct Best-Arm Selection for Heavy-Tailed
Distributions [2.2940141855172036]
We consider the problem of identifying the one with the maximum mean using a $delta$-correct algorithm.
Lower bounds for $delta$-correct algorithms are well known.
We propose a $delta$-correct algorithm that matches the lower bound as $delta$ reduces to zero.
arXiv Detail & Related papers (2019-08-24T05:31:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.