Categorized Bandits
- URL: http://arxiv.org/abs/2005.01656v1
- Date: Mon, 4 May 2020 17:09:22 GMT
- Title: Categorized Bandits
- Authors: Matthieu Jedor, Jonathan Louedec, Vianney Perchet
- Abstract summary: We introduce a new multi-armed bandit setting where arms are grouped inside "ordered" categories.
The motivating example comes from e-commerce, where a customer typically has a greater appetence for items of a specific, well-identified but unknown category than for any other.
- Score: 17.865068872754293
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a new stochastic multi-armed bandit setting where arms are
grouped inside "ordered" categories. The motivating example comes from
e-commerce, where a customer typically has a greater appetence for items of a
specific, well-identified but unknown category than for any other. We introduce
three concepts of ordering between categories, inspired by stochastic dominance
between random variables, which are gradually weaker so that more and more
bandit scenarios satisfy at least one of them. We first prove
instance-dependent lower bounds on the cumulative regret for each of these
models, indicating how the complexity of the bandit problems increases with the
generality of the ordering concept considered. We also provide algorithms that
fully leverage the structure of the model with their associated theoretical
guarantees. Finally, we conduct an analysis of real data to highlight that
such ordered categories do exist in practice.
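The strongest of the orderings is easiest to see in code. Below is a minimal sketch (not the authors' algorithm) of a UCB-style strategy under a group-wise dominance assumption: once every arm of a category has an upper confidence bound below the best lower confidence bound elsewhere, the whole category can be dropped at once. The Bernoulli environment, category names, and elimination rule are illustrative assumptions.

```python
import math
import random

# Toy environment (an assumption for this sketch, not the paper's data):
# Bernoulli arms grouped into categories, where every arm of one unknown
# category dominates every arm of every other category in expectation.
CATEGORIES = {
    "dominant": [0.80, 0.75, 0.70],
    "mid":      [0.60, 0.55],
    "weak":     [0.40, 0.30, 0.20],
}

def ucb_with_category_elimination(horizon, seed=0):
    rng = random.Random(seed)
    arms = [(c, mu) for c, mus in CATEGORIES.items() for mu in mus]
    counts = [0] * len(arms)
    sums = [0.0] * len(arms)
    active = set(range(len(arms)))
    best_mean = max(mu for _, mu in arms)
    regret = 0.0
    for t in range(1, horizon + 1):
        untried = [i for i in active if counts[i] == 0]
        if untried:
            i = untried[0]
        else:
            radius = {i: math.sqrt(2.0 * math.log(t) / counts[i]) for i in active}
            ucb = {i: sums[i] / counts[i] + radius[i] for i in active}
            lcb = {i: sums[i] / counts[i] - radius[i] for i in active}
            # Category-level elimination: once every arm of a category has an
            # upper bound below the best lower bound, drop the whole category.
            best_lcb = max(lcb.values())
            for c in {arms[i][0] for i in active}:
                members = [i for i in active if arms[i][0] == c]
                if max(ucb[i] for i in members) < best_lcb:
                    active -= set(members)
            i = max(active, key=lambda j: ucb[j])
        reward = 1.0 if rng.random() < arms[i][1] else 0.0
        counts[i] += 1
        sums[i] += reward
        regret += best_mean - arms[i][1]
    return regret

print("cumulative regret:", round(ucb_with_category_elimination(20_000), 1))
```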
Related papers
- SelEx: Self-Expertise in Fine-Grained Generalized Category Discovery [55.72840638180451]
Generalized Category Discovery aims to simultaneously uncover novel categories and accurately classify known ones.
Traditional methods, which lean heavily on self-supervision and contrastive learning, often fall short when distinguishing between fine-grained categories.
We introduce a novel concept called "self-expertise", which enhances the model's ability to recognize subtle differences and uncover unknown categories.
arXiv Detail & Related papers (2024-08-26T15:53:50Z)
- Discrete Choice Multi-Armed Bandits [0.0]
This paper establishes a connection between a category of discrete choice models and the realms of online learning and multi-armed bandit algorithms.
We furnish sublinear regret bounds for a comprehensive family of algorithms, encompassing the Exp3 algorithm as a particular case.
We introduce a novel family of adversarial multi-armed bandit algorithms, drawing inspiration from the generalized nested logit models.
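Since the summary singles out Exp3 as the special case recovered by this family, here is a hedged sketch of standard Exp3 (exponential weights with uniform exploration and importance-weighted reward estimates); the toy Bernoulli environment is an assumption for the demo, not from the paper.

```python
import math
import random

def exp3(n_arms, horizon, reward_fn, gamma=0.1, seed=0):
    rng = random.Random(seed)
    weights = [1.0] * n_arms
    total = 0.0
    for t in range(horizon):
        wsum = sum(weights)
        # Mix exponential weights with uniform exploration (rate gamma).
        probs = [(1 - gamma) * w / wsum + gamma / n_arms for w in weights]
        arm = rng.choices(range(n_arms), weights=probs)[0]
        r = reward_fn(arm, t)          # must return a reward in [0, 1]
        total += r
        # Importance-weighted estimate keeps the update unbiased even though
        # only the pulled arm's reward is observed.
        weights[arm] *= math.exp(gamma * (r / probs[arm]) / n_arms)
        # Rescale to avoid floating-point overflow over long horizons.
        m = max(weights)
        weights = [w / m for w in weights]
    return total

# Toy stochastic environment (an assumption for the demo): arm 2 is best.
means = [0.3, 0.5, 0.7]
env = random.Random(1)
total = exp3(3, 10_000, lambda a, t: 1.0 if env.random() < means[a] else 0.0)
print("average reward:", round(total / 10_000, 3))
```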
arXiv Detail & Related papers (2023-10-01T03:41:04Z)
- Doubly High-Dimensional Contextual Bandits: An Interpretable Model for Joint Assortment-Pricing [24.80305303473745]
Key challenges in running a retail business include how to select products to present to consumers, and how to price products to maximize revenue or profit.
We propose a joint approach to assortment-pricing based on contextual bandits.
We show at least three-fold gains in revenue or profit from our bandit method, and demonstrate the interpretability of the learned latent factor models.
arXiv Detail & Related papers (2023-09-14T00:45:36Z)
- Uplifting Bandits [23.262188897812475]
We introduce a multi-armed bandit model where the reward is a sum of multiple random variables, and each action only alters the distributions of some of them.
This model is motivated by marketing campaigns and recommender systems, where the variables represent outcomes on individual customers.
We propose UCB-style algorithms that estimate the uplifts of the actions over a baseline.
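The idea of estimating uplifts over a baseline can be illustrated with a small sketch. Everything below (the variable model, the AFFECTS map, the confidence widths) is assumed for illustration; the point is that an action's uplift is estimated only from the variables it can actually move, which shrinks the noise relative to using the full reward.

```python
import math
import random

# Hypothetical uplift model (assumed for this sketch): the reward is a sum of
# V Bernoulli variables; action a shifts only the variables in AFFECTS[a].
V = 20
BASE = [0.5] * V                             # baseline success probabilities
AFFECTS = {0: [], 1: [0, 1], 2: [2, 3, 4]}   # action 0 is the do-nothing baseline
UPLIFT = {0: 0.0, 1: 0.15, 2: 0.08}          # per-variable mean shift

def pull(a, rng):
    return [1.0 if rng.random() < BASE[v] + (UPLIFT[a] if v in AFFECTS[a] else 0.0)
            else 0.0 for v in range(V)]

def uplift_ucb(horizon, seed=0):
    rng = random.Random(seed)
    actions = list(AFFECTS)
    counts = {a: 0 for a in actions}
    upsum = {a: 0.0 for a in actions}   # running sums of observed uplifts
    for t in range(1, horizon + 1):
        untried = [a for a in actions if counts[a] == 0]
        if untried:
            a = untried[0]
        else:
            # Index each action by its estimated uplift over the baseline,
            # measured only on the variables that action can move; ignoring
            # the unaffected variables shrinks the noise, which is the
            # intuition behind UCB-style uplift estimates.
            def index(a):
                mean = upsum[a] / counts[a]
                width = math.sqrt(2 * max(len(AFFECTS[a]), 1)
                                  * math.log(t) / counts[a])
                return mean + (width if AFFECTS[a] else 0.0)
            a = max(actions, key=index)
        obs = pull(a, rng)
        counts[a] += 1
        # The baseline means are assumed known here; in practice they would
        # be estimated from the baseline action's own observations.
        upsum[a] += sum(obs[v] - BASE[v] for v in AFFECTS[a])
    return max(actions, key=lambda a: upsum[a] / counts[a])

print("action with best estimated uplift:", uplift_ucb(5_000))
```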
arXiv Detail & Related papers (2022-06-08T18:00:56Z)
- Deep Hierarchy in Bandits [51.22833900944146]
Mean rewards of actions are often correlated.
To maximize statistical efficiency, it is important to leverage these correlations when learning.
We formulate a bandit variant of this problem where the correlations of mean action rewards are represented by a hierarchical Bayesian model.
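As a rough illustration of a hierarchical Bayesian bandit (not the paper's exact model), the sketch below runs Thompson sampling over a two-level Gaussian hierarchy: a sampled category effect ties together the posteriors of the arms in that category, which is how correlations between mean rewards get exploited.

```python
import math
import random

# Toy two-level Gaussian hierarchy (an assumption of this sketch, not the
# paper's exact model): category effect mu_c ~ N(0, TAU0^2), arm effect
# theta_a ~ N(mu_c, TAU1^2), observed reward ~ N(theta_a, SIGMA^2).
TAU0, TAU1, SIGMA = 1.0, 0.3, 0.5
ARMS = {"c1": [0.9, 0.8], "c2": [0.3, 0.2]}   # true arm means by category

def posterior(prior_mean, prior_var, obs_sum, n, noise_var):
    """Conjugate Normal-Normal update with known noise variance."""
    if n == 0:
        return prior_mean, prior_var
    var = 1.0 / (1.0 / prior_var + n / noise_var)
    return var * (prior_mean / prior_var + obs_sum / noise_var), var

def hierarchical_thompson(horizon, seed=0):
    rng = random.Random(seed)
    arms = [(c, mu) for c, mus in ARMS.items() for mu in mus]
    n = [0] * len(arms)
    s = [0.0] * len(arms)
    for _ in range(horizon):
        samples = []
        for c in ARMS:
            idx = [i for i, (ci, _) in enumerate(arms) if ci == c]
            # Approximate the category posterior from pooled observations
            # (marginal noise TAU1^2 + SIGMA^2), sample a category effect...
            m_c, v_c = posterior(0.0, TAU0 ** 2,
                                 sum(s[i] for i in idx),
                                 sum(n[i] for i in idx),
                                 TAU1 ** 2 + SIGMA ** 2)
            mu_c = rng.gauss(m_c, math.sqrt(v_c))
            # ...then sample per-arm effects around it; the shared mu_c is
            # what propagates information between arms of the same category.
            for i in idx:
                m_a, v_a = posterior(mu_c, TAU1 ** 2, s[i], n[i], SIGMA ** 2)
                samples.append((rng.gauss(m_a, math.sqrt(v_a)), i))
        _, i = max(samples)
        r = rng.gauss(arms[i][1], SIGMA)
        n[i] += 1
        s[i] += r
    return n

print("pull counts per arm:", hierarchical_thompson(2_000))
```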
arXiv Detail & Related papers (2022-02-03T08:15:53Z)
- Universal and data-adaptive algorithms for model selection in linear contextual bandits [52.47796554359261]
We consider the simplest non-trivial instance of model-selection: distinguishing a simple multi-armed bandit problem from a linear contextual bandit problem.
We introduce new algorithms that explore in a data-adaptive manner and provide guarantees of the form $\mathcal{O}(d^{\alpha} T^{1-\alpha})$.
Our approach extends to model selection among nested linear contextual bandits under some additional assumptions.
arXiv Detail & Related papers (2021-11-08T18:05:35Z)
- Model Selection for Generic Contextual Bandits [20.207989166682832]
We propose a refinement-based algorithm called Adaptive Contextual Bandit (ACB).
We prove that this algorithm is adaptive, i.e., the regret rate order-wise matches that of any provable contextual bandit algorithm.
We also show that a much simpler explore-then-commit (ETC) style algorithm obtains a similar regret bound, despite not knowing the true model class.
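The explore-then-commit template mentioned above is simple enough to state in a few lines; this sketch is the generic K-armed version with a toy Bernoulli environment, not the paper's model-selection variant.

```python
import random

def explore_then_commit(means, horizon, m, seed=0):
    """Generic K-armed explore-then-commit: pull every arm m times, then
    commit to the empirical best for the remaining rounds. The Bernoulli
    environment passed via `means` is a toy assumption for the demo."""
    rng = random.Random(seed)
    k = len(means)

    def pull(a):
        return 1.0 if rng.random() < means[a] else 0.0

    est = [0.0] * k
    total = 0.0
    t = 0
    # Exploration phase: uniform over arms.
    for a in range(k):
        for _ in range(m):
            r = pull(a)
            est[a] += r / m          # running empirical mean of arm a
            total += r
            t += 1
    # Commit phase: play the empirical best arm until the horizon.
    best = max(range(k), key=lambda a: est[a])
    for _ in range(horizon - t):
        total += pull(best)
    return best, total / horizon

best, avg = explore_then_commit([0.2, 0.5, 0.6], horizon=10_000, m=200)
print("committed to arm", best, "with average reward", round(avg, 3))
```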
arXiv Detail & Related papers (2021-07-07T19:35:31Z)
- Learning over no-Preferred and Preferred Sequence of items for Robust Recommendation [66.8722561224499]
We propose a theoretically founded sequential strategy for training large-scale Recommender Systems (RS) over implicit feedback.
We present two variants of this strategy where model parameters are updated using either the momentum method or a gradient-based approach.
arXiv Detail & Related papers (2020-12-12T22:10:15Z)
- Influence Diagram Bandits: Variational Thompson Sampling for Structured Bandit Problems [40.957688390621385]
Our framework captures complex statistical dependencies between actions, latent variables, and observations.
We develop novel online learning algorithms that learn to act efficiently in our models.
arXiv Detail & Related papers (2020-07-09T16:25:40Z)
- A Novel Confidence-Based Algorithm for Structured Bandits [129.30402124516507]
We study finite-armed bandits where the rewards of each arm might be correlated with those of other arms.
We introduce a novel phased algorithm that exploits the given structure to build confidence sets over the parameters of the true bandit problem.
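A minimal sketch of the phased, confidence-set idea follows, under a deliberately simple structure that is an assumption of this example, not the paper's: all arms share one unknown parameter theta, every pull tightens a confidence interval for theta, and at the end of each phase only arms that are optimal for some theta in the interval are kept.

```python
import math
import random

# Assumed structure for this sketch (much simpler than the paper's setting):
# every arm's mean reward is X[a] * theta for a known feature X[a] and a
# single shared unknown theta, so every pull informs all arms at once.
X = [0.2, -0.5, 1.0, 0.7]
THETA_TRUE = 0.8        # unknown to the learner
SIGMA = 0.5

def phased_elimination(n_phases=8, seed=0):
    rng = random.Random(seed)
    active = list(range(len(X)))
    num = den = 0.0     # sufficient statistics: theta_hat = num / den
    for phase in range(1, n_phases + 1):
        for a in active:
            for _ in range(2 ** phase):          # geometrically growing phases
                r = X[a] * THETA_TRUE + rng.gauss(0.0, SIGMA)
                num += X[a] * r
                den += X[a] ** 2
        theta_hat = num / den
        width = 3.0 * SIGMA / math.sqrt(den)     # crude confidence radius
        # Keep an arm only if it is optimal for SOME theta in the confidence
        # interval; with one-dimensional linear means, checking the two
        # endpoints of the interval is enough.
        keep = {max(active, key=lambda a: X[a] * theta)
                for theta in (theta_hat - width, theta_hat + width)}
        active = sorted(keep)
        if len(active) == 1:
            break
    return active

print("surviving arms:", phased_elimination())
```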
arXiv Detail & Related papers (2020-05-23T19:52:44Z)
- Best Arm Identification for Cascading Bandits in the Fixed Confidence Setting [81.70513857417106]
We design and analyze CascadeBAI, an algorithm for finding the best set of $K$ items.
An upper bound on the time complexity of CascadeBAI is derived by overcoming a crucial analytical challenge.
We show, through the derivation of a lower bound on the time complexity, that the performance of CascadeBAI is optimal in some practical regimes.
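CascadeBAI itself operates on cascading click feedback over sets of $K$ items; as a stand-in illustration of the fixed-confidence objective only, here is a generic successive-elimination routine that returns an arm which, with probability at least 1 - delta, is the best one. The environment and confidence radius are assumptions of this sketch.

```python
import math
import random

def successive_elimination(means, delta=0.05, seed=0):
    """Generic fixed-confidence best-arm identification; this does NOT model
    the cascading (click) feedback that CascadeBAI handles."""
    rng = random.Random(seed)
    k = len(means)
    active = list(range(k))
    n, s = [0] * k, [0.0] * k
    t = 0
    while len(active) > 1:
        t += 1
        for a in active:
            s[a] += 1.0 if rng.random() < means[a] else 0.0
            n[a] += 1
        # Anytime confidence radius; the union bound over arms and rounds
        # keeps the overall mistake probability below delta.
        rad = math.sqrt(math.log(4.0 * k * t * t / delta) / (2 * t))
        best_lcb = max(s[a] / n[a] - rad for a in active)
        active = [a for a in active if s[a] / n[a] + rad >= best_lcb]
    return active[0], sum(n)

arm, samples = successive_elimination([0.3, 0.45, 0.6])
print("identified arm:", arm, "after", samples, "total pulls")
```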
arXiv Detail & Related papers (2020-01-23T16:47:52Z)