MNL-Bandits under Inventory and Limited Switches Constraints
- URL: http://arxiv.org/abs/2204.10787v1
- Date: Fri, 22 Apr 2022 16:02:27 GMT
- Title: MNL-Bandits under Inventory and Limited Switches Constraints
- Authors: Hongbin Zhang, Yu Yang, Feng Wu, Qixin Zhang
- Abstract summary: We develop an efficient UCB-like algorithm to optimize the assortments while learning customers' choices from data.
We prove that our algorithm can achieve a sub-linear regret bound $tildeOleft(T1-alpha/2right)$ if $O(Talpha)$ switches are allowed.
- Score: 38.960764902819434
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Optimizing the assortment of products to display to customers is a key to
increasing revenue for both offline and online retailers. To trade-off between
exploring customers' preference and exploiting customers' choices learned from
data, in this paper, by adopting the Multi-Nomial Logit (MNL) choice model to
capture customers' choices over products, we study the problem of optimizing
assortments over a planning horizon $T$ for maximizing the profit of the
retailer. To make the problem setting more practical, we consider both the
inventory constraint and the limited switches constraint, where the retailer
cannot use up the resource inventory before time $T$ and is forbidden to switch
the assortment shown to customers too many times. Such a setting suits the case
when an online retailer wants to dynamically optimize the assortment selection
for a population of customers. We develop an efficient UCB-like algorithm to
optimize the assortments while learning customers' choices from data. We prove
that our algorithm can achieve a sub-linear regret bound
$\tilde{O}\left(T^{1-\alpha/2}\right)$ if $O(T^\alpha)$ switches are allowed.
%, and our regret bound is optimal with respect to $T$. Extensive numerical
experiments show that our algorithm outperforms baselines and the gap between
our algorithm's performance and the theoretical upper bound is small.
Related papers
- A Primal-Dual Online Learning Approach for Dynamic Pricing of Sequentially Displayed Complementary Items under Sale Constraints [54.46126953873298]
We address the problem of dynamically pricing complementary items that are sequentially displayed to customers.
Coherent pricing policies for complementary items are essential because optimizing the pricing of each item individually is ineffective.
We empirically evaluate our approach using synthetic settings randomly generated from real-world data, and compare its performance in terms of constraints violation and regret.
arXiv Detail & Related papers (2024-07-08T09:55:31Z) - Training Greedy Policy for Proposal Batch Selection in Expensive Multi-Objective Combinatorial Optimization [52.80408805368928]
We introduce a novel greedy-style subset selection algorithm for batch acquisition.
Our experiments on the red fluorescent proteins show that our proposed method achieves the baseline performance in 1.69x fewer queries.
arXiv Detail & Related papers (2024-06-21T05:57:08Z) - Learning to Cover: Online Learning and Optimization with Irreversible Decisions [50.5775508521174]
We find that regret grows sub-linearly at a rate $Thetaleft(mfrac12cdotfrac11-2-Tright)$, thus converging exponentially fast to $Theta(sqrtm)$.
These findings underscore the benefits of limited online learning and optimization, in that even a few rounds can provide significant benefits as compared to a no-learning baseline.
arXiv Detail & Related papers (2024-06-20T23:00:25Z) - Emulating Full Client Participation: A Long-Term Client Selection Strategy for Federated Learning [48.94952630292219]
We propose a novel client selection strategy designed to emulate the performance achieved with full client participation.
In a single round, we select clients by minimizing the gradient-space estimation error between the client subset and the full client set.
In multi-round selection, we introduce a novel individual fairness constraint, which ensures that clients with similar data distributions have similar frequencies of being selected.
arXiv Detail & Related papers (2024-05-22T12:27:24Z) - Online Joint Assortment-Inventory Optimization under MNL Choices [14.530542487845732]
We study an online joint assortment-inventory optimization problem, in which we assume that the choice behavior of each customer follows the Multinomial Logit (MNL) choice model.
We propose a novel algorithm that can effectively balance the exploration and exploitation in the online decision-making of assortment and inventory.
arXiv Detail & Related papers (2023-04-04T09:25:34Z) - PASTA: Pessimistic Assortment Optimization [25.51792135903357]
We consider a class of assortment optimization problems in an offline data-driven setting.
We propose an algorithm referred to as Pessimistic ASsortment opTimizAtion (PASTA) based on the principle of pessimism.
arXiv Detail & Related papers (2023-02-08T01:11:51Z) - Playing hide and seek: tackling in-store picking operations while
improving customer experience [0.0]
We formalize a new problem called Dynamic In-store Picker Problem routing (diPRP)
In this relevant problem - diPRP - a picker tries to pick online orders while minimizing customer encounters.
Our work suggests that retailers should be able to scale the in-store picking of online orders without jeopardizing the experience of offline customers.
arXiv Detail & Related papers (2023-01-05T16:35:17Z) - Optimal Data Selection: An Online Distributed View [61.31708750038692]
We develop algorithms for the online and distributed version of the problem.
We show that our selection methods outperform random selection by $5-20%$.
In learning tasks on ImageNet and MNIST, we show that our selection methods outperform random selection by $5-20%$.
arXiv Detail & Related papers (2022-01-25T18:56:16Z) - A Tractable Online Learning Algorithm for the Multinomial Logit Contextual Bandit [2.9998316151418107]
We consider a dynamic set optimization problem, where a decision-maker offers a subset of products to a consumer.
We model consumer choice behavior using the widely used Multinomial Logit (MNL) model.
We show that the regret is bounded by $O(sqrtdT + kappa)$, significantly improving the performance over existing methods.
arXiv Detail & Related papers (2020-11-28T00:20:36Z) - Online Learning and Optimization for Revenue Management Problems with
Add-on Discounts [14.844382070740524]
We formulate this problem as an optimization problem to determine the prices of different products and the selection of products with add-on discounts.
We propose an efficient FPTAS algorithm that can solve the problem approximately to any desired accuracy.
We show that our learning algorithm can converge to the optimal algorithm that has access to the true demand functions.
arXiv Detail & Related papers (2020-05-02T23:54:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.