Related papers: MNL-Bandits under Inventory and Limited Switches Constraints

MNL-Bandits under Inventory and Limited Switches Constraints

URL: http://arxiv.org/abs/2204.10787v1
Date: Fri, 22 Apr 2022 16:02:27 GMT
Title: MNL-Bandits under Inventory and Limited Switches Constraints
Authors: Hongbin Zhang, Yu Yang, Feng Wu, Qixin Zhang
Abstract summary: We develop an efficient UCB-like algorithm to optimize the assortments while learning customers' choices from data. We prove that our algorithm can achieve a sub-linear regret bound $tildeOleft(T1-alpha/2right)$ if $O(Talpha)$ switches are allowed.
Score: 38.960764902819434
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Optimizing the assortment of products to display to customers is a key to increasing revenue for both offline and online retailers. To trade-off between exploring customers' preference and exploiting customers' choices learned from data, in this paper, by adopting the Multi-Nomial Logit (MNL) choice model to capture customers' choices over products, we study the problem of optimizing assortments over a planning horizon $T$ for maximizing the profit of the retailer. To make the problem setting more practical, we consider both the inventory constraint and the limited switches constraint, where the retailer cannot use up the resource inventory before time $T$ and is forbidden to switch the assortment shown to customers too many times. Such a setting suits the case when an online retailer wants to dynamically optimize the assortment selection for a population of customers. We develop an efficient UCB-like algorithm to optimize the assortments while learning customers' choices from data. We prove that our algorithm can achieve a sub-linear regret bound $\tilde{O}\left(T^{1-\alpha/2}\right)$ if $O(T^\alpha)$ switches are allowed. %, and our regret bound is optimal with respect to $T$. Extensive numerical experiments show that our algorithm outperforms baselines and the gap between our algorithm's performance and the theoretical upper bound is small.

Related papers

Procurement Auctions via Approximately Optimal Submodular Optimization [53.93943270902349]
We study procurement auctions, where an auctioneer seeks to acquire services from strategic sellers with private costs. Our goal is to design computationally efficient auctions that maximize the difference between the quality of the acquired services and the total cost of the sellers.
arXiv Detail & Related papers (2024-11-20T18:06:55Z)
Online Linear Programming with Batching [18.989352151219336]
We study Online Linear Programming (OLP) withOmega. We show that the ability to delay decisions brings better operational performance, as measured by regret. All the proposed algorithms delay decisions on customers arriving in only the first and the last batch.
arXiv Detail & Related papers (2024-08-01T06:13:24Z)
A Primal-Dual Online Learning Approach for Dynamic Pricing of Sequentially Displayed Complementary Items under Sale Constraints [54.46126953873298]
We address the problem of dynamically pricing complementary items that are sequentially displayed to customers. Coherent pricing policies for complementary items are essential because optimizing the pricing of each item individually is ineffective. We empirically evaluate our approach using synthetic settings randomly generated from real-world data, and compare its performance in terms of constraints violation and regret.
arXiv Detail & Related papers (2024-07-08T09:55:31Z)
Training Greedy Policy for Proposal Batch Selection in Expensive Multi-Objective Combinatorial Optimization [52.80408805368928]
We introduce a novel greedy-style subset selection algorithm for batch acquisition. Our experiments on the red fluorescent proteins show that our proposed method achieves the baseline performance in 1.69x fewer queries.
arXiv Detail & Related papers (2024-06-21T05:57:08Z)
Learning to Cover: Online Learning and Optimization with Irreversible Decisions [50.5775508521174]
We find that regret grows sub-linearly at a rate $Thetaleft(mfrac12cdotfrac11-2-Tright)$, thus converging exponentially fast to $Theta(sqrtm)$. These findings underscore the benefits of limited online learning and optimization, in that even a few rounds can provide significant benefits as compared to a no-learning baseline.
arXiv Detail & Related papers (2024-06-20T23:00:25Z)
Emulating Full Client Participation: A Long-Term Client Selection Strategy for Federated Learning [48.94952630292219]
We propose a novel client selection strategy designed to emulate the performance achieved with full client participation. In a single round, we select clients by minimizing the gradient-space estimation error between the client subset and the full client set. In multi-round selection, we introduce a novel individual fairness constraint, which ensures that clients with similar data distributions have similar frequencies of being selected.
arXiv Detail & Related papers (2024-05-22T12:27:24Z)
Online Joint Assortment-Inventory Optimization under MNL Choices [14.530542487845732]
We study an online joint assortment-inventory optimization problem, in which we assume that the choice behavior of each customer follows the Multinomial Logit (MNL) choice model. We propose a novel algorithm that can effectively balance the exploration and exploitation in the online decision-making of assortment and inventory.
arXiv Detail & Related papers (2023-04-04T09:25:34Z)
PASTA: Pessimistic Assortment Optimization [25.51792135903357]
We consider a class of assortment optimization problems in an offline data-driven setting. We propose an algorithm referred to as Pessimistic ASsortment opTimizAtion (PASTA) based on the principle of pessimism.
arXiv Detail & Related papers (2023-02-08T01:11:51Z)
Playing hide and seek: tackling in-store picking operations while improving customer experience [0.0]
We formalize a new problem called Dynamic In-store Picker Problem routing (diPRP) In this relevant problem - diPRP - a picker tries to pick online orders while minimizing customer encounters. Our work suggests that retailers should be able to scale the in-store picking of online orders without jeopardizing the experience of offline customers.
arXiv Detail & Related papers (2023-01-05T16:35:17Z)
Optimal Data Selection: An Online Distributed View [61.31708750038692]
We develop algorithms for the online and distributed version of the problem. We show that our selection methods outperform random selection by $5-20%$. In learning tasks on ImageNet and MNIST, we show that our selection methods outperform random selection by $5-20%$.
arXiv Detail & Related papers (2022-01-25T18:56:16Z)
A Tractable Online Learning Algorithm for the Multinomial Logit Contextual Bandit [2.9998316151418107]
We consider a dynamic set optimization problem, where a decision-maker offers a subset of products to a consumer. We model consumer choice behavior using the widely used Multinomial Logit (MNL) model. We show that the regret is bounded by $O(sqrtdT + kappa)$, significantly improving the performance over existing methods.
arXiv Detail & Related papers (2020-11-28T00:20:36Z)
Online Learning and Optimization for Revenue Management Problems with Add-on Discounts [14.844382070740524]
We formulate this problem as an optimization problem to determine the prices of different products and the selection of products with add-on discounts. We propose an efficient FPTAS algorithm that can solve the problem approximately to any desired accuracy. We show that our learning algorithm can converge to the optimal algorithm that has access to the true demand functions.
arXiv Detail & Related papers (2020-05-02T23:54:17Z)

This list is automatically generated from the titles and abstracts of the papers in this site.