Related papers: Online Assortment and Price Optimization Under Contextual Choice Models

URL: http://arxiv.org/abs/2503.11819v1
Date: Fri, 14 Mar 2025 19:15:33 GMT
Title: Online Assortment and Price Optimization Under Contextual Choice Models
Authors: Yigit Efe Erginbas, Thomas A. Courtade, Kannan Ramchandran
Abstract summary: We consider an assortment selection and pricing problem in which a seller has $N$ different items available for sale. In each round, the seller observes a $d$-dimensional contextual preference information vector for the user, and offers to the user an assortment of $K$ items at prices chosen by the seller. The user selects at most one of the products from the offered assortment according to a multinomial logit choice model whose parameters are unknown. We propose an algorithm that learns from user feedback and achieves a revenue regret of order $\widetilde{O}(d \sqrt{K T} / L_0)$.
Score: 13.578723345690582
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We consider an assortment selection and pricing problem in which a seller has $N$ different items available for sale. In each round, the seller observes a $d$-dimensional contextual preference information vector for the user, and offers to the user an assortment of $K$ items at prices chosen by the seller. The user selects at most one of the products from the offered assortment according to a multinomial logit choice model whose parameters are unknown. The seller observes which, if any, item is chosen at the end of each round, with the goal of maximizing cumulative revenue over a selling horizon of length $T$. For this problem, we propose an algorithm that learns from user feedback and achieves a revenue regret of order $\widetilde{O}(d \sqrt{K T} / L_0 )$ where $L_0$ is the minimum price sensitivity parameter. We also obtain a lower bound of order $\Omega(d \sqrt{T}/ L_0)$ for the regret achievable by any algorithm.
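The choice model described in the abstract can be illustrated with a small sketch. It assumes, purely for illustration, that the utility of item $i$ is linear in the context and decreasing in price, $u_i = \langle \psi_i, x \rangle - \langle \phi_i, x \rangle \, p_i$, and that the no-purchase option has utility 0. The names `psi`, `phi`, `item_utilities`, `mnl_choice_probs`, and `expected_revenue` are hypothetical and are not taken from the paper; this is not the paper's learning algorithm, only the revenue computation for known parameters.

```python
import math


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))


def item_utilities(x, psi, phi, prices):
    """Assumed utility form: <psi_i, x> - <phi_i, x> * price_i for each offered item."""
    return [dot(s, x) - dot(f, x) * p for s, f, p in zip(psi, phi, prices)]


def mnl_choice_probs(utilities):
    """Return (purchase probability per item, no-purchase probability).

    The outside option has utility 0. Subtracting the maximum keeps exp() stable.
    """
    m = max([0.0, *utilities])
    weights = [math.exp(u - m) for u in utilities]
    outside = math.exp(-m)
    denom = outside + sum(weights)
    return [w / denom for w in weights], outside / denom


def expected_revenue(x, psi, phi, prices):
    """Expected revenue of one round: sum of price times purchase probability."""
    probs, _ = mnl_choice_probs(item_utilities(x, psi, phi, prices))
    return sum(p * q for p, q in zip(prices, probs))


if __name__ == "__main__":
    x = [1.0, 0.0]                     # hypothetical 2-dimensional user context
    psi = [[1.0, 5.0], [2.0, 0.0]]     # hypothetical preference parameters
    phi = [[1.0, 5.0], [1.0, 0.0]]     # hypothetical price-sensitivity parameters
    print(expected_revenue(x, psi, phi, prices=[1.0, 1.5]))
```

In the online setting of the abstract, `psi` and `phi` are unknown and must be estimated from the observed choices. The smallest price sensitivity (here $\langle \phi_i, x \rangle$) plays the role that $L_0$ plays in the regret bounds.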

Related papers

Dynamic Assortment Selection and Pricing with Censored Preference Feedback [10.988222071035198]
We introduce a novel framework based on a censored multinomial logit (C-MNL) choice model. Sellers present a set of products with prices, and buyers filter out products priced above their valuation, purchasing at most one product from the remaining options based on their preferences. Our algorithms achieve regret bounds of $\tilde{O}(d^{\frac{3}{2}}\sqrt{T/\kappa})$ and $\tilde{O}(d^{2}\sqrt{T/\kappa})$.
arXiv Detail & Related papers (2025-04-03T06:56:08Z)
A Primal-Dual Online Learning Approach for Dynamic Pricing of Sequentially Displayed Complementary Items under Sale Constraints [54.46126953873298]
We address the problem of dynamically pricing complementary items that are sequentially displayed to customers. Coherent pricing policies for complementary items are essential because optimizing the pricing of each item individually is ineffective. We empirically evaluate our approach using synthetic settings randomly generated from real-world data, and compare its performance in terms of constraints violation and regret.
arXiv Detail & Related papers (2024-07-08T09:55:31Z)
Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback [58.66941279460248]
Learning from human feedback plays an important role in aligning generative models, such as large language models (LLMs). We study a model within this domain, contextual dueling bandits with adversarial feedback, where the true preference label can be flipped by an adversary. We propose an algorithm, robust contextual dueling bandits (RCDB), which is based on uncertainty-weighted maximum likelihood estimation.
arXiv Detail & Related papers (2024-04-16T17:59:55Z)
No-Regret Algorithms in non-Truthful Auctions with Budget and ROI Constraints [0.9694940903078658]
We study the problem of designing online autobidding algorithms to optimize value subject to ROI and budget constraints. Our main result is an algorithm with full information feedback that guarantees a near-optimal $\tilde{O}(\sqrt{T})$ regret with respect to the best Lipschitz function.
arXiv Detail & Related papers (2024-04-15T14:31:53Z)
Dynamic Pricing and Advertising with Demand Learning [16.54088382906195]
We consider a novel pricing and advertising framework, where a seller not only sets product price but also designs flexible 'advertising schemes'. We impose no structural restriction on the seller's feasible advertising strategies and allow her to advertise the product by disclosing or concealing any information. Customers observe the advertising signal and infer a Bayesian belief over the products.
arXiv Detail & Related papers (2023-04-27T17:52:06Z)
Price DOES Matter! Modeling Price and Interest Preferences in Session-based Recommendation [55.0391061198924]
Session-based recommendation aims to predict items that an anonymous user would like to purchase based on her short behavior sequence. It is nontrivial to incorporate price preferences for session-based recommendation. We propose a novel method Co-guided Heterogeneous Hypergraph Network (CoHHN) for session-based recommendation.
arXiv Detail & Related papers (2022-05-09T10:47:15Z)
No-Regret Learning in Partially-Informed Auctions [85.67897346422122]
We study a machine learning formulation of auctions with partially-revealed information. In each round, a new item is drawn from an unknown distribution and the platform publishes a price together with incomplete, "masked" information about the item. We show that when the distribution over items is known to the buyer and the mask is a SimHash function mapping $\mathbb{R}^d$ to $\{0,1\}^{\ell}$, our algorithm has regret $\tilde{\mathcal{O}}((Td\ell)^{\frac{1}{2}})$.
arXiv Detail & Related papers (2022-02-22T01:15:51Z)
MNL-Bandit with Knapsacks: a near-optimal algorithm [2.3020018305241337]
We consider a dynamic assortment selection problem where a seller has a fixed inventory of $N$ substitutable products. In each period, the seller needs to decide on the assortment of products to offer to the customers. We show that when the inventory size grows quasi-linearly in time, MNLwK-UCB achieves a $\tilde{O}(N + \sqrt{NT})$ regret bound.
arXiv Detail & Related papers (2021-06-02T13:05:34Z)
Multinomial Logit Contextual Bandits: Provable Optimality and Practicality [15.533842336139063]
We consider a sequential assortment selection problem where the user choice is given by a multinomial logit (MNL) choice model whose parameters are unknown. We propose upper confidence bound based algorithms for this MNL contextual bandit. We show that a simple variant of the algorithm achieves the optimal regret for a broad class of important applications.
arXiv Detail & Related papers (2021-03-25T15:42:25Z)
Near-Optimal Regret Bounds for Contextual Combinatorial Semi-Bandits with Linear Payoff Functions [53.77572276969548]
We show that the C$^2$UCB algorithm has the optimal regret bound $\tilde{O}(d\sqrt{kT} + dk)$ for the partition matroid constraints. For general constraints, we propose an algorithm that modifies the reward estimates of arms in the C$^2$UCB algorithm.
arXiv Detail & Related papers (2021-01-20T04:29:18Z)
Fully Gap-Dependent Bounds for Multinomial Logit Bandit [5.132017939561661]
We study the multinomial logit (MNL) bandit problem, where at each time step, the seller offers an assortment of size at most $K$ from a pool of $N$ items. We present (i) an algorithm that identifies the optimal assortment $S^*$ within $\widetilde{O}(\sum_{i=1}^{N} \Delta_i^{-2})$ time steps with high probability, and (ii) an algorithm that incurs $O(\sum_{i \notin S^*} K\Delta_i \ldots$
arXiv Detail & Related papers (2020-11-19T17:52:12Z)
Regret in Online Recommendation Systems [73.58127515175127]
This paper proposes a theoretical analysis of recommendation systems in an online setting, where items are sequentially recommended to users over time. In each round, a user, randomly picked from a population of $m$ users, requests a recommendation. The decision-maker observes the user and selects an item from a catalogue of $n$ items. The performance of the recommendation algorithm is captured through its regret, considering as a reference an Oracle algorithm aware of these probabilities.
arXiv Detail & Related papers (2020-10-23T12:48:35Z)
Learning to Rank under Multinomial Logit Choice [6.929312022493406]
Learning the optimal ordering of content is an important challenge in website design. We present theoretical analysis leading to an $\Omega(\sqrt{JT})$ lower bound for the problem, and an $\tilde{O}(\sqrt{JT})$ upper bound on regret of the UCB algorithm.
arXiv Detail & Related papers (2020-09-07T16:15:12Z)

This list is automatically generated from the titles and abstracts of the papers on this site.