Related papers: Dynamic Assortment Selection and Pricing with Censored Preference Feedback

URL: http://arxiv.org/abs/2504.02324v1
Date: Thu, 03 Apr 2025 06:56:08 GMT
Title: Dynamic Assortment Selection and Pricing with Censored Preference Feedback
Authors: Jung-hun Kim, Min-hwan Oh
Abstract summary: We introduce a novel framework based on a censored multinomial logit (C-MNL) choice model. Sellers present a set of products with prices, and buyers filter out products priced above their valuation, purchasing at most one product from the remaining options based on their preferences. Our algorithms achieve regret bounds of $\tilde{O}(d^{\frac{3}{2}}\sqrt{T/\kappa})$ and $\tilde{O}(d^{2}\sqrt{T/\kappa})$.
Score: 10.988222071035198
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In this study, we investigate the problem of dynamic multi-product selection and pricing by introducing a novel framework based on a \textit{censored multinomial logit} (C-MNL) choice model. In this model, sellers present a set of products with prices, and buyers filter out products priced above their valuation, purchasing at most one product from the remaining options based on their preferences. The goal is to maximize seller revenue by dynamically adjusting product offerings and prices, while learning both product valuations and buyer preferences through purchase feedback. To achieve this, we propose a Lower Confidence Bound (LCB) pricing strategy. By combining this pricing strategy with either an Upper Confidence Bound (UCB) or Thompson Sampling (TS) product selection approach, our algorithms achieve regret bounds of $\tilde{O}(d^{\frac{3}{2}}\sqrt{T/\kappa})$ and $\tilde{O}(d^{2}\sqrt{T/\kappa})$, respectively. Finally, we validate the performance of our methods through simulations, demonstrating their effectiveness.
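The choice process described in the abstract can be illustrated with a minimal sketch. It assumes that the buyer holds one scalar valuation and one scalar preference utility per product, that a product survives the filter when its price does not exceed its valuation, and that the purchase follows a standard MNL over the surviving products with an outside (no-purchase) option of utility zero. The function names `censored_mnl_probs` and `expected_revenue` and all numbers are hypothetical and are not taken from the paper, whose exact parameterization may differ.

```python
import math


def censored_mnl_probs(prices, valuations, utilities):
    """Return (purchase probability per product, no-purchase probability).

    Assumed model (illustrative only): product i survives the buyer's filter
    when prices[i] <= valuations[i]; among the survivors the buyer chooses
    by a standard MNL with an outside option of utility 0.
    """
    survivors = [i for i in range(len(prices)) if prices[i] <= valuations[i]]
    denom = 1.0 + sum(math.exp(utilities[i]) for i in survivors)
    probs = [0.0] * len(prices)  # censored products keep probability 0
    for i in survivors:
        probs[i] = math.exp(utilities[i]) / denom
    return probs, 1.0 / denom


def expected_revenue(prices, valuations, utilities):
    """Expected one-round revenue: sum of price times purchase probability."""
    probs, _ = censored_mnl_probs(prices, valuations, utilities)
    return sum(price * prob for price, prob in zip(prices, probs))


# Hypothetical example: the second product is priced above its valuation,
# so it is filtered out before the MNL choice is made.
prices = [1.0, 2.0, 3.0]
valuations = [1.5, 1.8, 3.0]
utilities = [0.0, 0.5, 0.0]
probs, no_purchase = censored_mnl_probs(prices, valuations, utilities)
```

In this sketch, raising a price above the buyer's valuation removes the product from consideration entirely, which gives some intuition for why a conservative, lower-confidence-bound price might be attractive; the paper's own argument may rest on different details.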

Related papers

Online Assortment and Price Optimization Under Contextual Choice Models [13.578723345690582]
We consider an assortment selection and pricing problem in which a seller has $N$ different items available for sale. In each round, the seller observes a $d$-dimensional contextual preference information vector for the user, and offers to the user an assortment of $K$ items at prices chosen by the seller. The user selects at most one of the products from the offered assortment according to a multinomial logit choice model whose parameters are unknown. We propose an algorithm that learns from user feedback and achieves a revenue regret of order $widetildeO(d sqrtK
arXiv Detail & Related papers (2025-03-14T19:15:33Z)
Improved Algorithms for Contextual Dynamic Pricing [24.530341596901476]
In contextual dynamic pricing, a seller sequentially prices goods based on contextual information. Our algorithm achieves an optimal regret bound of $\tilde{\mathcal{O}}(T^{2/3})$, improving the existing results. For this model, our algorithm obtains a regret $\tilde{\mathcal{O}}(T^{(d+2\beta)/(d+3\beta)})$, where $d$ is the dimension of the context space.
arXiv Detail & Related papers (2024-06-17T08:26:51Z)
Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback [58.66941279460248]
Learning from human feedback plays an important role in aligning generative models, such as large language models (LLMs). We study a model within this domain: contextual dueling bandits with adversarial feedback, where the true preference label can be flipped by an adversary. We propose an algorithm, robust contextual dueling bandits (RCDB), which is based on uncertainty-weighted maximum likelihood estimation.
arXiv Detail & Related papers (2024-04-16T17:59:55Z)
Dynamic Pricing and Learning with Long-term Reference Effects [16.07344044662994]
We study a simple and novel reference price mechanism where the reference price is the average of the past prices offered by the seller. We show that under this mechanism, a markdown policy is near-optimal irrespective of the parameters of the model. We then consider a more challenging dynamic pricing and learning problem, where the demand model parameters are a priori unknown.
arXiv Detail & Related papers (2024-02-19T21:36:54Z)
Pricing with Contextual Elasticity and Heteroscedastic Valuation [23.96777734246062]
We study an online contextual dynamic pricing problem, where customers decide whether to purchase a product based on its features and price. We introduce a novel approach to modeling a customer's expected demand by incorporating feature-based price elasticity. Our results shed light on the relationship between contextual elasticity and heteroscedastic valuation, providing insights for effective and practical pricing strategies.
arXiv Detail & Related papers (2023-12-26T11:07:37Z)
Contextual Dynamic Pricing with Strategic Buyers [93.97401997137564]
We study the contextual dynamic pricing problem with strategic buyers. The seller does not observe the buyer's true feature, but rather a feature manipulated according to the buyer's strategic behavior. We propose a strategic dynamic pricing policy that incorporates the buyers' strategic behavior into the online learning to maximize the seller's cumulative revenue.
arXiv Detail & Related papers (2023-07-08T23:06:42Z)
Dynamic Pricing and Advertising with Demand Learning [16.54088382906195]
We consider a novel pricing and advertising framework, where a seller not only sets product price but also designs flexible 'advertising schemes'. We impose no structural restriction on the seller's feasible advertising strategies and allow her to advertise the product by disclosing or concealing any information. Customers observe the advertising signal and infer a Bayesian belief over the products.
arXiv Detail & Related papers (2023-04-27T17:52:06Z)
Autoregressive Bandits [58.46584210388307]
We propose a novel online learning setting, Autoregressive Bandits, in which the observed reward is governed by an autoregressive process of order $k$. We show that, under mild assumptions on the reward process, the optimal policy can be conveniently computed. We then devise a new optimistic regret minimization algorithm, namely, AutoRegressive Upper Confidence Bound (AR-UCB), that suffers sublinear regret of order $widetildemathcalO left( frac(k+1)3/2sqrtnT (1-G
arXiv Detail & Related papers (2022-12-12T21:37:36Z)
A Reinforcement Learning Approach in Multi-Phase Second-Price Auction Design [158.0041488194202]
We study reserve price optimization in multi-phase second price auctions. From the seller's perspective, we need to efficiently explore the environment in the presence of potentially nontruthful bidders. In addition, the seller's per-step revenue is unknown, nonlinear, and cannot even be directly observed from the environment.
arXiv Detail & Related papers (2022-10-19T03:49:05Z)
Price DOES Matter! Modeling Price and Interest Preferences in Session-based Recommendation [55.0391061198924]
Session-based recommendation aims to predict items that an anonymous user would like to purchase based on her short behavior sequence. It is nontrivial to incorporate price preferences for session-based recommendation. We propose a novel method Co-guided Heterogeneous Hypergraph Network (CoHHN) for session-based recommendation.
arXiv Detail & Related papers (2022-05-09T10:47:15Z)
Dynamic pricing and assortment under a contextual MNL demand [2.1320960069210475]
We consider dynamic multi-product pricing and assortment problems under an unknown demand over $T$ periods. We propose a randomized dynamic pricing policy based on a variant of the Online Newton Step algorithm (ONS). We also present a new optimistic algorithm for the adversarial MNL contextual bandits problem.
arXiv Detail & Related papers (2021-10-19T14:37:10Z)
Optimistic Policy Optimization with Bandit Feedback [70.75568142146493]
We propose an optimistic trust region policy optimization (TRPO) algorithm for which we establish $\tilde{O}(\sqrt{S^2 A H^4 K})$ regret for previous rewards. To the best of our knowledge, the two results are the first sub-linear regret bounds obtained for policy optimization algorithms with unknown transitions and bandit feedback.
arXiv Detail & Related papers (2020-02-19T15:41:18Z)

This list is automatically generated from the titles and abstracts of the papers on this site.