Related papers: Dynamic pricing and assortment under a contextual MNL demand

Dynamic pricing and assortment under a contextual MNL demand

URL: http://arxiv.org/abs/2110.10018v1
Date: Tue, 19 Oct 2021 14:37:10 GMT
Title: Dynamic pricing and assortment under a contextual MNL demand
Authors: Vineet Goyal and Noemie Perivier
Abstract summary: We consider dynamic multi-product pricing and assortment problems under an unknown demand over T periods. We propose a randomized dynamic pricing policy based on a variant of the Online Newton Step algorithm (ONS). We also present a new optimistic algorithm for the adversarial MNL contextual bandits problem.
Score: 2.1320960069210475
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We consider dynamic multi-product pricing and assortment problems under an unknown demand over T periods, where in each period, the seller decides on the price for each product or the assortment of products to offer to a customer who chooses according to an unknown Multinomial Logit Model (MNL). Such problems arise in many applications, including online retail and advertising. We propose a randomized dynamic pricing policy based on a variant of the Online Newton Step algorithm (ONS) that achieves a $O(d\sqrt{T}\log(T))$ regret guarantee under an adversarial arrival model. We also present a new optimistic algorithm for the adversarial MNL contextual bandits problem, which achieves a better dependency than the state-of-the-art algorithms in a problem-dependent constant $\kappa$ (potentially exponentially small). Our regret upper bounds scale as $\tilde{O}(d\sqrt{\kappa T}+ \log(T)/\kappa)$, which gives a significantly stronger bound than the existing $\tilde{O}(d\sqrt{T}/\kappa)$ guarantees.
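
As a rough illustration of the choice model the abstract refers to, the sketch below computes MNL purchase probabilities and the seller's expected revenue for one customer. It is not the paper's algorithm: the linear utility form (feature vector times `theta`, minus `price_sensitivity` times price), the zero-utility no-purchase option, and all function and parameter names are assumptions made for this example.

```python
import math
from typing import List, Sequence


def mnl_choice_probabilities(
    features: Sequence[Sequence[float]],
    prices: Sequence[float],
    theta: Sequence[float],
    price_sensitivity: float,
) -> List[float]:
    """Return [P(no purchase), P(product 1), ..., P(product n)].

    Assumed utility of product i (hypothetical parametrization):
        <features[i], theta> - price_sensitivity * prices[i]
    The no-purchase option is assumed to have utility 0.
    """
    utilities = [
        sum(x * w for x, w in zip(x_i, theta)) - price_sensitivity * p_i
        for x_i, p_i in zip(features, prices)
    ]
    # Shift by the largest utility (including the outside option's 0)
    # so that the exponentials cannot overflow.
    shift = max([0.0] + utilities)
    weights = [math.exp(-shift)] + [math.exp(u - shift) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


def expected_revenue(prices: Sequence[float], probabilities: Sequence[float]) -> float:
    """Expected revenue from one customer; probabilities[0] is the no-purchase option."""
    return sum(p * q for p, q in zip(prices, probabilities[1:]))


if __name__ == "__main__":
    # Two products described by two context features each (made-up numbers).
    features = [[1.0, 0.5], [0.2, 1.0]]
    theta = [0.8, 0.3]
    prices = [1.0, 1.5]
    probs = mnl_choice_probabilities(features, prices, theta, price_sensitivity=1.0)
    print("choice probabilities:", probs)
    print("expected revenue:", expected_revenue(prices, probs))
```

In the online setting described above, `theta` and the price sensitivity would be unknown and would have to be learned from observed choices; here they are fixed by hand only to show how prices enter the choice probabilities.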

Related papers

Enjoying Non-linearity in Multinomial Logistic Bandits [66.36004256710824]
We consider the multinomial logistic bandit problem, a variant of generalized linear bandits where a learner interacts with an environment. We extend the definition of $\kappa_*$ to the multinomial setting and propose an efficient algorithm that leverages the problem's non-linearity. Our method yields a problem-dependent regret bound of order $\widetilde{\mathcal{O}}(Kd\sqrt{T/\kappa_*})$, where $K$ is the number of actions and $\kappa_*$ ...
arXiv Detail & Related papers (2025-07-07T08:18:25Z)
Dynamic Assortment Selection and Pricing with Censored Preference Feedback [10.988222071035198]
We introduce a novel framework based on a censored multinomial logit (C-MNL) choice model. Sellers present a set of products with prices, and buyers filter out products priced above their valuation, purchasing at most one product from the remaining options based on their preferences. Our algorithms achieve regret bounds of $\tilde{O}(d^{\frac{3}{2}}\sqrt{T/\kappa})$ and $\tilde{O}(d^{2}\sqrt{T/\kappa})$ ...
arXiv Detail & Related papers (2025-04-03T06:56:08Z)
Multi-Step Alignment as Markov Games: An Optimistic Online Gradient Descent Approach with Convergence Guarantees [91.88803125231189]
Multi-step Preference Optimization (MPO) is built upon the natural actor-critic framework \citep{rakhlin2013online,joulani17a}. We show that OMPO requires $\mathcal{O}(\epsilon^{-1})$ policy updates to converge to an $\epsilon$-approximate Nash equilibrium. We also validate the effectiveness of our method on a multi-turn conversation dataset and a math reasoning dataset.
arXiv Detail & Related papers (2025-02-18T09:33:48Z)
A Primal-Dual Online Learning Approach for Dynamic Pricing of Sequentially Displayed Complementary Items under Sale Constraints [54.46126953873298]
We address the problem of dynamically pricing complementary items that are sequentially displayed to customers. Coherent pricing policies for complementary items are essential because optimizing the pricing of each item individually is ineffective. We empirically evaluate our approach using synthetic settings randomly generated from real-world data, and compare its performance in terms of constraint violation and regret.
arXiv Detail & Related papers (2024-07-08T09:55:31Z)
Contextual Dynamic Pricing: Algorithms, Optimality, and Local Differential Privacy Constraints [10.057344315478709]
We study the contextual dynamic pricing problem where a firm sells products to $T$ sequentially arriving consumers. We first show that the optimal regret upper bound is of order $\sqrt{dT}$, up to a logarithmic factor. A key insight of our theoretical result is an intrinsic connection between dynamic pricing and the contextual multi-armed bandit problem.
arXiv Detail & Related papers (2024-06-04T15:44:10Z)
The Real Price of Bandit Information in Multiclass Classification [73.17969992976501]
We revisit the classical problem of multiclass classification with bandit feedback. We present a new bandit classification algorithm that guarantees regret $\widetilde{O}(|H|+\sqrt{T})$.
arXiv Detail & Related papers (2024-05-16T12:11:09Z)
Pricing with Contextual Elasticity and Heteroscedastic Valuation [23.96777734246062]
We study an online contextual dynamic pricing problem, where customers decide whether to purchase a product based on its features and price. We introduce a novel approach to modeling a customer's expected demand by incorporating feature-based price elasticity. Our results shed light on the relationship between contextual elasticity and heteroscedastic valuation, providing insights for effective and practical pricing strategies.
arXiv Detail & Related papers (2023-12-26T11:07:37Z)
Autoregressive Bandits [58.46584210388307]
We propose a novel online learning setting, Autoregressive Bandits, in which the observed reward is governed by an autoregressive process of order $k$. We show that, under mild assumptions on the reward process, the optimal policy can be conveniently computed. We then devise a new optimistic regret minimization algorithm, namely, AutoRegressive Upper Confidence Bound (AR-UCB), that suffers sublinear regret of order $\widetilde{\mathcal{O}}\left(\frac{(k+1)^{3/2}\sqrt{nT}}{(1-G\dots}\right)$ ...
arXiv Detail & Related papers (2022-12-12T21:37:36Z)
A Reinforcement Learning Approach in Multi-Phase Second-Price Auction Design [158.0041488194202]
We study reserve price optimization in multi-phase second-price auctions. From the seller's perspective, we need to efficiently explore the environment in the presence of potentially nontruthful bidders. In addition, the seller's per-step revenue is unknown, nonlinear, and cannot even be directly observed from the environment.
arXiv Detail & Related papers (2022-10-19T03:49:05Z)
Horizon-Free Reinforcement Learning in Polynomial Time: the Power of Stationary Policies [88.75843804630772]
We design an algorithm that achieves an $O\left(\mathrm{poly}(S,A,\log K)\sqrt{K}\right)$ regret in contrast to existing bounds. Our result relies on a sequence of new structural lemmas establishing the approximation power, stability, and concentration property of stationary policies.
arXiv Detail & Related papers (2022-03-24T08:14:12Z)
MNL-Bandit with Knapsacks: a near-optimal algorithm [2.3020018305241337]
We consider a dynamic assortment selection problem where a seller has a fixed inventory of $N$ substitutable products. In each period, the seller needs to decide on the assortment of products to offer to the customers. We show that when the inventory size grows quasi-linearly in time, MNLwK-UCB achieves a $\tilde{O}(N + \sqrt{NT})$ regret bound.
arXiv Detail & Related papers (2021-06-02T13:05:34Z)
Minimax Regret for Stochastic Shortest Path [63.45407095296692]
We study the Stochastic Shortest Path (SSP) problem in which an agent has to reach a goal state in minimum total expected cost. We show that the minimax regret for this setting is $\widetilde{O}(B_\star \sqrt{|S| |A| K})$ where $B_\star$ is a bound on the expected cost of the optimal policy from any state. Our algorithm runs in polynomial time per episode, and is based on a novel reduction to reinforcement learning in finite-horizon MDPs.
arXiv Detail & Related papers (2021-03-24T10:11:49Z)
Logarithmic Regret in Feature-based Dynamic Pricing [0.0]
Feature-based dynamic pricing is an increasingly popular model of setting prices for highly differentiated products. We provide two algorithms for stochastic and adversarial feature settings, and prove the optimal $O(d\log T)$ regret bounds for both. We also prove an $\Omega(\sqrt{T})$ information-theoretic lower bound for a slightly more general setting, which demonstrates that "knowing-the-demand-curve" leads to an exponential improvement in feature-based dynamic pricing.
arXiv Detail & Related papers (2021-02-20T00:45:33Z)
A Tractable Online Learning Algorithm for the Multinomial Logit Contextual Bandit [2.9998316151418107]
We consider a dynamic set optimization problem, where a decision-maker offers a subset of products to a consumer. We model consumer choice behavior using the widely used Multinomial Logit (MNL) model. We show that the regret is bounded by $O(\sqrt{dT} + \kappa)$, significantly improving the performance over existing methods.
arXiv Detail & Related papers (2020-11-28T00:20:36Z)
Fully Gap-Dependent Bounds for Multinomial Logit Bandit [5.132017939561661]
We study the multinomial logit (MNL) bandit problem, where at each time step, the seller offers an assortment of size at most $K$ from a pool of $N$ items. We present (i) an algorithm that identifies the optimal assortment $S^*$ within $\widetilde{O}(\sum_{i=1}^{N} \Delta_i^{-2})$ time steps with high probability, and (ii) an algorithm that incurs $O(\sum_{i \notin S^*} K\Delta_i \dots)$ ...
arXiv Detail & Related papers (2020-11-19T17:52:12Z)

This list is automatically generated from the titles and abstracts of the papers on this site.