Dynamic Pricing and Learning with Bayesian Persuasion
- URL: http://arxiv.org/abs/2304.14385v2
- Date: Sun, 10 Dec 2023 22:39:36 GMT
- Title: Dynamic Pricing and Learning with Bayesian Persuasion
- Authors: Shipra Agrawal, Yiding Feng, Wei Tang
- Abstract summary: We consider a novel dynamic pricing and learning setting where, in addition to setting prices of products, the seller also ex-ante commits to 'advertising schemes'.
We use the popular Bayesian persuasion framework to model the effect of these signals on the buyers' valuation and purchase responses.
We design an online algorithm that can use past purchase responses to adaptively learn the optimal pricing and advertising strategy.
- Score: 18.59029578133633
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: We consider a novel dynamic pricing and learning setting where in addition to
setting prices of products in sequential rounds, the seller also ex-ante
commits to 'advertising schemes'. That is, at the beginning of each round the
seller can decide what kind of signal they will provide to the buyer about the
product's quality upon realization. Using the popular Bayesian persuasion
framework to model the effect of these signals on the buyers' valuation and
purchase responses, we formulate the problem of finding an optimal design of
the advertising scheme along with a pricing scheme that maximizes the seller's
expected revenue. Without any a priori knowledge of the buyers' demand function,
our goal is to design an online algorithm that can use past purchase responses
to adaptively learn the optimal pricing and advertising strategy. We study the
regret of the algorithm when compared to the optimal clairvoyant price and
advertising scheme.
Our main result is a computationally efficient online algorithm that achieves
an $O(T^{2/3}(m\log T)^{1/3})$ regret bound when the valuation function is
linear in the product quality. Here $m$ is the cardinality of the discrete
product quality domain and $T$ is the time horizon. This result requires some
natural monotonicity and Lipschitz assumptions on the valuation function, but
no Lipschitz or smoothness assumption on the buyers' demand function. For
constant $m$, our result matches, up to logarithmic factors, the regret lower
bound for dynamic pricing, which is a special case of our problem. We also
obtain several improved results for the widely considered special case of
additive valuations, including an $\tilde{O}(T^{2/3})$ regret bound independent
of $m$ when $m\le T^{1/3}$.
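To make the setting concrete, the following minimal Python sketch (our own illustration; the function names, toy demand curve, and example numbers are assumptions, not the authors' code) computes the seller's expected per-round revenue for a fixed price and advertising (signaling) scheme under the Bayesian persuasion model:

```python
import numpy as np

# Hypothetical illustration of the seller's per-round objective.
# Qualities q_1..q_m have a known prior; an advertising scheme is a
# row-stochastic matrix scheme[i, s] = P(signal s | quality q_i).
# Assumption: the buyer's valuation is linear in quality, so purchase
# behavior depends on the posterior-mean quality and the posted price.

def expected_revenue(prior, qualities, scheme, price, buy_prob):
    """prior: (m,) prior over qualities; qualities: (m,) quality levels;
    scheme: (m, S) signaling scheme; buy_prob: the (unknown) demand curve,
    mapping posterior-mean quality and price to purchase probability."""
    revenue = 0.0
    signal_marginals = prior @ scheme          # P(signal s)
    for s in range(scheme.shape[1]):
        if signal_marginals[s] <= 0:
            continue
        # Bayes update: posterior over quality given signal s.
        posterior = prior * scheme[:, s] / signal_marginals[s]
        post_mean_quality = posterior @ qualities
        revenue += signal_marginals[s] * price * buy_prob(post_mean_quality, price)
    return revenue

# Example: two quality levels, a partially informative scheme, toy demand.
prior = np.array([0.5, 0.5])
qualities = np.array([0.2, 1.0])
scheme = np.array([[0.8, 0.2],    # low quality mostly sends signal 0
                   [0.1, 0.9]])   # high quality mostly sends signal 1
demand = lambda q, p: max(0.0, min(1.0, q - p))
print(expected_revenue(prior, qualities, scheme, price=0.3, buy_prob=demand))
```

The learning problem is then to choose prices and schemes adaptively from purchase feedback alone; discretization-based exploration of this kind is typically what yields $T^{2/3}$-style rates.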
Related papers
- A Primal-Dual Online Learning Approach for Dynamic Pricing of Sequentially Displayed Complementary Items under Sale Constraints [54.46126953873298]
We address the problem of dynamically pricing complementary items that are sequentially displayed to customers.
Coherent pricing policies for complementary items are essential because optimizing the pricing of each item individually is ineffective.
We empirically evaluate our approach using synthetic settings randomly generated from real-world data, and compare its performance in terms of constraint violation and regret.
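Below is a generic primal-dual online pricing loop, shown as a sketch under our own simplifying assumptions (single item, a known demand proxy standing in for observed sales; this is not the paper's algorithm), to illustrate how a dual variable can enforce a sale constraint:

```python
import numpy as np

def primal_dual_pricing(candidate_prices, demand_est, target_rate, T, step=0.05):
    """Toy primal-dual loop: lam prices the constraint 'average sales
    >= target_rate'; demand_est is an illustrative stand-in for feedback."""
    lam, revenue = 0.0, 0.0
    for _ in range(T):
        # Primal step: price maximizing the Lagrangian (p + lam) * D(p).
        scores = [(p + lam) * demand_est(p) for p in candidate_prices]
        p = candidate_prices[int(np.argmax(scores))]
        sold = demand_est(p)
        revenue += p * sold
        # Dual subgradient step: raise lam when selling below target.
        lam = max(0.0, lam + step * (target_rate - sold))
    return revenue, lam

demand = lambda p: max(0.0, 1.0 - p)   # toy demand curve
print(primal_dual_pricing([0.2, 0.4, 0.6, 0.8], demand, target_rate=0.5, T=100))
```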
arXiv Detail & Related papers (2024-07-08T09:55:31Z)
- Improved Algorithms for Contextual Dynamic Pricing [24.530341596901476]
In contextual dynamic pricing, a seller sequentially prices goods based on contextual information.
Our algorithm achieves an optimal regret bound of $\tilde{\mathcal{O}}(T^{2/3})$, improving on existing results.
For this model, our algorithm obtains a regret bound of $\tilde{\mathcal{O}}(T^{(d+2\beta)/(d+3\beta)})$, where $d$ is the dimension of the context space.
arXiv Detail & Related papers (2024-06-17T08:26:51Z)
- Dynamic Pricing and Learning with Long-term Reference Effects [16.07344044662994]
We study a simple and novel reference price mechanism where the reference price is the average of the past prices offered by the seller.
We show that under this mechanism, a markdown policy is near-optimal irrespective of the parameters of the model.
We then consider a more challenging dynamic pricing and learning problem, where the demand model parameters are a priori unknown.
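A tiny Python sketch of the mechanism described above (the linear demand form and parameter values are our assumptions): the reference price is the running average of past offered prices, so under a markdown path the reference term only ever boosts demand:

```python
def simulate_markdown(prices, base_demand=1.0, price_sens=0.5, ref_sens=0.3):
    revenue, price_sum = 0.0, 0.0
    for t, p in enumerate(prices):
        # Reference price = average of prices offered in earlier rounds;
        # in the first round, take the current price as the reference.
        ref = price_sum / t if t > 0 else p
        # Toy linear demand with a reference effect (illustrative only).
        demand = max(0.0, base_demand - price_sens * p + ref_sens * (ref - p))
        revenue += p * demand
        price_sum += p
    return revenue

# Under a markdown (non-increasing) path, the reference stays above the
# current price, so the reference term never hurts demand.
print(simulate_markdown([1.0, 0.9, 0.8, 0.7]))
```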
arXiv Detail & Related papers (2024-02-19T21:36:54Z)
- Pricing with Contextual Elasticity and Heteroscedastic Valuation [23.96777734246062]
We study an online contextual dynamic pricing problem, where customers decide whether to purchase a product based on its features and price.
We introduce a novel approach to modeling a customer's expected demand by incorporating feature-based price elasticity.
Our results shed light on the relationship between contextual elasticity and heteroscedastic valuation, providing insights for effective and practical pricing strategies.
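One plausible instantiation of feature-based price elasticity (an assumption for illustration, not the paper's exact model) makes the price-sensitivity coefficient itself a function of the context:

```python
import numpy as np

def purchase_prob(x, price, theta_val, theta_el):
    """Illustrative logistic purchase model: both the base valuation and
    the (positive) price elasticity depend on the context features x."""
    valuation = x @ theta_val             # feature-based base valuation
    elasticity = np.exp(x @ theta_el)     # feature-based price elasticity
    return 1.0 / (1.0 + np.exp(-(valuation - elasticity * price)))

x = np.array([1.0, 0.5])
print(purchase_prob(x, price=0.8,
                    theta_val=np.array([0.6, 0.4]),
                    theta_el=np.array([0.1, -0.2])))
```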
arXiv Detail & Related papers (2023-12-26T11:07:37Z)
- Contextual Dynamic Pricing with Strategic Buyers [93.97401997137564]
We study the contextual dynamic pricing problem with strategic buyers.
The seller does not observe the buyer's true feature, but rather a feature manipulated according to the buyer's strategic behavior.
We propose a strategic dynamic pricing policy that incorporates the buyers' strategic behavior into the online learning to maximize the seller's cumulative revenue.
arXiv Detail & Related papers (2023-07-08T23:06:42Z)
- Autoregressive Bandits [58.46584210388307]
We propose a novel online learning setting, Autoregressive Bandits, in which the observed reward is governed by an autoregressive process of order $k$.
We show that, under mild assumptions on the reward process, the optimal policy can be conveniently computed.
We then devise a new optimistic regret minimization algorithm, namely, AutoRegressive Upper Confidence Bound (AR-UCB), that suffers sublinear regret of order $\widetilde{\mathcal{O}}\left(\frac{(k+1)^{3/2}\sqrt{nT}}{(1-\Gamma)^2}\right)$, where $\Gamma < 1$ is a stability index of the reward process.
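A toy generator for such an AR($k$)-governed reward stream (the coefficient names and the stand-in random policy are our assumptions, not the paper's code):

```python
import numpy as np

def ar_reward_stream(action_effects, ar_coeffs, T, noise_std=0.1, seed=0):
    """Yield (action, reward) with r_t = b(a_t) + sum_i g_i * r_{t-i} + noise."""
    rng = np.random.default_rng(seed)
    k = len(ar_coeffs)
    history = [0.0] * k                            # r_{t-k}, ..., r_{t-1}
    for _ in range(T):
        a = int(rng.integers(len(action_effects)))  # stand-in for a policy
        r = action_effects[a] + float(np.dot(ar_coeffs, history[::-1])) \
            + rng.normal(0.0, noise_std)
        yield a, r
        history = history[1:] + [r]

# Stability requires the AR coefficients to sum to Gamma < 1.
for a, r in ar_reward_stream([0.2, 0.5], ar_coeffs=[0.3, 0.1], T=5):
    print(a, round(r, 3))
```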
arXiv Detail & Related papers (2022-12-12T21:37:36Z)
- Phase Transitions in Learning and Earning under Price Protection Guarantee [4.683806391173103]
We study the impact of such a policy on the design of online learning algorithms for data-driven dynamic pricing.
We show that the optimal regret is $\tilde{\Theta}(\sqrt{T}+\min\{M,\,T^{2/3}\})$ by first establishing a fundamental impossibility regime.
We propose LEAP, a phased exploration type algorithm for Learning and EArning under Price protection.
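Under the usual reading of a price protection guarantee with period $M$ (a buyer who paid is refunded the difference if the seller's price drops within the next $M$ rounds; this reading is our assumption for illustration):

```python
def refund(prices, s, M):
    """Refund owed to a buyer who paid prices[s], given the seller's
    subsequent price path over the next M rounds (illustrative only)."""
    window_min = min(prices[s:s + M + 1])
    return max(0.0, prices[s] - window_min)

prices = [1.0, 0.8, 0.9, 0.6, 0.7]
print(refund(prices, s=0, M=3))   # 1.0 - 0.6 = 0.4
```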
arXiv Detail & Related papers (2022-11-03T13:36:00Z)
- A Reinforcement Learning Approach in Multi-Phase Second-Price Auction Design [158.0041488194202]
We study reserve price optimization in multi-phase second price auctions.
From the seller's perspective, we need to efficiently explore the environment in the presence of potentially nontruthful bidders.
The seller's per-step revenue is unknown, nonlinear, and cannot even be directly observed from the environment.
arXiv Detail & Related papers (2022-10-19T03:49:05Z)
- Towards Agnostic Feature-based Dynamic Pricing: Linear Policies vs Linear Valuation with Unknown Noise [16.871660060209674]
We show an algorithm that achieves an $\tilde{O}(T^{3/4})$ regret, and improve the best-known lower bound from $\Omega(T^{3/5})$ to $\tilde{\Omega}(T^{2/3})$.
These results demonstrate that no-regret learning is possible for feature-based dynamic pricing under weak assumptions.
arXiv Detail & Related papers (2022-01-27T06:40:03Z)
- Randomized Exploration for Reinforcement Learning with General Value Function Approximation [122.70803181751135]
We propose a model-free reinforcement learning algorithm inspired by the popular randomized least squares value iteration (RLSVI) algorithm.
Our algorithm drives exploration by simply perturbing the training data with judiciously chosen i.i.d. scalar noises.
We complement the theory with an empirical evaluation across known difficult exploration tasks.
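A minimal sketch of this perturbation idea (ridge regression on noisy targets; the scales and names are illustrative, not the paper's implementation):

```python
import numpy as np

# Instead of adding explicit optimism bonuses, perturb the regression
# targets with i.i.d. Gaussian noise before solving least squares, so
# the fitted value estimate itself is randomized and drives exploration.

def perturbed_least_squares(X, y, noise_std=0.5, reg=1.0, seed=0):
    rng = np.random.default_rng(seed)
    y_noisy = y + rng.normal(0.0, noise_std, size=y.shape)  # perturb targets
    d = X.shape[1]
    # Ridge regression on the perturbed data.
    return np.linalg.solve(X.T @ X + reg * np.eye(d), X.T @ y_noisy)

X = np.random.default_rng(1).normal(size=(100, 3))
theta_true = np.array([1.0, -2.0, 0.5])
y = X @ theta_true
print(perturbed_least_squares(X, y))   # randomized estimate near theta_true
```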
arXiv Detail & Related papers (2021-06-15T02:23:07Z)
- Online Learning with Imperfect Hints [72.4277628722419]
We develop algorithms and nearly matching lower bounds for online learning with imperfect directional hints.
Our algorithms are oblivious to the quality of the hints, and the regret bounds interpolate between the always-correlated hints case and the no-hints case.
arXiv Detail & Related papers (2020-02-11T23:06:09Z)