Structured Dynamic Pricing: Optimal Regret in a Global Shrinkage Model
- URL: http://arxiv.org/abs/2303.15652v2
- Date: Sat, 14 Oct 2023 00:53:41 GMT
- Authors: Rashmi Ranjan Bhuyan, Adel Javanmard, Sungchul Kim, Gourab Mukherjee, Ryan A. Rossi, Tong Yu, Handong Zhao
- Abstract summary: We consider a dynamic model with the consumers' preferences as well as price sensitivity varying over time.
We measure the performance of a dynamic pricing policy via regret, which is the expected revenue loss compared to a clairvoyant that knows the sequence of model parameters in advance.
Our regret analysis results not only demonstrate optimality of the proposed policy but also show that for policy planning it is essential to incorporate available structural information.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider dynamic pricing strategies in a streamed longitudinal data set-up
where the objective is to maximize, over time, the cumulative profit across a
large number of customer segments. We consider a dynamic model with the
consumers' preferences as well as price sensitivity varying over time. Building
on the well-known finding that consumers sharing similar characteristics act in
similar ways, we consider a global shrinkage structure, which assumes that the
consumers' preferences across the different segments can be well approximated
by a spatial autoregressive (SAR) model. In such a streamed longitudinal
set-up, we measure the performance of a dynamic pricing policy via regret,
which is the expected revenue loss compared to a clairvoyant that knows the
sequence of model parameters in advance. We propose a pricing policy based on
penalized stochastic gradient descent (PSGD) and explicitly characterize its
regret as a function of time, of the temporal variability in the model
parameters, and of the strength of the autocorrelation network structure
spanning the different customer segments. Our regret analysis results not only
demonstrate asymptotic optimality of the proposed policy but also show that for
policy planning it is essential to incorporate available structural
information, as policies based on unshrunken models are highly sub-optimal in
the aforementioned set-up. We conduct simulation experiments across a wide
range of regimes as well as studies based on real-world networks, and report
encouraging performance for our proposed method.
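The regret notion described in the abstract (expected revenue lost relative to a clairvoyant that knows the parameters) can be made concrete with a toy computation. The sketch below is illustrative only and rests on assumptions that are not taken from the paper: a linear demand model d = alpha - beta * p, segment-level preference parameters alpha generated by a SAR model alpha = rho * W alpha + eps on a hypothetical row-normalized network W, and a naive fixed-price policy. It does not implement the authors' PSGD policy, and the names `sar_preferences`, `clairvoyant_price`, and `cumulative_regret` are hypothetical.

```python
import random

# Toy illustration only. Assumptions (not from the paper): linear demand
# d = alpha - beta * p, a row-normalized network matrix W, and a SAR model
# alpha = rho * W alpha + eps generating segment-level preferences.


def sar_preferences(W, rho, eps, iters=500):
    """Solve alpha = rho * W alpha + eps by fixed-point iteration.

    The map is a contraction when |rho| < 1 and every row of W sums to 1,
    so the iterates converge to (I - rho W)^{-1} eps.
    """
    n = len(eps)
    alpha = list(eps)
    for _ in range(iters):
        alpha = [rho * sum(W[i][j] * alpha[j] for j in range(n)) + eps[i]
                 for i in range(n)]
    return alpha


def expected_revenue(alpha, beta, price):
    """Expected revenue p * (alpha - beta * p) under linear demand."""
    return price * (alpha - beta * price)


def clairvoyant_price(alpha, beta):
    """Revenue-maximizing price when alpha and beta are known: alpha / (2 beta)."""
    return alpha / (2.0 * beta)


def cumulative_regret(alphas, betas, prices):
    """Sum over periods t and segments i of r(p*_{t,i}) - r(p_{t,i})."""
    total = 0.0
    for a_t, b_t, p_t in zip(alphas, betas, prices):
        for a, b, p in zip(a_t, b_t, p_t):
            best = expected_revenue(a, b, clairvoyant_price(a, b))
            total += best - expected_revenue(a, b, p)
    return total


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical ring of 4 segments: each segment's two neighbours get weight 1/2.
    n = 4
    W = [[0.5 if (j - i) % n in (1, n - 1) else 0.0 for j in range(n)]
         for i in range(n)]
    T = 3
    # Parameters are redrawn each period to mimic time-varying preferences.
    alphas = [sar_preferences(W, 0.6, [random.uniform(1.0, 2.0) for _ in range(n)])
              for _ in range(T)]
    betas = [[1.0] * n for _ in range(T)]
    fixed = [[1.0] * n for _ in range(T)]  # a naive fixed-price policy
    print(cumulative_regret(alphas, betas, fixed))
```

Because the expected revenue is a concave quadratic in the price when beta > 0, each per-segment term is nonnegative and vanishes only when the posted price equals alpha / (2 beta). Replacing the fixed-price list with prices produced by a learning policy would yield the kind of cumulative quantity the abstract's regret bounds describe.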
Related papers
- A Tale of Two Cities: Pessimism and Opportunism in Offline Dynamic Pricing
This paper studies offline dynamic pricing without data coverage assumption.
We establish a partial identification bound for the demand parameter whose associated price is unobserved.
We incorporate pessimistic and opportunistic strategies within the proposed partial identification framework to derive the estimated policy.
(arXiv, 2024-11-12)
- COPlanner: Plan to Roll Out Conservatively but to Explore Optimistically for Model-Based RL
Dyna-style model-based reinforcement learning contains two phases: model rollouts that generate samples for policy learning, and exploration in the real environment.
COPlanner is a planning-driven framework for model-based methods that addresses the problem of inaccurately learned dynamics models.
(arXiv, 2023-10-11)
- Model-based Causal Bayesian Optimization
We introduce the first algorithm for Causal Bayesian Optimization with Multiplicative Weights (CBO-MW).
We derive regret bounds for CBO-MW that naturally depend on graph-related quantities.
Our experiments include a realistic demonstration of how CBO-MW can be used to learn users' demand patterns in a shared mobility system.
(arXiv, 2023-07-31)
- Choice Models and Permutation Invariance: Demand Estimation in Differentiated Products Markets
We demonstrate how non-parametric estimators like neural nets can easily approximate choice functions.
Our proposed functionals can flexibly capture underlying consumer behavior in a completely data-driven fashion.
Our empirical analysis confirms that the estimator generates realistic and comparable own- and cross-price elasticities.
(arXiv, 2023-07-13)
- Dual policy as self-model for planning
We refer to the model used to simulate one's decisions as the agent's self-model.
Inspired by current reinforcement learning approaches and neuroscience, we explore the benefits and limitations of using a distilled policy network as the self-model.
(arXiv, 2023-06-07)
- Personalized Pricing with Invalid Instrumental Variables: Identification, Estimation, and Policy Learning
This work studies offline personalized pricing under endogeneity using an instrumental variable approach.
We propose a new policy learning method for Personalized pRicing using Invalid iNsTrumental variables.
(arXiv, 2023-02-24)
- When Demonstrations Meet Generative World Models: A Maximum Likelihood Framework for Offline Inverse Reinforcement Learning
This paper aims to recover the structure of rewards and environment dynamics that underlie observed actions in a fixed, finite set of demonstrations from an expert agent.
Accurate models of expertise in executing a task have applications in safety-sensitive domains such as clinical decision making and autonomous driving.
(arXiv, 2023-02-15)
- On the estimation of discrete choice models to capture irrational customer behaviors
We show how to use partially-ranked preferences to efficiently model rational and irrational customer types from transaction data.
An extensive set of experiments assesses the predictive accuracy of the proposed approach.
(arXiv, 2021-09-08)
- Autoregressive Dynamics Models for Offline Policy Evaluation and Optimization
Expressive autoregressive dynamics models generate the different dimensions of the next state and reward sequentially, each conditioned on the previous dimensions.
We also show that autoregressive dynamics models are useful for offline policy optimization by serving as a way to enrich the replay buffer.
(arXiv, 2021-04-28)
- Self-adapting Robustness in Demand Learning
We study dynamic pricing over a finite number of periods in the presence of demand model ambiguity.
We develop an adaptively-robust-learning (ARL) pricing policy that learns the true model parameters from the data.
We characterize the behavior of ARL's self-adapting ambiguity sets and derive a regret bound that highlights the link between the scale of revenue loss and the customer arrival pattern.
(arXiv, 2020-11-21)
This list is automatically generated from the titles and abstracts of the papers on this site.