A Tale of Two Cities: Pessimism and Opportunism in Offline Dynamic Pricing
- URL: http://arxiv.org/abs/2411.08126v1
- Date: Tue, 12 Nov 2024 19:09:41 GMT
- Title: A Tale of Two Cities: Pessimism and Opportunism in Offline Dynamic Pricing
- Authors: Zeyu Bian, Zhengling Qi, Cong Shi, Lan Wang,
- Abstract summary: This paper studies offline dynamic pricing without data coverage assumption.
We establish a partial identification bound for the demand parameter whose associated price is unobserved.
We incorporate pessimistic and opportunistic strategies within the proposed partial identification framework to derive the estimated policy.
- Score: 20.06425698412548
- License:
- Abstract: This paper studies offline dynamic pricing without data coverage assumption, thereby allowing for any price including the optimal one not being observed in the offline data. Previous approaches that rely on the various coverage assumptions such as that the optimal prices are observable, would lead to suboptimal decisions and consequently, reduced profits. We address this challenge by framing the problem to a partial identification framework. Specifically, we establish a partial identification bound for the demand parameter whose associated price is unobserved by leveraging the inherent monotonicity property in the pricing problem. We further incorporate pessimistic and opportunistic strategies within the proposed partial identification framework to derive the estimated policy. Theoretically, we establish rate-optimal finite-sample regret guarantees for both strategies. Empirically, we demonstrate the superior performance of the newly proposed methods via a synthetic environment. This research provides practitioners with valuable insights into offline pricing strategies in the challenging no-coverage setting, ultimately fostering sustainable growth and profitability of the company.
Related papers
- A Primal-Dual Online Learning Approach for Dynamic Pricing of Sequentially Displayed Complementary Items under Sale Constraints [54.46126953873298]
We address the problem of dynamically pricing complementary items that are sequentially displayed to customers.
Coherent pricing policies for complementary items are essential because optimizing the pricing of each item individually is ineffective.
We empirically evaluate our approach using synthetic settings randomly generated from real-world data, and compare its performance in terms of constraints violation and regret.
arXiv Detail & Related papers (2024-07-08T09:55:31Z) - Model-Free $\mu$-Synthesis: A Nonsmooth Optimization Perspective [4.477225073240389]
In this paper, we revisit an important policy search benchmark, namely $mu$-synthesis.
We show that subgradient-based search methods have led to impressive numerical results in practice.
arXiv Detail & Related papers (2024-02-18T17:17:17Z) - Utility Fairness in Contextual Dynamic Pricing with Demand Learning [23.26236046836737]
This paper introduces a novel contextual bandit algorithm for personalized pricing under utility fairness constraints.
Our approach, which incorporates dynamic pricing and demand learning, addresses the critical challenge of fairness in pricing strategies.
arXiv Detail & Related papers (2023-11-28T05:19:23Z) - Contextual Dynamic Pricing with Strategic Buyers [93.97401997137564]
We study the contextual dynamic pricing problem with strategic buyers.
Seller does not observe the buyer's true feature, but a manipulated feature according to buyers' strategic behavior.
We propose a strategic dynamic pricing policy that incorporates the buyers' strategic behavior into the online learning to maximize the seller's cumulative revenue.
arXiv Detail & Related papers (2023-07-08T23:06:42Z) - Structured Dynamic Pricing: Optimal Regret in a Global Shrinkage Model [50.06663781566795]
We consider a dynamic model with the consumers' preferences as well as price sensitivity varying over time.
We measure the performance of a dynamic pricing policy via regret, which is the expected revenue loss compared to a clairvoyant that knows the sequence of model parameters in advance.
Our regret analysis results not only demonstrate optimality of the proposed policy but also show that for policy planning it is essential to incorporate available structural information.
arXiv Detail & Related papers (2023-03-28T00:23:23Z) - Personalized Pricing with Invalid Instrumental Variables:
Identification, Estimation, and Policy Learning [5.372349090093469]
This work studies offline personalized pricing under endogeneity using an instrumental variable approach.
We propose a new policy learning method for Personalized pRicing using Invalid iNsTrumental variables.
arXiv Detail & Related papers (2023-02-24T14:50:47Z) - When Demonstrations Meet Generative World Models: A Maximum Likelihood
Framework for Offline Inverse Reinforcement Learning [62.00672284480755]
This paper aims to recover the structure of rewards and environment dynamics that underlie observed actions in a fixed, finite set of demonstrations from an expert agent.
Accurate models of expertise in executing a task has applications in safety-sensitive applications such as clinical decision making and autonomous driving.
arXiv Detail & Related papers (2023-02-15T04:14:20Z) - Offline Reinforcement Learning with Instrumental Variables in Confounded
Markov Decision Processes [93.61202366677526]
We study the offline reinforcement learning (RL) in the face of unmeasured confounders.
We propose various policy learning methods with the finite-sample suboptimality guarantee of finding the optimal in-class policy.
arXiv Detail & Related papers (2022-09-18T22:03:55Z) - Conformal Off-Policy Prediction in Contextual Bandits [54.67508891852636]
Conformal off-policy prediction can output reliable predictive intervals for the outcome under a new target policy.
We provide theoretical finite-sample guarantees without making any additional assumptions beyond the standard contextual bandit setup.
arXiv Detail & Related papers (2022-06-09T10:39:33Z) - Online Regularization towards Always-Valid High-Dimensional Dynamic
Pricing [19.11333865618553]
We propose a novel approach for designing dynamic pricing policy based regularized online statistical learning with theoretical guarantees.
Our proposed online regularization scheme equips the proposed optimistic online regularized maximum likelihood pricing (OORMLP) pricing policy with three major advantages.
In theory, the proposed OORMLP algorithm exploits the sparsity structure of high-dimensional models and secures a logarithmic regret in a decision horizon.
arXiv Detail & Related papers (2020-07-05T23:52:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.