Online Regularization towards Always-Valid High-Dimensional Dynamic
Pricing
- URL: http://arxiv.org/abs/2007.02470v3
- Date: Mon, 20 Nov 2023 19:43:58 GMT
- Title: Online Regularization towards Always-Valid High-Dimensional Dynamic
Pricing
- Authors: Chi-Hua Wang, Zhanyu Wang, Will Wei Sun, Guang Cheng
- Abstract summary: We propose a novel approach for designing dynamic pricing policies based on regularized online statistical learning with theoretical guarantees.
Our proposed online regularization scheme equips the proposed optimistic online regularized maximum likelihood pricing (OORMLP) policy with three major advantages.
In theory, the proposed OORMLP algorithm exploits the sparsity structure of high-dimensional models and secures regret that is logarithmic in the decision horizon.
- Score: 19.11333865618553
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Devising a dynamic pricing policy with an always-valid online statistical
learning procedure is an important and as yet unresolved problem. Most existing
dynamic pricing policies, which focus on the faithfulness of the adopted customer
choice models, exhibit a limited capability for adapting to the online uncertainty
of the learned statistical model during the pricing process. In this paper, we
propose a novel approach for designing dynamic pricing policies based on
regularized online statistical learning with theoretical guarantees. The new
approach overcomes the challenge of continuously monitoring the online Lasso
procedure and possesses several appealing properties. In particular, we make the
decisive observation that the always-validity of pricing decisions builds and
thrives on the online regularization scheme. Our proposed online regularization
scheme equips the proposed optimistic online regularized maximum likelihood
pricing (OORMLP) policy with three major advantages: it encodes market noise
knowledge into pricing process optimism; it empowers online statistical learning
with always-validity over all decision points; and it envelops the prediction
error process with time-uniform non-asymptotic oracle inequalities. Non-asymptotic
inference results of this type allow us to design more sample-efficient and
robust dynamic pricing algorithms in practice. In theory, the proposed OORMLP
algorithm exploits the sparsity structure of high-dimensional models and secures
regret that is logarithmic in the decision horizon. These theoretical advances
are made possible by proposing an optimistic online Lasso procedure that resolves
dynamic pricing problems at the process level, based on a novel use of
non-asymptotic martingale concentration. In experiments, we evaluate OORMLP in
different synthetic and real pricing problem settings, and demonstrate that
OORMLP improves on state-of-the-art methods.
Related papers
- A Tale of Two Cities: Pessimism and Opportunism in Offline Dynamic Pricing [20.06425698412548]
This paper studies offline dynamic pricing without a data coverage assumption.
We establish a partial identification bound for the demand parameter whose associated price is unobserved.
We incorporate pessimistic and opportunistic strategies within the proposed partial identification framework to derive the estimated policy.
arXiv Detail & Related papers (2024-11-12T19:09:41Z)
- A Primal-Dual Online Learning Approach for Dynamic Pricing of Sequentially Displayed Complementary Items under Sale Constraints [54.46126953873298]
We address the problem of dynamically pricing complementary items that are sequentially displayed to customers.
Coherent pricing policies for complementary items are essential because optimizing the pricing of each item individually is ineffective.
We empirically evaluate our approach using synthetic settings randomly generated from real-world data, and compare its performance in terms of constraint violation and regret.
arXiv Detail & Related papers (2024-07-08T09:55:31Z)
- Utility Fairness in Contextual Dynamic Pricing with Demand Learning [23.26236046836737]
This paper introduces a novel contextual bandit algorithm for personalized pricing under utility fairness constraints.
Our approach, which incorporates dynamic pricing and demand learning, addresses the critical challenge of fairness in pricing strategies.
arXiv Detail & Related papers (2023-11-28T05:19:23Z)
- Insurance pricing on price comparison websites via reinforcement learning [7.023335262537794]
This paper introduces a reinforcement learning framework that learns an optimal pricing policy by integrating model-based and model-free methods.
The paper also highlights the importance of evaluating pricing policies using an offline dataset in a consistent fashion.
arXiv Detail & Related papers (2023-08-14T04:44:56Z)
- Structured Dynamic Pricing: Optimal Regret in a Global Shrinkage Model [50.06663781566795]
We consider a dynamic model with the consumers' preferences as well as price sensitivity varying over time.
We measure the performance of a dynamic pricing policy via regret, which is the expected revenue loss compared to a clairvoyant that knows the sequence of model parameters in advance.
Our regret analysis not only demonstrates the optimality of the proposed policy but also shows that for policy planning it is essential to incorporate available structural information.
arXiv Detail & Related papers (2023-03-28T00:23:23Z)
- Personalized Pricing with Invalid Instrumental Variables: Identification, Estimation, and Policy Learning [5.372349090093469]
This work studies offline personalized pricing under endogeneity using an instrumental variable approach.
We propose a new policy learning method for Personalized pRicing using Invalid iNsTrumental variables.
arXiv Detail & Related papers (2023-02-24T14:50:47Z)
- When Demonstrations Meet Generative World Models: A Maximum Likelihood Framework for Offline Inverse Reinforcement Learning [62.00672284480755]
This paper aims to recover the structure of rewards and environment dynamics that underlie observed actions in a fixed, finite set of demonstrations from an expert agent.
Accurate models of expertise in executing a task have applications in safety-sensitive domains such as clinical decision making and autonomous driving.
arXiv Detail & Related papers (2023-02-15T04:14:20Z)
- On Parametric Optimal Execution and Machine Learning Surrogates [3.077531983369872]
We investigate optimal order execution problems in discrete time with instantaneous price impact and resilience.
We develop a numerical algorithm based on dynamic programming and deep learning.
arXiv Detail & Related papers (2022-04-18T22:40:14Z)
- Bayesian Bilinear Neural Network for Predicting the Mid-price Dynamics in Limit-Order Book Markets [84.90242084523565]
Traditional time-series econometric methods often appear incapable of capturing the true complexity of the multi-level interactions driving the price dynamics.
By adopting a state-of-the-art second-order optimization algorithm, we train a Bayesian bilinear neural network with temporal attention.
By addressing the use of predictive distributions to analyze errors and uncertainties associated with the estimated parameters and model forecasts, we thoroughly compare our Bayesian model with traditional ML alternatives.
arXiv Detail & Related papers (2022-03-07T18:59:54Z)
- COMBO: Conservative Offline Model-Based Policy Optimization [120.55713363569845]
Uncertainty estimation with complex models, such as deep neural networks, can be difficult and unreliable.
We develop a new model-based offline RL algorithm, COMBO, that regularizes the value function on out-of-support state-actions.
We find that COMBO consistently performs as well as or better than prior offline model-free and model-based methods.
arXiv Detail & Related papers (2021-02-16T18:50:32Z)
- Non-stationary Online Learning with Memory and Non-stochastic Control [71.14503310914799]
We study the problem of Online Convex Optimization (OCO) with memory, which allows loss functions to depend on past decisions.
In this paper, we introduce dynamic policy regret as the performance measure to design algorithms robust to non-stationary environments.
We propose a novel algorithm for OCO with memory that provably enjoys an optimal dynamic policy regret in terms of time horizon, non-stationarity measure, and memory length.
arXiv Detail & Related papers (2021-02-07T09:45:15Z)