Self-adapting Robustness in Demand Learning
- URL: http://arxiv.org/abs/2011.10690v1
- Date: Sat, 21 Nov 2020 01:15:54 GMT
- Title: Self-adapting Robustness in Demand Learning
- Authors: Boxiao Chen, Selvaprabu Nadarajah, Parshan Pakiman, Stefanus Jasin
- Abstract summary: We study dynamic pricing over a finite number of periods in the presence of demand model ambiguity.
We develop an adaptively-robust-learning (ARL) pricing policy that learns the true model parameters from the data.
We characterize the behavior of ARL's self-adapting ambiguity sets and derive a regret bound that highlights the link between the scale of revenue loss and the customer arrival pattern.
- Score: 1.949912057689623
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We study dynamic pricing over a finite number of periods in the presence of
demand model ambiguity. Departing from the typical no-regret learning
environment, where price changes are allowed at any time, we consider a setting
in which pricing decisions are made at pre-specified points in time and each
price can be applied to a large
number of arrivals. In this environment, which arises in retailing, a pricing
decision based on an incorrect demand model can significantly impact cumulative
revenue. We develop an adaptively-robust-learning (ARL) pricing policy that
learns the true model parameters from the data while actively managing demand
model ambiguity. It optimizes an objective that is robust with respect to a
self-adapting set of demand models, where a given model is included in this set
only if the sales data revealed from prior pricing decisions makes it
"probable". As a result, it gracefully transitions from being robust when
demand model ambiguity is high to minimizing regret when this ambiguity
diminishes upon receiving more data. We characterize the stochastic behavior of
ARL's self-adapting ambiguity sets and derive a regret bound that highlights
the link between the scale of revenue loss and the customer arrival pattern. We
also show that ARL, by being conscious of both model ambiguity and revenue,
bridges the gap between a distributionally robust policy and a
follow-the-leader policy, which focus on model ambiguity and revenue,
respectively. We numerically find that the ARL policy, or an extension
thereof, exhibits superior performance compared to distributionally robust,
follow-the-leader, and upper-confidence-bound policies in terms of expected
revenue and/or value at risk.
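A minimal sketch of the adaptive-robustness idea described above, under simplifying assumptions that are not taken from the paper: demand follows a linear purchase-probability model q(p, theta) = theta0 - theta1 * p, candidate models live on a finite grid, the ambiguity set is built with a simple likelihood-ratio threshold, and all numbers are illustrative. The paper's actual demand model, ambiguity-set construction, and analysis may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative demand model (an assumption, not the paper's): each arriving
# customer buys with probability q(p, theta) = theta0 - theta1 * p, clipped to (0, 1).
def purchase_prob(price, theta):
    return np.clip(theta[..., 0] - theta[..., 1] * price, 1e-6, 1 - 1e-6)

prices = np.linspace(0.5, 2.0, 16)                          # admissible price grid
theta_grid = np.array([[a, b]
                       for a in np.linspace(0.6, 1.0, 9)
                       for b in np.linspace(0.2, 0.5, 7)])  # candidate demand models
theta_true = np.array([0.8, 0.35])                          # unknown to the policy
arrivals = [200, 50, 400, 100, 300]                         # customers per pricing period
radius = 4.0                                                # likelihood-ratio threshold

log_lik = np.zeros(len(theta_grid))  # cumulative log-likelihood of each candidate model

for t, n_t in enumerate(arrivals):
    # Self-adapting ambiguity set: keep only models whose fit to the sales data
    # observed so far is within `radius` of the best-fitting model. Before any
    # data arrives, every candidate model is retained.
    ambiguity = log_lik >= log_lik.max() - radius

    # Robust pricing step: maximize worst-case expected per-arrival revenue
    # over the current ambiguity set.
    q = purchase_prob(prices[:, None], theta_grid[ambiguity])  # (n_prices, |set|)
    worst_case_revenue = (prices[:, None] * q).min(axis=1)
    price = prices[np.argmax(worst_case_revenue)]

    # Observe one period of sales under the (unknown) true model.
    sales = rng.binomial(n_t, purchase_prob(price, theta_true))

    # Update every candidate's binomial log-likelihood with the new data;
    # the next period's ambiguity set adapts accordingly.
    q_all = purchase_prob(price, theta_grid)
    log_lik += sales * np.log(q_all) + (n_t - sales) * np.log(1.0 - q_all)

    print(f"period {t}: price={price:.2f}, sales={sales}/{n_t}, "
          f"ambiguity set size={int(ambiguity.sum())}")
```

As sales data accumulate, the likelihood-ratio rule prunes the candidate set, and the worst-case maximization increasingly coincides with maximizing revenue under the best-fitting model, mirroring the transition from a distributionally robust policy toward a follow-the-leader policy described in the abstract.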
Related papers
- Towards Cost Sensitive Decision Making [14.279123976398926]
In this work, we consider RL models that may actively acquire features from the environment to improve the decision quality and certainty.
We propose the Active-Acquisition POMDP and identify two types of the acquisition process for different application domains.
In order to assist the agent in the actively-acquired partially-observed environment and alleviate the exploration-exploitation dilemma, we develop a model-based approach.
arXiv Detail & Related papers (2024-10-04T19:48:23Z)
- Rejection via Learning Density Ratios [50.91522897152437]
Classification with rejection emerges as a learning paradigm which allows models to abstain from making predictions.
We propose a different distributional perspective, where we seek to find an idealized data distribution which maximizes a pretrained model's performance.
Our framework is tested empirically over clean and noisy datasets.
arXiv Detail & Related papers (2024-05-29T01:32:17Z)
- Dual policy as self-model for planning [71.73710074424511]
We refer to the model used to simulate one's decisions as the agent's self-model.
Inspired by current reinforcement learning approaches and neuroscience, we explore the benefits and limitations of using a distilled policy network as the self-model.
arXiv Detail & Related papers (2023-06-07T13:58:45Z)
- Structured Dynamic Pricing: Optimal Regret in a Global Shrinkage Model [50.06663781566795]
We consider a dynamic model in which consumers' preferences as well as price sensitivity vary over time.
We measure the performance of a dynamic pricing policy via regret, which is the expected revenue loss compared to a clairvoyant that knows the sequence of model parameters in advance.
Our regret analysis results not only demonstrate optimality of the proposed policy but also show that for policy planning it is essential to incorporate available structural information.
arXiv Detail & Related papers (2023-03-28T00:23:23Z)
- Personalized Pricing with Invalid Instrumental Variables: Identification, Estimation, and Policy Learning [5.372349090093469]
This work studies offline personalized pricing under endogeneity using an instrumental variable approach.
We propose a new policy learning method for Personalized pRicing using Invalid iNsTrumental variables.
arXiv Detail & Related papers (2023-02-24T14:50:47Z)
- Conservative Bayesian Model-Based Value Expansion for Offline Policy Optimization [41.774837419584735]
Offline reinforcement learning (RL) addresses the problem of learning a performant policy from a fixed batch of data collected by following some behavior policy.
Model-based approaches are particularly appealing since they can extract more learning signals from the logged dataset by learning a model of the environment.
arXiv Detail & Related papers (2022-10-07T20:13:50Z)
- Model-Augmented Q-learning [112.86795579978802]
We propose a MFRL framework that is augmented with the components of model-based RL.
Specifically, we propose to estimate not only the $Q$-values but also both the transition and the reward with a shared network.
We show that the proposed scheme, called Model-augmented $Q$-learning (MQL), obtains a policy-invariant solution which is identical to the solution obtained by learning with true reward.
arXiv Detail & Related papers (2021-02-07T17:56:50Z)
- On the model-based stochastic value gradient for continuous reinforcement learning [50.085645237597056]
We show that simple model-based agents can outperform state-of-the-art model-free agents in terms of both sample-efficiency and final reward.
Our findings suggest that model-based policy evaluation deserves closer attention.
arXiv Detail & Related papers (2020-08-28T17:58:29Z)
- MOPO: Model-based Offline Policy Optimization [183.6449600580806]
Offline reinforcement learning (RL) refers to the problem of learning policies entirely from a large batch of previously collected data.
We show that an existing model-based RL algorithm already produces significant gains in the offline setting.
We propose to modify existing model-based RL methods by training them with rewards artificially penalized by the uncertainty of the dynamics.
arXiv Detail & Related papers (2020-05-27T08:46:41Z)
- Uncertainty Quantification for Demand Prediction in Contextual Dynamic Pricing [20.828160401904697]
We study the problem of constructing accurate confidence intervals for the demand function.
We develop a debiased approach and provide the normality guarantee of the debiased estimator.
arXiv Detail & Related papers (2020-03-16T04:21:58Z)