Non-Stationary Dynamic Pricing Via Actor-Critic Information-Directed
Pricing
- URL: http://arxiv.org/abs/2208.09372v1
- Date: Fri, 19 Aug 2022 14:37:37 GMT
- Title: Non-Stationary Dynamic Pricing Via Actor-Critic Information-Directed
Pricing
- Authors: Po-Yi Liu, Chi-Hua Wang, Heng-Hsui Tsai
- Abstract summary: The proposed ACIDP extends information-directed sampling (IDS) algorithms from statistical machine learning to include microeconomic choice theory.
It outperforms competing bandit algorithms including Upper Confidence Bound (UCB) and Thompson sampling (TS) in a series of market environment shifts.
- Score: 1.4180331276028662
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents a novel non-stationary dynamic pricing algorithm design,
where pricing agents face incomplete demand information and market environment
shifts. The agents run price experiments to learn about each product's demand
curve and the profit-maximizing price, while being aware of market environment
shifts to avoid high opportunity costs from offering sub-optimal prices. The
proposed ACIDP extends information-directed sampling (IDS) algorithms from
statistical machine learning to include microeconomic choice theory, with a
novel pricing strategy auditing procedure to escape sub-optimal pricing after a
market environment shift. The proposed ACIDP outperforms competing bandit
algorithms including Upper Confidence Bound (UCB) and Thompson sampling (TS) in
a series of market environment shifts.
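Information-directed sampling, which ACIDP extends, selects actions by trading expected regret against expected information gain rather than by optimism (UCB) or posterior sampling (TS). A minimal sketch of the IDS selection rule over candidate prices; function and variable names are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def ids_price_index(expected_regret, information_gain):
    """Information-directed sampling: pick the candidate price minimizing
    the information ratio regret^2 / information gain."""
    expected_regret = np.asarray(expected_regret, dtype=float)
    information_gain = np.asarray(information_gain, dtype=float)
    # A price with zero expected regret is already optimal; take it directly.
    if np.any(expected_regret <= 0):
        return int(np.argmin(expected_regret))
    ratio = expected_regret ** 2 / np.maximum(information_gain, 1e-12)
    return int(np.argmin(ratio))
```

A high-regret price can still be chosen if it is sufficiently informative about the demand curve, which is what lets IDS keep exploring after a market shift.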
Related papers
- Learn to Bid as a Price-Maker Wind Power Producer [2.249916681499244]
Wind power producers (WPPs) participating in short-term power markets face significant imbalance costs due to their non-dispatchable and variable production.
We propose an online learning algorithm that leverages contextual information to optimize WPP bids in the price-maker setting.
The algorithm's performance is evaluated against various benchmark strategies using a numerical simulation of the German day-ahead and real-time markets.
arXiv Detail & Related papers (2025-03-20T12:51:37Z)
- Transfer Learning for Nonparametric Contextual Dynamic Pricing [17.420508136662257]
Dynamic pricing strategies are crucial for firms to maximize revenue by adjusting prices based on market conditions and customer characteristics.
One promising approach to overcome this limitation is to leverage information from related products or markets to inform the focal pricing decisions.
We propose a novel Transfer Learning for Dynamic Pricing (TLDP) algorithm that can effectively leverage pre-collected data from a source domain to enhance pricing decisions in the target domain.
arXiv Detail & Related papers (2025-01-31T01:05:04Z)
- A Tale of Two Cities: Pessimism and Opportunism in Offline Dynamic Pricing [20.06425698412548]
This paper studies offline dynamic pricing without data coverage assumption.
We establish a partial identification bound for the demand parameter whose associated price is unobserved.
We incorporate pessimistic and opportunistic strategies within the proposed partial identification framework to derive the estimated policy.
arXiv Detail & Related papers (2024-11-12T19:09:41Z)
- A Primal-Dual Online Learning Approach for Dynamic Pricing of Sequentially Displayed Complementary Items under Sale Constraints [54.46126953873298]
We address the problem of dynamically pricing complementary items that are sequentially displayed to customers.
Coherent pricing policies for complementary items are essential because optimizing the pricing of each item individually is ineffective.
We empirically evaluate our approach using synthetic settings randomly generated from real-world data, and compare its performance in terms of constraints violation and regret.
arXiv Detail & Related papers (2024-07-08T09:55:31Z)
- By Fair Means or Foul: Quantifying Collusion in a Market Simulation with Deep Reinforcement Learning [1.5249435285717095]
This research employs an experimental oligopoly model of repeated price competition.
We investigate the strategies and emerging pricing patterns developed by the agents, which may lead to a collusive outcome.
Our findings indicate that RL-based AI agents converge to a collusive state characterized by the charging of supracompetitive prices.
arXiv Detail & Related papers (2024-06-04T15:35:08Z)
- Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization [59.758009422067]
We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning.
We propose a new uncertainty Bellman equation (UBE) whose solution converges to the true posterior variance over values.
We introduce a general-purpose policy optimization algorithm, Q-Uncertainty Soft Actor-Critic (QU-SAC) that can be applied for either risk-seeking or risk-averse policy optimization.
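An uncertainty Bellman equation propagates local epistemic variance through the dynamics the same way a value function propagates reward. A minimal tabular sketch of the fixed-point structure under assumed dynamics; the form U = w + gamma^2 P U and all names are illustrative, not the paper's algorithm:

```python
import numpy as np

def solve_ube(P, local_uncertainty, gamma=0.9, iters=200):
    """Fixed-point iteration on a tabular uncertainty Bellman equation:
    U(s) = w(s) + gamma^2 * sum_{s'} P(s'|s) U(s'),
    where w(s) is the local (epistemic) variance at state s."""
    U = np.zeros_like(local_uncertainty, dtype=float)
    for _ in range(iters):
        U = local_uncertainty + gamma ** 2 * P @ U
    return U
```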
arXiv Detail & Related papers (2023-12-07T15:55:58Z)
- Diffusion Variational Autoencoder for Tackling Stochasticity in Multi-Step Regression Stock Price Prediction [54.21695754082441]
Multi-step stock price prediction over a long-term horizon is crucial for forecasting its volatility.
Current solutions to multi-step stock price prediction are mostly designed for single-step, classification-based predictions.
We combine a deep hierarchical variational-autoencoder (VAE) and diffusion probabilistic techniques to do seq2seq stock prediction.
Our model is shown to outperform state-of-the-art solutions in terms of its prediction accuracy and variance.
arXiv Detail & Related papers (2023-08-18T16:21:15Z)
- Insurance pricing on price comparison websites via reinforcement learning [7.023335262537794]
This paper introduces a reinforcement learning framework that learns an optimal pricing policy by integrating model-based and model-free methods.
The paper also highlights the importance of evaluating pricing policies using an offline dataset in a consistent fashion.
arXiv Detail & Related papers (2023-08-14T04:44:56Z)
- Structured Dynamic Pricing: Optimal Regret in a Global Shrinkage Model [50.06663781566795]
We consider a dynamic model with the consumers' preferences as well as price sensitivity varying over time.
We measure the performance of a dynamic pricing policy via regret, which is the expected revenue loss compared to a clairvoyant that knows the sequence of model parameters in advance.
Our regret analysis results not only demonstrate optimality of the proposed policy but also show that for policy planning it is essential to incorporate available structural information.
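The regret measure described above is the cumulative gap between what a clairvoyant earns and what the policy earns. A minimal illustrative computation; names are assumptions, not the paper's notation:

```python
def cumulative_regret(clairvoyant_revenue, realized_revenue):
    """Regret over T rounds: total revenue of a clairvoyant that knows the
    sequence of model parameters, minus the policy's realized revenue."""
    return sum(r_star - r
               for r_star, r in zip(clairvoyant_revenue, realized_revenue))
```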
arXiv Detail & Related papers (2023-03-28T00:23:23Z)
- Adaptive Risk-Aware Bidding with Budget Constraint in Display Advertising [47.14651340748015]
We propose a novel adaptive risk-aware bidding algorithm with budget constraint via reinforcement learning.
We theoretically unveil the intrinsic relation between uncertainty and risk tendency based on value at risk (VaR).
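Value at risk, referenced above, is simply the alpha-quantile of a loss distribution: the loss level exceeded with probability at most 1 - alpha. A minimal sketch using NumPy's empirical quantile; this illustrates the definition only, not the paper's risk-aware bidding method:

```python
import numpy as np

def value_at_risk(losses, alpha=0.95):
    """VaR at level alpha: the alpha-quantile of the empirical loss sample."""
    return float(np.quantile(np.asarray(losses, dtype=float), alpha))
```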
arXiv Detail & Related papers (2022-12-06T18:50:09Z)
- Multi-Asset Spot and Option Market Simulation [52.77024349608834]
We construct realistic spot and equity option market simulators for a single underlying on the basis of normalizing flows.
We leverage the conditional invertibility property of normalizing flows and introduce a scalable method to calibrate the joint distribution of a set of independent simulators.
arXiv Detail & Related papers (2021-12-13T17:34:28Z)
- Machine Learning-Driven Virtual Bidding with Electricity Market Efficiency Analysis [7.014324899009043]
This paper develops a machine learning-driven portfolio optimization framework for virtual bidding in electricity markets.
We leverage the proposed algorithmic virtual bid trading strategy to evaluate both the profitability of the virtual bid portfolio and the efficiency of U.S. wholesale electricity markets.
arXiv Detail & Related papers (2021-04-06T19:30:39Z)
- Online Regularization towards Always-Valid High-Dimensional Dynamic Pricing [19.11333865618553]
We propose a novel approach for designing a dynamic pricing policy based on regularized online statistical learning with theoretical guarantees.
Our proposed online regularization scheme equips the proposed optimistic online regularized maximum likelihood pricing (OORMLP) pricing policy with three major advantages.
In theory, the proposed OORMLP algorithm exploits the sparsity structure of high-dimensional models and secures a logarithmic regret in a decision horizon.
arXiv Detail & Related papers (2020-07-05T23:52:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.