Related papers: Learning with Posterior Sampling for Revenue Management under Time-varying Demand

URL: http://arxiv.org/abs/2405.04910v1
Date: Wed, 8 May 2024 09:28:26 GMT
Title: Learning with Posterior Sampling for Revenue Management under Time-varying Demand
Authors: Kazuma Shimizu, Junya Honda, Shinji Ito, Shinji Nakadai
Abstract summary: We discuss the revenue management problem to maximize revenue by pricing items or services. One challenge in this problem is that the demand distribution is unknown and varies over time in real applications such as airline and retail industries.
Score: 36.22276574805786
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: This paper discusses the revenue management (RM) problem to maximize revenue by pricing items or services. One challenge in this problem is that the demand distribution is unknown and varies over time in real applications such as airline and retail industries. In particular, time-varying demand has not been well studied under scenarios of unknown demand due to the difficulty of jointly managing the remaining inventory and estimating the demand. To tackle this challenge, we first introduce an episodic generalization of the RM problem motivated by typical application scenarios. We then propose a computationally efficient algorithm based on posterior sampling, which effectively optimizes prices by solving linear programs. We derive a Bayesian regret upper bound of this algorithm for general models where demand parameters can be correlated between time periods, while also deriving a regret lower bound for generic algorithms. Our empirical study shows that the proposed algorithm performs better than other benchmark algorithms and comparably to the optimal policy in hindsight. We also propose a heuristic modification of the proposed algorithm, which learns the pricing policy even more efficiently in the experiments.

Related papers

Offline Dynamic Inventory and Pricing Strategy: Addressing Censored and Dependent Demand [7.289672463326423]
We study the offline sequential feature-based pricing and inventory control problem. Our goal is to leverage the offline dataset to estimate the optimal pricing and inventory control policy.
arXiv Detail & Related papers (2025-04-14T02:57:51Z)
Epoch-based Application of Problem-Aware Operators in a Multiobjective Memetic Algorithm for Portfolio Optimization [0.0]
We consider the issue of intensification/diversification balance in the context of a memetic algorithm for the multiobjective optimization of investment portfolios with cardinality constraints. We have conducted a sensitivity analysis to determine in which phases of the search the application of these operators leads to better results. Our findings indicate that the resulting algorithm is quite robust in terms of parameterization from the point of view of this problem-specific indicator.
arXiv Detail & Related papers (2024-12-05T08:57:42Z)
Deep Generative Demand Learning for Newsvendor and Pricing [7.594251468240168]
We consider data-driven inventory and pricing decisions in the feature-based newsvendor problem. We propose a novel approach leveraging conditional deep generative models (cDGMs) to address these challenges. We provide theoretical guarantees for our approach, including the consistency of profit estimation and convergence of our decisions to the optimal solution.
arXiv Detail & Related papers (2024-11-13T14:17:26Z)
Contractual Reinforcement Learning: Pulling Arms with Invisible Hands [68.77645200579181]
We propose a theoretical framework for aligning economic interests of different stakeholders in the online learning problems through contract design. For the planning problem, we design an efficient dynamic programming algorithm to determine the optimal contracts against the far-sighted agent. For the learning problem, we introduce a generic design of no-regret learning algorithms to untangle the challenges from robust design of contracts to the balance of exploration and exploitation.
arXiv Detail & Related papers (2024-07-01T16:53:00Z)
High-dimensional Contextual Bandit Problem without Sparsity [8.782204980889077]
We propose an explore-then-commit (EtC) algorithm to address the high-dimensional contextual bandit problem without sparsity and examine its performance. We derive the optimal rate of the EtC algorithm in terms of $T$ and show that this rate can be achieved by balancing exploration and exploitation. We introduce an adaptive explore-then-commit (AEtC) algorithm that adaptively finds the optimal balance.
arXiv Detail & Related papers (2023-06-19T15:29:32Z)
Regret Bounds for Expected Improvement Algorithms in Gaussian Process Bandit Optimization [63.8557841188626]
The expected improvement (EI) algorithm is one of the most popular strategies for optimization under uncertainty. We propose a variant of EI with a standard incumbent defined via the GP predictive mean. We show that our algorithm converges, and achieves a cumulative regret bound of $\mathcal{O}(\gamma_T\sqrt{T})$.
arXiv Detail & Related papers (2022-03-15T13:17:53Z)
Online Allocation with Two-sided Resource Constraints [44.5635910908944]
We consider an online allocation problem subject to lower and upper resource constraints, where the requests arrive sequentially. We propose a new algorithm that obtains a $1-O(\frac{\epsilon}{\alpha-\epsilon})$-competitive ratio relative to the offline problem that knows all requests ahead of time.
arXiv Detail & Related papers (2021-12-28T02:21:06Z)
Navigating to the Best Policy in Markov Decision Processes [68.8204255655161]
We investigate the active pure exploration problem in Markov Decision Processes. The agent sequentially selects actions and, from the resulting system trajectory, aims at identifying the best policy as fast as possible.
arXiv Detail & Related papers (2021-06-05T09:16:28Z)
Regularized Online Allocation Problems: Fairness and Beyond [7.433931244705934]
We introduce the regularized online allocation problem, a variant that includes a non-linear regularizer acting on the total resource consumption. In this problem, requests repeatedly arrive over time and, for each request, a decision maker needs to take an action that generates a reward and consumes resources. The objective is to simultaneously maximize additively separable rewards and the value of a non-separable regularizer subject to the resource constraints.
arXiv Detail & Related papers (2020-07-01T14:24:58Z)
Hierarchical Adaptive Contextual Bandits for Resource Constraint based Recommendation [49.69139684065241]
Contextual multi-armed bandit (MAB) achieves cutting-edge performance on a variety of problems. In this paper, we propose a hierarchical adaptive contextual bandit method (HATCH) to conduct the policy learning of contextual bandits with a budget constraint.
arXiv Detail & Related papers (2020-04-02T17:04:52Z)
Uncertainty Quantification for Demand Prediction in Contextual Dynamic Pricing [20.828160401904697]
We study the problem of constructing accurate confidence intervals for the demand function. We develop a debiased approach and provide the normality guarantee of the debiased estimator.
arXiv Detail & Related papers (2020-03-16T04:21:58Z)
Active Model Estimation in Markov Decision Processes [108.46146218973189]
We study the problem of efficient exploration in order to learn an accurate model of an environment, modeled as a Markov decision process (MDP). We show that our Markov-based algorithm outperforms both our original algorithm and the maximum entropy algorithm in the small sample regime.
arXiv Detail & Related papers (2020-03-06T16:17:24Z)

This list is automatically generated from the titles and abstracts of the papers on this site.