Learning an Inventory Control Policy with General Inventory Arrival Dynamics
- URL: http://arxiv.org/abs/2310.17168v2
- Date: Mon, 22 Jan 2024 00:12:20 GMT
- Title: Learning an Inventory Control Policy with General Inventory Arrival Dynamics
- Authors: Sohrab Andaz, Carson Eisenach, Dhruv Madeka, Kari Torkkola, Randy Jia,
Dean Foster, Sham Kakade
- Abstract summary: This paper addresses the problem of learning and backtesting inventory control policies in the presence of general arrival dynamics.
To the best of our knowledge, this is the first work to handle either arbitrary arrival dynamics or an arbitrary downstream post-processing of order quantities.
- Score: 2.3715198714015893
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper we address the problem of learning and backtesting inventory control policies in the presence of general arrival dynamics -- which we term a quantity-over-time arrivals model (QOT). We also allow order quantities to be modified as a post-processing step to meet vendor constraints such as order-minimum and batch-size constraints -- a common practice in real supply chains. To the best of our knowledge, this is the first work to handle either arbitrary arrival dynamics or an arbitrary downstream post-processing of order quantities. Building upon recent work (Madeka et al., 2022), we similarly formulate the periodic review inventory control problem as an exogenous decision process, where most of the state is outside the control of the agent. Madeka et al. (2022) show how to construct a simulator that replays historic data to solve this class of problems. In our case, we incorporate a deep generative model for the arrivals process, Gen-QOT, as part of the history replay. By formulating the problem as an exogenous decision process, we can apply results from Madeka et al. (2022) to obtain a reduction to supervised learning. Via simulation studies, we show that this approach yields statistically significant improvements in profitability over production baselines. Using data from a real-world A/B test, we show that Gen-QOT generalizes well to off-policy data and that the resulting buying policy outperforms traditional inventory management systems in real-world settings.
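To make the moving pieces concrete, the sketch below plays one period of a replay-style simulator: a raw order is post-processed to meet vendor constraints, an arrival schedule is drawn from a stand-in for the generative arrivals model, and historic demand is replayed exogenously. The constants, the Dirichlet arrivals stand-in, and the price/cost economics are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of one period of a replay simulator under a QOT
# arrivals model. MIN_ORDER, BATCH, the Dirichlet arrivals stand-in, and the
# price/cost numbers are illustrative assumptions, not the paper's values.
import numpy as np

rng = np.random.default_rng(0)

MIN_ORDER = 10   # assumed vendor order-minimum constraint
BATCH = 5        # assumed vendor batch-size constraint
HORIZON = 8      # periods over which an order may arrive

def post_process(q: float) -> int:
    """Round a raw order quantity to satisfy the vendor constraints."""
    if q < MIN_ORDER:
        return 0                           # below the minimum: place no order
    return int(BATCH * round(q / BATCH))   # snap to the batch size

def sample_arrival_schedule(order: int) -> np.ndarray:
    """Stand-in for a learned generative QOT model: split the order into
    (possibly partial) arrivals over the next HORIZON periods."""
    if order == 0:
        return np.zeros(HORIZON)
    return order * rng.dirichlet(np.ones(HORIZON))

def step(inventory, pipeline, raw_order, demand, price=2.0, cost=1.0):
    """One period: post-process the order, receive arrivals, sell, get paid."""
    order = post_process(raw_order)
    pipeline = pipeline + sample_arrival_schedule(order)
    inventory += pipeline[0]               # arrivals due this period
    pipeline = np.append(pipeline[1:], 0.0)
    sales = min(inventory, demand)
    reward = price * sales - cost * order
    return inventory - sales, pipeline, reward

inv, pipe = 20.0, np.zeros(HORIZON)
for d in [12.0, 7.0, 15.0, 9.0]:           # historic demand, replayed exogenously
    inv, pipe, r = step(inv, pipe, raw_order=13.2, demand=d)
    print(f"demand={d:4.0f}  inventory={inv:6.1f}  reward={r:6.1f}")
```

In the paper's setting, the Dirichlet stand-in would be replaced by the learned Gen-QOT model and demand would come from the historic data being replayed.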
Related papers
- Neural Coordination and Capacity Control for Inventory Management [4.533373101620897]
This paper is motivated by two questions: what does it mean to backtest a capacity control mechanism, and can we devise and backtest a capacity control mechanism that is compatible with recent advances in deep reinforcement learning for inventory management?
First, because we only have a single historic sample path of Amazon's capacity limits, we propose a method that samples from a distribution of possible constraint paths covering a space of real-world scenarios.
Second, we extend the exo-IDP (Exogenous Decision Process) formulation of Madeka et al. (2022) to capacitated periodic review inventory control problems and show that certain capacit…
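The summary does not describe the sampling method itself; purely as a toy illustration of how many plausible constraint paths could be generated from a single historic path, a block bootstrap of residuals with a random level shift might look like this (all numbers invented):

```python
# Illustrative only: sample a distribution of capacity-constraint paths
# from one historic path via a residual block bootstrap plus a level shift.
import numpy as np

rng = np.random.default_rng(1)
historic = np.array([100., 95., 90., 110., 120., 80., 85., 105.])  # made-up path

def sample_constraint_path(path: np.ndarray, block: int = 3) -> np.ndarray:
    """Block-bootstrap the residuals around the mean, then rescale the level."""
    level, resid = path.mean(), path - path.mean()
    pieces = []
    while sum(map(len, pieces)) < len(path):
        start = rng.integers(0, len(path) - block + 1)
        pieces.append(resid[start:start + block])    # stitch contiguous blocks
    boot = np.concatenate(pieces)[:len(path)]
    return np.maximum(rng.uniform(0.8, 1.2) * level + boot, 0.0)

paths = np.stack([sample_constraint_path(historic) for _ in range(5)])
print(paths.round(1))
```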
arXiv Detail & Related papers (2024-09-24T16:23:10Z)
- VC Theory for Inventory Policies [7.71791422193777]
We prove generalization guarantees for learning several well-known classes of inventory policies.
We focus on a classical setting without contexts, but allow for an arbitrary distribution over demand sequences.
Our research suggests situations in which it could be beneficial to incorporate the concepts of base-stock and inventory position into black-box learning machines.
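For readers unfamiliar with those concepts, a base-stock policy and the inventory position it acts on fit in a few lines (this is the standard textbook definition, not code from the paper):

```python
def base_stock_order(on_hand: float, on_order: float, S: float) -> float:
    """Classic base-stock policy: order up to level S whenever the
    inventory position (on-hand plus pipeline orders) falls below it."""
    position = on_hand + on_order
    return max(S - position, 0.0)

print(base_stock_order(on_hand=4.0, on_order=3.0, S=10.0))  # -> 3.0
```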
arXiv Detail & Related papers (2024-04-17T16:05:03Z)
- Off-Policy Evaluation for Large Action Spaces via Policy Convolution [60.6953713877886]
The Policy Convolution (PC) family of estimators uses latent structure within actions to strategically convolve the logging and target policies.
Experiments on synthetic and benchmark datasets demonstrate remarkable mean squared error (MSE) improvements when using PC.
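The estimator's details are beyond this summary, but the underlying intuition of smoothing both policies over an action-embedding space before forming importance weights can be caricatured as follows; everything here, including the Gaussian kernel and the toy rewards, is an assumption for illustration, not the paper's estimator:

```python
# Toy illustration: kernel-smooth logging and target policies over action
# embeddings, then form ordinary importance-weighted value estimates.
import numpy as np

rng = np.random.default_rng(2)
A, D = 100, 8                           # number of actions, embedding dim
emb = rng.normal(size=(A, D))           # assumed action embeddings

def smooth(policy: np.ndarray, h: float) -> np.ndarray:
    """Convolve a policy with a Gaussian kernel over action embeddings."""
    d2 = ((emb[:, None, :] - emb[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2 * h ** 2))
    p = K @ policy
    return p / p.sum()

pi_log = rng.dirichlet(np.ones(A))      # logging policy
pi_tgt = rng.dirichlet(np.ones(A))      # target policy
acts = rng.choice(A, size=5000, p=pi_log)
rews = rng.normal(loc=emb[acts, 0])     # toy rewards tied to the embedding

for h in (None, 0.5, 1.0):
    pl = pi_log if h is None else smooth(pi_log, h)
    pt = pi_tgt if h is None else smooth(pi_tgt, h)
    w = pt[acts] / pl[acts]             # (smoothed) importance weights
    print(h, float(np.mean(w * rews)))
```

Wider kernels lower the variance of the weights at the price of bias, which is the trade-off such estimators navigate in large action spaces.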
arXiv Detail & Related papers (2023-10-24T01:00:01Z)
- Online Inventory Problems: Beyond the i.i.d. Setting with Online Convex Optimization [0.8602553195689513]
We study multi-product inventory control problems where a manager makes sequential replenishment decisions based on partial historical information in order to minimize their cumulative losses.
We propose MaxCOSD, an online algorithm that has provable guarantees even for problems with non-i.i.d. demands and stateful dynamics.
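MaxCOSD itself is not specified in this summary; as background, the simplest online-convex-optimization treatment of a single-product problem is projected online subgradient descent on the newsvendor loss, sketched below with invented costs and demand:

```python
# Background sketch (not MaxCOSD): projected online subgradient descent
# on the newsvendor loss h*(q-d)^+ + b*(d-q)^+.
import numpy as np

rng = np.random.default_rng(3)
h, b = 1.0, 3.0        # holding and backorder unit costs (assumed)
q, lr = 0.0, 2.0       # order-up-to level and base step size

for t in range(1, 501):
    d = rng.gamma(shape=4.0, scale=2.5)      # demand stream
    g = h if q > d else -b                   # subgradient of the loss in q
    q = max(q - lr / np.sqrt(t) * g, 0.0)    # projected subgradient step

target = np.quantile(rng.gamma(4.0, 2.5, size=200_000), b / (b + h))
print(f"learned q = {q:.2f}, critical-fractile optimum ~ {target:.2f}")
```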
arXiv Detail & Related papers (2023-07-12T10:00:22Z)
- Let Offline RL Flow: Training Conservative Agents in the Latent Space of Normalizing Flows [58.762959061522736]
Offline reinforcement learning aims to train a policy on a pre-recorded and fixed dataset without any additional environment interactions.
We build upon recent works on learning policies in latent action spaces and use a special form of Normalizing Flows for constructing a generative model.
We evaluate our method on various locomotion and navigation tasks, demonstrating that our approach outperforms recently proposed algorithms.
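As a reminder of the mechanism, a normalizing flow is an invertible map, so a policy can act in a latent space and decode through the flow into actions that stay on the data manifold. A single affine-coupling layer with toy parameters (not the paper's model) shows the invertibility:

```python
# Minimal affine-coupling layer: invertible map from latent z to action a.
import numpy as np

# Toy conditioner parameters (invented).
w_s, b_s = 0.5, 0.1
w_t, b_t = -0.3, 0.2

def forward(z):
    """Affine coupling: pass z[0] through, transform z[1] given z[0]."""
    s, t = np.tanh(w_s * z[0] + b_s), w_t * z[0] + b_t
    return np.array([z[0], z[1] * np.exp(s) + t])

def inverse(a):
    """Exact inverse, since z[0] is preserved by the forward pass."""
    s, t = np.tanh(w_s * a[0] + b_s), w_t * a[0] + b_t
    return np.array([a[0], (a[1] - t) * np.exp(-s)])

z = np.array([0.4, -1.2])                 # latent action chosen by the policy
a = forward(z)                            # decoded environment action
print(a, inverse(a))                      # inverse recovers z exactly
```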
arXiv Detail & Related papers (2022-11-20T21:57:10Z)
- Deep Inventory Management [3.578617477295742]
We present a Deep Reinforcement Learning approach to solving a periodic review inventory control system.
We show that several policy learning approaches are competitive with or outperform classical baseline approaches.
arXiv Detail & Related papers (2022-10-06T18:00:25Z)
- Stateful Offline Contextual Policy Evaluation and Learning [88.9134799076718]
We study off-policy evaluation and learning from sequential data.
We formalize the relevant causal structure of problems such as dynamic personalized pricing.
We show improved out-of-sample policy performance in this class of relevant problems.
arXiv Detail & Related papers (2021-10-19T16:15:56Z)
- Injecting Knowledge in Data-driven Vehicle Trajectory Predictors [82.91398970736391]
Vehicle trajectory prediction tasks have been commonly tackled from two perspectives: knowledge-driven or data-driven.
In this paper, we propose to learn a "Realistic Residual Block" (RRB) which effectively connects these two perspectives.
Our proposed method outputs realistic predictions by confining the residual range and taking into account its uncertainty.
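The block's specifics are not given here; conceptually, one way to confine a learned residual added to a knowledge-driven prediction is to squash the raw network output into a bounded range (a guess at the flavor, not the paper's architecture):

```python
# Illustrative only: knowledge-driven prediction plus a bounded learned residual.
import numpy as np

def residual_predict(knowledge_pred: np.ndarray, raw_residual: np.ndarray,
                     max_dev: float = 0.5) -> np.ndarray:
    """Confine the residual to [-max_dev, max_dev] before adding it."""
    return knowledge_pred + max_dev * np.tanh(raw_residual)

print(residual_predict(np.array([1.0, 2.0]), np.array([3.0, -0.2])))
```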
arXiv Detail & Related papers (2021-03-08T16:03:09Z)
- Stein Variational Model Predictive Control [130.60527864489168]
Decision making under uncertainty is critical to real-world, autonomous systems.
Model Predictive Control (MPC) methods have demonstrated favorable performance in practice, but remain limited when dealing with complex distributions.
We show that this framework leads to successful planning in challenging, non-convex optimal control problems.
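Stein variational methods underpin this framework: a set of particles is updated with a kernelized gradient that balances attraction toward high-density regions against mutual repulsion. A minimal SVGD loop on a toy 1-D Gaussian (the inference machinery only, not the MPC application):

```python
# Minimal Stein variational gradient descent on a toy 1-D Gaussian target.
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(size=(50, 1))                      # particles

def grad_log_p(x):
    return -(x - 2.0) / 0.25                      # target: N(2, 0.5^2)

for _ in range(300):
    diff = x - x.T                                # pairwise x_i - x_j
    h = np.median(diff ** 2) / np.log(len(x) + 1) + 1e-8
    K = np.exp(-diff ** 2 / h)                    # RBF kernel between particles
    repulse = (2.0 / h) * (diff * K).sum(1, keepdims=True)  # kernel gradient term
    phi = (K @ grad_log_p(x) + repulse) / len(x)  # SVGD update direction
    x = x + 0.1 * phi

print(f"mean={x.mean():.2f} std={x.std():.2f}")   # approaches (2.00, 0.50)
```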
arXiv Detail & Related papers (2020-11-15T22:36:59Z)
- MOPO: Model-based Offline Policy Optimization [183.6449600580806]
Offline reinforcement learning (RL) refers to the problem of learning policies entirely from a large batch of previously collected data.
We show that an existing model-based RL algorithm already produces significant gains in the offline setting.
We propose to modify existing model-based RL methods by applying them with rewards artificially penalized by the uncertainty of the dynamics.
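The modification fits in one line: train on rewards penalized by a dynamics-uncertainty estimate, r_tilde = r - lam * u(s, a). The ensemble-disagreement heuristic below is one common choice of u(s, a), used here purely as an illustration:

```python
# One-line MOPO-style reward penalty, with ensemble disagreement as u(s, a).
import numpy as np

def penalized_reward(r: float, ensemble_preds: np.ndarray, lam: float = 1.0) -> float:
    """r_tilde = r - lam * u(s, a), with u taken as the max per-dimension
    standard deviation across the ensemble's next-state predictions."""
    u = ensemble_preds.std(axis=0).max()
    return r - lam * u

preds = np.array([[1.0, 2.1],     # three dynamics models' next-state guesses
                  [1.2, 1.9],
                  [0.9, 2.0]])
print(penalized_reward(r=5.0, ensemble_preds=preds))
```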
arXiv Detail & Related papers (2020-05-27T08:46:41Z)
- Keep Doing What Worked: Behavioral Modelling Priors for Offline Reinforcement Learning [25.099754758455415]
Off-policy reinforcement learning algorithms promise to be applicable in settings where only a fixed dataset of environment interactions is available.
Standard off-policy algorithms fail in the batch setting for continuous control.
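The remedy proposed in this line of work is to first fit a behavioral prior to the parts of the batch that performed well, then keep the learned policy close to that prior. A sketch of one simple advantage-based filtering step in that spirit (numbers invented, not the paper's exact objective):

```python
# Illustrative advantage-based filtering for a "what worked" behavioral prior.
import numpy as np

advantages = np.array([-1.0, 0.5, 2.0, -0.2, 1.1])   # invented batch advantages
temperature = 1.0

# Keep only logged actions that outperformed the value estimate, and
# up-weight them by how much they outperformed it.
weights = (advantages >= 0) * np.exp(advantages / temperature)
weights /= weights.sum()
print(weights.round(3))   # cloning weights for fitting the prior
```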
arXiv Detail & Related papers (2020-02-19T19:21:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.