Classical and Deep Reinforcement Learning Inventory Control Policies for Pharmaceutical Supply Chains with Perishability and Non-Stationarity
- URL: http://arxiv.org/abs/2501.10895v1
- Date: Sat, 18 Jan 2025 22:40:33 GMT
- Title: Classical and Deep Reinforcement Learning Inventory Control Policies for Pharmaceutical Supply Chains with Perishability and Non-Stationarity
- Authors: Francesco Stranieri, Chaaben Kouki, Willem van Jaarsveld, Fabio Stella
- Abstract summary: We study inventory control policies for pharmaceutical supply chains, addressing challenges such as perishability, yield uncertainty, and non-stationary demand. We benchmark three policies--order-up-to (OUT), projected inventory level (PIL), and deep reinforcement learning (DRL) using the proximal policy optimization (PPO) algorithm--against a BMS baseline based on human expertise.
- Score: 1.0124625066746595
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study inventory control policies for pharmaceutical supply chains, addressing challenges such as perishability, yield uncertainty, and non-stationary demand, combined with batching constraints, lead times, and lost sales. Collaborating with Bristol-Myers Squibb (BMS), we develop a realistic case study incorporating these factors and benchmark three policies--order-up-to (OUT), projected inventory level (PIL), and deep reinforcement learning (DRL) using the proximal policy optimization (PPO) algorithm--against a BMS baseline based on human expertise. We derive and validate bounds-based procedures for optimizing OUT and PIL policy parameters and propose a methodology for estimating projected inventory levels, which are also integrated into the DRL policy with demand forecasts to improve decision-making under non-stationarity. Compared to a human-driven policy, which avoids lost sales through higher holding costs, all three implemented policies achieve lower average costs but exhibit greater cost variability. While PIL demonstrates robust and consistent performance, OUT struggles under high lost sales costs, and PPO excels in complex and variable scenarios but requires significant computational effort. The findings suggest that while DRL shows potential, it does not outperform classical policies in all numerical experiments, highlighting 1) the need to integrate diverse policies to manage pharmaceutical challenges effectively, based on the current state-of-the-art, and 2) that practical problems in this domain seem to lack a single policy class that yields universally acceptable performance.
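As a concrete illustration of the two classical policies benchmarked above, here is a minimal Python sketch of an order-up-to decision and a projected-inventory-level decision. The function names, the batching rule, and the lead-time projection are illustrative assumptions, not the authors' implementation, which additionally optimizes the policy parameters via bounds-based procedures.

```python
import math

def out_order(inventory_position, order_up_to_level, batch_size=1):
    """Order-up-to (OUT): raise the inventory position to a base-stock level.

    A sketch only; the paper optimizes the level via bounds-based procedures.
    """
    shortfall = max(0.0, order_up_to_level - inventory_position)
    # Round up to respect the batching constraint mentioned in the abstract.
    return batch_size * math.ceil(shortfall / batch_size)

def pil_order(on_hand, scheduled_receipts, demand_forecasts, target_level, batch_size=1):
    """Projected inventory level (PIL): order against the stock projected over
    the lead time, under assumed dynamics (no perishability or yield loss is
    modeled in this sketch).

    scheduled_receipts[i] -- units due to arrive in period i of the lead time
    demand_forecasts[i]   -- forecast demand in period i of the lead time
    """
    horizon = len(scheduled_receipts)
    projected = on_hand + sum(scheduled_receipts) - sum(demand_forecasts[:horizon])
    shortfall = max(0.0, target_level - projected)
    return batch_size * math.ceil(shortfall / batch_size)

# Example: 40 units on hand, 20 arriving over a 2-period lead time,
# forecast demand of 25 per period, PIL target of 60, batches of 10.
print(pil_order(40, [10, 10], [25, 25], 60, batch_size=10))  # -> 50
```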
Related papers
- Structure-Informed Deep Reinforcement Learning for Inventory Management [8.697068617006964]
This paper investigates the application of Deep Reinforcement Learning (DRL) to classical inventory management problems. We apply a DRL algorithm based on DirectBackprop to several fundamental inventory management scenarios. We show that our generic DRL implementation performs competitively against or outperforms established benchmarks.
arXiv Detail & Related papers (2025-07-29T17:41:45Z) - Adaptive Inventory Strategies using Deep Reinforcement Learning for Dynamic Agri-Food Supply Chains [1.7930468380414317]
This study focuses on inventory management of agri-food products under demand and lead time uncertainties. It proposes a novel Deep Reinforcement Learning (DRL) algorithm that combines the benefits of both value- and policy-based DRL approaches for inventory optimization under uncertainties.
arXiv Detail & Related papers (2025-07-22T15:02:54Z) - Last-Iterate Global Convergence of Policy Gradients for Constrained Reinforcement Learning [62.81324245896717]
We introduce an exploration-agnostic algorithm, called C-PG, which exhibits global last-iterate convergence guarantees under (weak) gradient domination assumptions.
We numerically validate our algorithms on constrained control problems, and compare them with state-of-the-art baselines.
arXiv Detail & Related papers (2024-07-15T14:54:57Z) - Diffusion-based Reinforcement Learning via Q-weighted Variational Policy Optimization [55.97310586039358]
Diffusion models have garnered widespread attention in Reinforcement Learning (RL) for their powerful expressiveness and multimodality. We propose a novel model-free diffusion-based online RL algorithm, Q-weighted Variational Policy Optimization (QVPO). Specifically, we introduce the Q-weighted variational loss, which can be proved to be a tight lower bound of the policy objective in online RL under certain conditions. We also develop an efficient behavior policy to enhance sample efficiency by reducing the variance of the diffusion policy during online interactions.
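The key mechanism in this summary, re-weighting the policy loss by Q-values, can be sketched as follows in PyTorch. This is an assumed simplification, not QVPO's exact loss; in particular, the clamping and normalization choices here are illustrative.

```python
import torch

def q_weighted_policy_loss(per_action_loss, q_values):
    """A minimal sketch of a Q-weighted variational-style policy loss.

    per_action_loss -- (batch,) e.g. a diffusion denoising loss for each
                       action sampled from the current policy
    q_values        -- (batch,) critic estimates for those actions

    Actions with higher estimated value receive larger weights, so the policy
    update concentrates on high-return behavior. The clamp and normalization
    are illustrative assumptions, not QVPO's exact formulation.
    """
    weights = torch.clamp(q_values, min=0.0)    # ignore negatively valued actions
    weights = weights / (weights.sum() + 1e-8)  # normalize into a distribution
    return (weights * per_action_loss).sum()
```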
arXiv Detail & Related papers (2024-05-25T10:45:46Z) - Off-Policy Evaluation for Large Action Spaces via Policy Convolution [60.6953713877886]
The Policy Convolution family of estimators uses latent structure within actions to strategically convolve the logging and target policies.
Experiments on synthetic and benchmark datasets demonstrate remarkable mean squared error (MSE) improvements when using PC.
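To picture what "convolving" the logging and target policies over latent action structure might look like, here is a hedged sketch using Gaussian kernel smoothing over action embeddings; the kernel form and its use inside an importance-weighting estimator are assumptions for illustration, not the paper's exact estimator.

```python
import numpy as np

def convolve_policy(action_probs, action_embeddings, bandwidth=1.0):
    """Smooth a policy over similar actions (a sketch, not the paper's exact
    estimator): probability mass is shared via a Gaussian kernel on latent
    action embeddings, increasing overlap between logging and target policies."""
    diffs = action_embeddings[:, None, :] - action_embeddings[None, :, :]
    sq_dists = (diffs ** 2).sum(axis=-1)
    kernel = np.exp(-sq_dists / (2.0 * bandwidth ** 2))
    kernel /= kernel.sum(axis=1, keepdims=True)  # row-normalize the kernel
    return kernel @ action_probs

# Importance weights for off-policy evaluation would then use the convolved
# policies, e.g. convolve_policy(pi_target, E) / convolve_policy(pi_log, E).
```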
arXiv Detail & Related papers (2023-10-24T01:00:01Z) - PARL: A Unified Framework for Policy Alignment in Reinforcement Learning from Human Feedback [106.63518036538163]
We present a novel unified bilevel optimization-based framework, PARL, formulated to address the recently highlighted critical issue of policy alignment in reinforcement learning.
Our framework addresses these concerns by explicitly parameterizing the distribution of the upper-level alignment objective (reward design) by the lower-level optimal variable.
Our empirical results substantiate that the proposed PARL framework can address alignment concerns in RL, showing significant improvements.
arXiv Detail & Related papers (2023-08-03T18:03:44Z) - Distributional constrained reinforcement learning for supply chain optimization [0.0]
We introduce Distributional Constrained Policy Optimization (DCPO), a novel approach for reliable constraint satisfaction in reinforcement learning.
We show that DCPO improves the rate at which the RL policy converges and ensures reliable constraint satisfaction by the end of training.
arXiv Detail & Related papers (2023-02-03T13:43:02Z) - COptiDICE: Offline Constrained Reinforcement Learning via Stationary Distribution Correction Estimation [73.17078343706909]
We consider the offline constrained reinforcement learning (RL) problem, in which the agent aims to compute a policy that maximizes expected return while satisfying given cost constraints, learning only from a pre-collected dataset.
We present an offline constrained RL algorithm that optimizes the policy in the space of the stationary distribution.
Our algorithm, COptiDICE, directly estimates the stationary distribution corrections of the optimal policy with respect to returns, while constraining the cost upper bound, with the goal of yielding a cost-conservative policy for actual constraint satisfaction.
arXiv Detail & Related papers (2022-04-19T15:55:47Z) - Off-Policy Optimization of Portfolio Allocation Policies under Constraints [0.8848340429852071]
The dynamic portfolio optimization problem in finance frequently requires learning policies that adhere to various constraints, driven by investor preferences and risk tolerance.
We motivate this problem of finding an allocation policy within a sequential decision making framework and study the effects of: (a) using data collected under previously employed policies, which may be sub-optimal and constraint-violating, and (b) imposing desired constraints while computing near-optimal policies with this data.
arXiv Detail & Related papers (2020-12-21T22:22:04Z) - Deep Controlled Learning for Inventory Control [0.0]
Deep Controlled Learning (DCL) is a new DRL framework based on approximate policy iteration, specifically designed to tackle inventory problems.
DCL outperforms existing state-of-the-art approaches in lost sales inventory control, perishable inventory systems, and inventory systems with random lead times.
These substantial performance and robustness improvements pave the way for the effective application of tailored DRL algorithms to inventory management problems.
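As a rough sketch of the approximate-policy-iteration idea behind DCL, the snippet below scores candidate order quantities by Monte Carlo rollouts of a base policy in a simulator. The simulator interface `simulate_cost` and the candidate set are hypothetical, not the paper's actual design.

```python
import numpy as np

def improved_action(state, candidate_orders, simulate_cost, n_rollouts=100):
    """One policy-improvement step of approximate policy iteration (a sketch).

    simulate_cost(state, order) -- hypothetical simulator returning the total
    cost of placing `order` in `state` and then following the base policy.
    """
    mean_costs = [
        np.mean([simulate_cost(state, order) for _ in range(n_rollouts)])
        for order in candidate_orders
    ]
    # Greedy improvement: pick the order quantity with the lowest average cost.
    return candidate_orders[int(np.argmin(mean_costs))]
```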
arXiv Detail & Related papers (2020-11-30T18:53:08Z) - CRPO: A New Approach for Safe Reinforcement Learning with Convergence Guarantee [61.176159046544946]
In safe reinforcement learning (SRL) problems, an agent explores the environment to maximize an expected total reward and avoids violation of certain constraints.
To our knowledge, this is the first analysis of SRL algorithms with guarantees of convergence to globally optimal policies.
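One simple primal scheme in the spirit of CRPO alternates between improving the reward and reducing whichever constraint is currently violated; the sketch below is a loose rendering under that assumption, not the paper's exact algorithm.

```python
def crpo_style_step(theta, reward_grad, cost_values, cost_grads, limits, tol, lr=0.01):
    """A loose sketch of a primal safe-RL update (numpy-array arguments).

    If any estimated cost exceeds its limit by more than a tolerance, take a
    descent step on that cost; otherwise take an ascent step on the reward.
    """
    for cost, grad, limit in zip(cost_values, cost_grads, limits):
        if cost > limit + tol:
            return theta - lr * grad   # reduce the violated constraint first
    return theta + lr * reward_grad    # otherwise improve the reward
```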
arXiv Detail & Related papers (2020-11-11T16:05:14Z) - Projection-Based Constrained Policy Optimization [34.555500347840805]
We propose a new algorithm, Projection-Based Constrained Policy Optimization (PCPO).
PCPO achieves more than 3.5 times less constraint violation and around 15% higher reward compared to state-of-the-art methods.
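In spirit, the projection-based update can be sketched as a two-step rule: a reward-improvement step followed by a Euclidean projection back onto a linearized cost constraint. The exact trust-region machinery (KL metrics, second-order terms) is omitted here, so treat this as an assumed simplification.

```python
import numpy as np

def pcpo_style_step(theta, reward_grad, cost_grad, cost_slack, lr=0.01):
    """A sketch of a projection-based constrained policy update.

    theta, reward_grad, cost_grad -- numpy parameter/gradient vectors
    cost_slack -- current cost minus its limit (positive means violated)
    """
    # Step 1: unconstrained reward-improvement step.
    theta_new = theta + lr * reward_grad
    # Step 2: Euclidean projection onto the linearized constraint
    # cost_slack + cost_grad @ (theta_new - theta) <= 0, if it is violated.
    violation = cost_slack + cost_grad @ (theta_new - theta)
    if violation > 0.0:
        theta_new = theta_new - (violation / (cost_grad @ cost_grad + 1e-8)) * cost_grad
    return theta_new

# Example with a hypothetical 3-dimensional parameter vector:
theta = pcpo_style_step(np.zeros(3), np.array([1.0, 0.0, 0.0]),
                        np.array([0.0, 1.0, 0.0]), cost_slack=0.5)
```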
arXiv Detail & Related papers (2020-10-07T04:22:45Z) - Robust Deep Reinforcement Learning against Adversarial Perturbations on State Observations [88.94162416324505]
A deep reinforcement learning (DRL) agent observes its states through observations, which may contain natural measurement errors or adversarial noises.
Since the observations deviate from the true states, they can mislead the agent into making suboptimal actions.
We show that naively applying existing techniques on improving robustness for classification tasks, like adversarial training, is ineffective for many RL tasks.
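A minimal way to see the threat model is to evaluate a trained policy on perturbed rather than true observations. The l-infinity random noise below is a weak stand-in for the adversarial attacks studied in the paper, included only for illustration.

```python
import numpy as np

def perturb_observation(obs, epsilon, rng=None):
    """Corrupt an observation within an l-infinity ball of radius epsilon
    around the true state. Random noise is a weak stand-in for an adversary,
    which would search this ball for the worst-case perturbation."""
    rng = rng if rng is not None else np.random.default_rng()
    return obs + rng.uniform(-epsilon, epsilon, size=np.shape(obs))

# Comparing returns under `state` vs. perturb_observation(state, eps)
# exposes how brittle the learned policy is to observation noise.
```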
arXiv Detail & Related papers (2020-03-19T17:59:59Z)