Structure-Informed Deep Reinforcement Learning for Inventory Management
- URL: http://arxiv.org/abs/2507.22040v1
- Date: Tue, 29 Jul 2025 17:41:45 GMT
- Title: Structure-Informed Deep Reinforcement Learning for Inventory Management
- Authors: Alvaro Maggiar, Sohrab Andaz, Akhil Bagaria, Carson Eisenach, Dean Foster, Omer Gottesman, Dominique Perrault-Joncas,
- Abstract summary: This paper investigates the application of Deep Reinforcement Learning to classical inventory management problems. We apply a DRL algorithm based on DirectBackprop to several fundamental inventory management scenarios. We show that our generic DRL implementation performs competitively against or outperforms established benchmarks and heuristics.
- Score: 8.697068617006964
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper investigates the application of Deep Reinforcement Learning (DRL) to classical inventory management problems, with a focus on practical implementation considerations. We apply a DRL algorithm based on DirectBackprop to several fundamental inventory management scenarios including multi-period systems with lost sales (with and without lead times), perishable inventory management, dual sourcing, and joint inventory procurement and removal. The DRL approach learns policies across products using only historical information that would be available in practice, avoiding unrealistic assumptions about demand distributions or access to distribution parameters. We demonstrate that our generic DRL implementation performs competitively against or outperforms established benchmarks and heuristics across these diverse settings, while requiring minimal parameter tuning. Through examination of the learned policies, we show that the DRL approach naturally captures many known structural properties of optimal policies derived from traditional operations research methods. To further improve policy performance and interpretability, we propose a Structure-Informed Policy Network technique that explicitly incorporates analytically-derived characteristics of optimal policies into the learning process. This approach aids interpretability and adds robustness to out-of-sample policy performance, as we demonstrate in an example with realistic demand data. Finally, we provide an illustrative application of DRL in a non-stationary setting. Our work bridges the gap between data-driven learning and analytical insights in inventory management while maintaining practical applicability.
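A classical example of the analytical structure the abstract refers to is the base-stock (order-up-to) policy. The sketch below shows one way a structure-informed policy head could hard-code that form in PyTorch: the network predicts a base-stock level S and the order quantity inherits the known max(0, S - inventory position) shape. The architecture, feature count, and dimensions are illustrative assumptions, not the paper's actual Structure-Informed Policy Network.

```python
# A minimal sketch of a structure-informed policy head, assuming a PyTorch
# setup; layers and feature names are illustrative, not the paper's design.
import torch
import torch.nn as nn

class BaseStockPolicyNet(nn.Module):
    def __init__(self, n_features: int, hidden: int = 64):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Softplus(),  # base-stock level S >= 0
        )

    def forward(self, features: torch.Tensor, inventory_position: torch.Tensor):
        S = self.trunk(features).squeeze(-1)
        # Order-up-to structure: order max(0, S - inventory position).
        return torch.relu(S - inventory_position)

policy = BaseStockPolicyNet(n_features=8)
order = policy(torch.randn(32, 8), torch.rand(32) * 10)
```

Constraining the hypothesis class to order-up-to policies, which are known to be optimal in several classical settings, is one plausible route to the interpretability and out-of-sample robustness gains the abstract describes.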
Related papers
- Observations Meet Actions: Learning Control-Sufficient Representations for Robust Policy Generalization [6.408943565801689]
Capturing latent variations ("contexts") is key to deploying reinforcement-learning (RL) agents beyond their training regime. We recast context-based RL as a dual inference-control problem and formally characterize two properties and their hierarchy. We derive a contextual evidence lower bound (ELBO)-style objective that cleanly separates representation learning from policy learning; a sketch of that separation follows below.
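A hedged sketch of the separation idea: an encoder infers a latent context from a trajectory, a decoder reconstructs transitions from it, and a KL term regularizes toward a prior, all independent of the policy loss. Names, shapes, and the reconstruction target are illustrative assumptions, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def context_elbo(traj, encoder, decoder, beta: float = 1.0):
    # `encoder` and `decoder` are hypothetical modules: q(c | trajectory)
    # returning (mu, logvar), and a transition reconstructor from c.
    mu, logvar = encoder(traj)
    c = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterization
    recon_loss = F.mse_loss(decoder(c), traj)              # reconstruct transitions
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()
    return recon_loss + beta * kl  # minimizing this maximizes an ELBO-style bound
```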
arXiv Detail & Related papers (2025-07-25T17:08:16Z)
- VC Theory for Inventory Policies [7.71791422193777]
We prove generalization guarantees for learning several well-known classes of inventory policies.
We focus on a classical setting without contexts, but allow for an arbitrary distribution over demand sequences.
Our research suggests situations in which it could be beneficial to incorporate the concepts of base-stock and inventory position into black-box learning machines.
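As a point of reference for the two structural concepts named above, here is a small illustrative function (not from the paper) combining inventory position with a base-stock rule:

```python
# Illustrative only: how "inventory position" and a base-stock rule combine.
def base_stock_order(on_hand: float, pipeline: list[float], S: float) -> float:
    inventory_position = on_hand + sum(pipeline)  # on-hand plus on-order
    return max(0.0, S - inventory_position)       # order up to level S

print(base_stock_order(on_hand=4.0, pipeline=[2.0, 1.0], S=10.0))  # -> 3.0
```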
arXiv Detail & Related papers (2024-04-17T16:05:03Z)
- How Can LLM Guide RL? A Value-Based Approach [68.55316627400683]
Reinforcement learning (RL) has become the de facto standard practice for sequential decision-making problems by improving future acting policies with feedback.
Recent developments in large language models (LLMs) have showcased impressive capabilities in language understanding and generation, yet they fall short in exploration and self-improvement capabilities.
We develop an algorithm named LINVIT that incorporates LLM guidance as a regularization factor in value-based RL, leading to significant reductions in the amount of data needed for learning.
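A hedged sketch of the general mechanism, LLM guidance as a regularizer in value-based RL: a KL term pulls the soft policy implied by the Q-values toward an LLM-suggested action distribution. This is an illustrative stand-in, not LINVIT's actual loss; `llm_prior` is a hypothetical placeholder for whatever distribution the LLM induces.

```python
import torch
import torch.nn.functional as F

def regularized_value_loss(q_values, target_q, actions, llm_prior, beta=0.1):
    # q_values: [batch, n_actions]; actions: [batch] long indices;
    # llm_prior: [batch, n_actions] probabilities suggested by the LLM.
    taken_q = q_values.gather(1, actions.unsqueeze(1)).squeeze(1)
    td = F.mse_loss(taken_q, target_q)                     # standard TD loss
    log_pi = F.log_softmax(q_values, dim=1)                # soft policy from Q
    kl = F.kl_div(log_pi, llm_prior, reduction="batchmean")
    return td + beta * kl                                  # pull toward LLM prior
```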
arXiv Detail & Related papers (2024-02-25T20:07:13Z)
- Statistically Efficient Variance Reduction with Double Policy Estimation for Off-Policy Evaluation in Sequence-Modeled Reinforcement Learning [53.97273491846883]
We propose DPE: an RL algorithm that blends offline sequence modeling and offline reinforcement learning with Double Policy Estimation.
We validate our method in multiple tasks of OpenAI Gym with D4RL benchmarks.
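For background, the classic doubly robust off-policy estimator that double-policy-style estimators build on can be sketched as below; this is reference material, not DPE's exact estimator.

```python
def doubly_robust(rewards, behavior_probs, target_probs, q_hat, v_hat, gamma=0.99):
    # One trajectory of length T; v_hat has length T + 1 (terminal value 0).
    # q_hat[t] estimates Q(s_t, a_t); v_hat[t] estimates V(s_t).
    est, rho = v_hat[0], 1.0
    for t in range(len(rewards)):
        rho *= target_probs[t] / behavior_probs[t]  # cumulative importance weight
        est += (gamma ** t) * rho * (rewards[t] + gamma * v_hat[t + 1] - q_hat[t])
    return est
```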
arXiv Detail & Related papers (2023-08-28T20:46:07Z)
- Deep Inventory Management [3.578617477295742]
We present a Deep Reinforcement Learning approach to solving a periodic review inventory control system.
We show that several policy learning approaches are competitive with or outperform classical baseline approaches.
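A minimal periodic-review, lost-sales simulator step, assuming zero lead time, gives a feel for the setting; the demand distribution and cost parameters below are illustrative, not the paper's experimental setup.

```python
import random

def step(on_hand: float, order_qty: float, price=2.0, cost=1.0, holding=0.1):
    on_hand += order_qty                     # order arrives (zero lead time)
    demand = random.expovariate(1.0 / 5.0)   # stand-in demand, mean 5
    sales = min(on_hand, demand)             # unmet demand is lost
    on_hand -= sales
    reward = price * sales - cost * order_qty - holding * on_hand
    return on_hand, reward

inv, total = 0.0, 0.0
for _ in range(100):
    inv, r = step(inv, order_qty=5.0)
    total += r
```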
arXiv Detail & Related papers (2022-10-06T18:00:25Z)
- Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning [70.20191211010847]
Offline reinforcement learning (RL) aims to learn an optimal policy using a previously collected static dataset.
We introduce Diffusion Q-learning (Diffusion-QL) that utilizes a conditional diffusion model to represent the policy.
We show that our method can achieve state-of-the-art performance on the majority of the D4RL benchmark tasks.
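The core idea, sampling actions by iteratively denoising noise conditioned on the state, can be sketched as below. The network shape, noise schedule, and update rule are deliberately simplified assumptions and do not reproduce Diffusion-QL's actual DDPM-based sampler.

```python
import torch
import torch.nn as nn

class NoisePredictor(nn.Module):
    def __init__(self, state_dim: int, action_dim: int, T: int = 10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim + 1, 128), nn.ReLU(),
            nn.Linear(128, action_dim),
        )
        self.T = T

    def sample(self, state: torch.Tensor) -> torch.Tensor:
        a = torch.randn(state.shape[0], self.net[-1].out_features)
        for t in reversed(range(self.T)):
            t_emb = torch.full((state.shape[0], 1), t / self.T)
            eps = self.net(torch.cat([state, a, t_emb], dim=1))
            a = a - eps / self.T  # crude denoising step, not a real DDPM update
        return torch.tanh(a)     # squash into the action range
```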
arXiv Detail & Related papers (2022-08-12T09:54:11Z)
- Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
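A hedged sketch of the two-policy mechanism: a guide policy drives the first h steps of each episode, the learning policy takes over, and h is annealed toward zero as training progresses. `env`, `guide`, and `learner` are hypothetical placeholders.

```python
def jump_start_episode(env, guide, learner, h: int, max_steps: int = 200):
    # `env`, `guide`, and `learner` are hypothetical placeholders with the
    # usual reset/step and obs -> action interfaces.
    obs = env.reset()
    transitions = []
    for t in range(max_steps):
        action = guide(obs) if t < h else learner(obs)  # guide first, then learner
        next_obs, reward, done = env.step(action)
        transitions.append((obs, action, reward, next_obs, done))
        obs = next_obs
        if done:
            break
    return transitions  # train `learner` on these, then anneal h toward 0
```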
arXiv Detail & Related papers (2022-04-05T17:25:22Z)
- Towards Deployment-Efficient Reinforcement Learning: Lower Bound and Optimality [141.89413461337324]
Deployment efficiency is an important criterion for many real-world applications of reinforcement learning (RL).
We propose a theoretical formulation for deployment-efficient RL (DE-RL) from an "optimization with constraints" perspective.
arXiv Detail & Related papers (2022-02-14T01:31:46Z)
- Deep Policy Iteration with Integer Programming for Inventory Management [8.27175065641495]
We present a framework for optimizing long-term discounted reward problems with large accessible action space and state dependent constraints. Our proposed Programmable Actor Reinforcement Learning (PARL) uses a deep policy iteration method that leverages neural networks (NNs) to approximate the value function. We benchmark the proposed algorithm against state-of-the-art RL algorithms and commonly used replenishment heuristics and find it considerably outperforms existing methods by as much as 14.7% on average; a toy version of the action-selection step is sketched below.
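The sketch assumes a small discrete action set so brute-force enumeration can stand in for the integer program PARL actually solves: pick the integer action that maximizes immediate reward plus a neural approximation of the next state's value.

```python
import torch

def parl_action(state, reward_fn, transition_fn, value_net, action_grid):
    # `reward_fn`, `transition_fn`, `value_net`, and `action_grid` are
    # hypothetical placeholders: immediate reward, deterministic next-state
    # map, a trained NN value approximator (returning a scalar tensor), and
    # the feasible integer actions (state-dependent in general).
    best_a, best_val = None, float("-inf")
    for a in action_grid:
        nxt = transition_fn(state, a)
        v = value_net(torch.as_tensor(nxt, dtype=torch.float32)).item()
        total = reward_fn(state, a) + v
        if total > best_val:
            best_a, best_val = a, total
    return best_a
```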
arXiv Detail & Related papers (2021-12-04T01:40:34Z)
- Policy Information Capacity: Information-Theoretic Measure for Task Complexity in Deep Reinforcement Learning [83.66080019570461]
We propose two environment-agnostic, algorithm-agnostic quantitative metrics for task difficulty.
We show that these metrics have higher correlations with normalized task solvability scores than a variety of alternatives.
These metrics can also be used for fast and compute-efficient optimizations of key design parameters.
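In the spirit of such metrics, one can estimate the mutual information between randomly sampled policy parameters and the (discretized) returns they achieve; the plug-in estimator below is an illustrative assumption, not the paper's exact definition.

```python
import numpy as np

def mi_policy_returns(returns_per_policy: np.ndarray, bins: int = 10) -> float:
    # returns_per_policy[i, j] = return of sampled policy i on rollout j.
    edges = np.histogram_bin_edges(returns_per_policy.ravel(), bins=bins)
    digit = np.digitize(returns_per_policy, edges[1:-1])

    def entropy(x):
        _, counts = np.unique(x, return_counts=True)
        p = counts / counts.sum()
        return float(-(p * np.log(p)).sum())

    # I(params; return) = H(return) - H(return | params), plug-in estimates.
    return entropy(digit.ravel()) - np.mean([entropy(row) for row in digit])
```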
arXiv Detail & Related papers (2021-03-23T17:49:50Z)
- Deep Controlled Learning for Inventory Control [0.0]
The application of Deep Reinforcement Learning (DRL) to inventory management is an emerging field. Traditional DRL algorithms, originally developed for diverse domains such as game-playing and robotics, may not be well-suited for the specific challenges posed by inventory management. We propose Deep Controlled Learning (DCL), a new DRL algorithm designed for these highly stochastic problems.
arXiv Detail & Related papers (2020-11-30T18:53:08Z)
- Conservative Q-Learning for Offline Reinforcement Learning [106.05582605650932]
We show that CQL substantially outperforms existing offline RL methods, often learning policies that attain 2-5 times higher final return.
We theoretically show that CQL produces a lower bound on the value of the current policy and that it can be incorporated into a policy learning procedure with theoretical improvement guarantees.
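For reference, the conservative penalty at the heart of CQL for discrete actions can be written compactly: push Q-values down under a log-sum-exp over all actions and up on the actions actually in the dataset, added to the usual TD loss.

```python
import torch

def cql_penalty(q_values: torch.Tensor, dataset_actions: torch.Tensor) -> torch.Tensor:
    # q_values: [batch, n_actions]; dataset_actions: [batch] (long indices).
    logsumexp_q = torch.logsumexp(q_values, dim=1)               # push down everywhere
    data_q = q_values.gather(1, dataset_actions.unsqueeze(1)).squeeze(1)  # push up on data
    return (logsumexp_q - data_q).mean()  # total loss = td_loss + alpha * penalty
```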
arXiv Detail & Related papers (2020-06-08T17:53:42Z)