Neural Coordination and Capacity Control for Inventory Management
- URL: http://arxiv.org/abs/2410.02817v1
- Date: Tue, 24 Sep 2024 16:23:10 GMT
- Title: Neural Coordination and Capacity Control for Inventory Management
- Authors: Carson Eisenach, Udaya Ghai, Dhruv Madeka, Kari Torkkola, Dean Foster, Sham Kakade
- Abstract summary: This paper is motivated by two questions: what does it mean to backtest a capacity control mechanism, and can we devise and backtest a capacity control mechanism compatible with recent advances in deep reinforcement learning for inventory management?
First, because we only have a single historic sample path of Amazon's capacity limits, we propose a method that samples from a distribution of possible constraint paths covering a space of real-world scenarios.
Second, we extend the exo-IDP (Exogenous Decision Process) formulation of Madeka et al. 2022 to capacitated periodic review inventory control problems and show that certain capacitated control problems are no harder than supervised learning.
- Score: 4.533373101620897
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper addresses the capacitated periodic review inventory control problem, focusing on a retailer managing multiple products with limited shared resources, such as storage or inbound labor at a facility. Specifically, this paper is motivated by the questions of (1) what does it mean to backtest a capacity control mechanism, and (2) can we devise and backtest a capacity control mechanism that is compatible with recent advances in deep reinforcement learning for inventory management? First, because we only have a single historic sample path of Amazon's capacity limits, we propose a method that samples from a distribution of possible constraint paths covering a space of real-world scenarios. This novel approach allows for more robust and realistic testing of inventory management strategies. Second, we extend the exo-IDP (Exogenous Decision Process) formulation of Madeka et al. 2022 to capacitated periodic review inventory control problems and show that certain capacitated control problems are no harder than supervised learning. Third, we introduce a `neural coordinator', designed to produce forecasts of capacity prices, guiding the system to adhere to target constraints in place of a traditional model predictive controller. Finally, we apply a modified DirectBackprop algorithm for learning a deep RL buying policy and for training the neural coordinator. Our methodology is evaluated through large-scale backtests, demonstrating that RL buying policies with a neural coordinator outperform classic baselines in terms of both cumulative discounted reward and capacity adherence (with improvements of up to 50% in some cases).
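The abstract does not include implementation details, but the price-based coordination idea can be illustrated with a small sketch. The code below is a hypothetical, simplified illustration rather than the authors' implementation: a coordinator network forecasts per-period capacity prices, and each SKU's buying decision trades expected margin off against the priced capacity an order would consume. Names such as `NeuralCoordinator`, `priced_order`, and the fixed threshold rule are illustrative assumptions.

```python
# Hypothetical sketch of price-based capacity coordination for multi-SKU buying.
# Not the paper's implementation: module names, shapes, and the greedy
# price-response rule below are illustrative assumptions.
import torch
import torch.nn as nn

class NeuralCoordinator(nn.Module):
    """Maps an aggregate state (e.g., total inventory position, target capacity
    path) to a forecast of nonnegative capacity prices over a planning horizon."""
    def __init__(self, state_dim: int, horizon: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, horizon), nn.Softplus(),  # prices must be >= 0
        )

    def forward(self, agg_state: torch.Tensor) -> torch.Tensor:
        return self.net(agg_state)  # shape: (batch, horizon)

def priced_order(expected_margin: torch.Tensor,
                 unit_volume: torch.Tensor,
                 capacity_price: torch.Tensor,
                 max_order: torch.Tensor) -> torch.Tensor:
    """Toy per-SKU response: order up to max_order only when the expected
    margin exceeds the capacity cost the order would incur."""
    net_value = expected_margin - capacity_price * unit_volume
    return torch.where(net_value > 0, max_order, torch.zeros_like(max_order))

if __name__ == "__main__":
    coordinator = NeuralCoordinator(state_dim=4, horizon=1)
    price = coordinator(torch.randn(1, 4))[0, 0]  # price for the next period
    orders = priced_order(expected_margin=torch.tensor([2.0, 0.5, 3.0]),
                          unit_volume=torch.tensor([1.0, 1.0, 2.0]),
                          capacity_price=price,
                          max_order=torch.tensor([10.0, 10.0, 5.0]))
    print(price.item(), orders)
```

In the paper, both the buying policy and the coordinator are trained (via a modified DirectBackprop algorithm); the threshold rule above is only a stand-in for the learned RL buying policy.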
Related papers
- Zero-shot Generalization in Inventory Management: Train, then Estimate and Decide [0.0]
Deploying deep reinforcement learning (DRL) in real-world inventory management presents challenges.
These challenges highlight a research gap, suggesting a need for a unifying framework to model and solve sequential decision-making under parameter uncertainty.
We address this by exploring an underexplored area of DRL for inventory management: training generally capable agents (GCAs) under zero-shot generalization (ZSG)
arXiv Detail & Related papers (2024-11-01T11:20:05Z) - Growing Q-Networks: Solving Continuous Control Tasks with Adaptive Control Resolution [51.83951489847344]
In robotics applications, smooth control signals are commonly preferred to reduce system wear and improve energy efficiency.
In this work, we aim to bridge this performance gap by growing discrete action spaces from coarse to fine control resolution.
Our work indicates that an adaptive control resolution in combination with value decomposition yields simple critic-only algorithms that yield surprisingly strong performance on continuous control tasks.
arXiv Detail & Related papers (2024-04-05T17:58:37Z) - Learning an Inventory Control Policy with General Inventory Arrival
Dynamics [2.3715198714015893]
This paper addresses the problem of learning and backtesting inventory control policies in the presence of general arrival dynamics.
To the best of our knowledge this is the first work to handle either arbitrary arrival dynamics or an arbitrary downstream post-processing of order quantities.
arXiv Detail & Related papers (2023-10-26T05:49:13Z) - Multi-Agent Reinforcement Learning with Shared Resources for Inventory
Management [62.23979094308932]
In our setting, the constraint on the shared resources (such as the inventory capacity) couples the otherwise independent control for each SKU.
We formulate the problem with this structure as a Shared-Resource Stochastic Game (SRSG) and propose an efficient algorithm called Context-aware Decentralized PPO (CD-PPO).
Through extensive experiments, we demonstrate that CD-PPO can accelerate the learning procedure compared with standard MARL algorithms.
arXiv Detail & Related papers (2022-12-15T09:35:54Z) - Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels [112.63440666617494]
Reinforcement learning algorithms can succeed but require large amounts of interactions between the agent and the environment.
We propose a new method to solve it, using unsupervised model-based RL, for pre-training the agent.
We show robust performance on the Real-World RL benchmark, hinting at resiliency to environment perturbations during adaptation.
arXiv Detail & Related papers (2022-09-24T14:22:29Z) - Control of Dual-Sourcing Inventory Systems using Recurrent Neural
Networks [0.0]
We show that proposed neural network controllers (NNCs) are able to learn near-optimal policies of commonly used instances within a few minutes of CPU time.
Our research opens up new ways of efficiently managing complex, high-dimensional inventory dynamics.
arXiv Detail & Related papers (2022-01-16T19:44:06Z) - URLB: Unsupervised Reinforcement Learning Benchmark [82.36060735454647]
We introduce the Unsupervised Reinforcement Learning Benchmark (URLB)
URLB consists of two phases: reward-free pre-training and downstream task adaptation with extrinsic rewards.
We provide twelve continuous control tasks from three domains for evaluation and open-source code for eight leading unsupervised RL methods.
arXiv Detail & Related papers (2021-10-28T15:07:01Z) - Residual Feedback Learning for Contact-Rich Manipulation Tasks with
Uncertainty [22.276925045008788]
Residual policy learning (RPL) offers a formulation to improve existing controllers with reinforcement learning (RL).
We show superior performance of our approach on a contact-rich peg-insertion task under position and orientation uncertainty.
arXiv Detail & Related papers (2021-06-08T13:06:35Z) - Closing the Closed-Loop Distribution Shift in Safe Imitation Learning [80.05727171757454]
We treat safe optimization-based control strategies as experts in an imitation learning problem.
We train a learned policy that can be cheaply evaluated at run-time and that provably satisfies the same safety guarantees as the expert.
arXiv Detail & Related papers (2021-02-18T05:11:41Z) - Deep Controlled Learning for Inventory Control [0.0]
Deep Controlled Learning (DCL) is a new DRL framework based on approximate policy iteration, specifically designed to tackle inventory problems.
DCL outperforms existing state-of-the-art approaches in lost sales inventory control, perishable inventory systems, and inventory systems with random lead times.
These substantial performance and robustness improvements pave the way for the effective application of tailored DRL algorithms to inventory management problems.
arXiv Detail & Related papers (2020-11-30T18:53:08Z) - Conservative Q-Learning for Offline Reinforcement Learning [106.05582605650932]
We show that CQL substantially outperforms existing offline RL methods, often learning policies that attain 2-5 times higher final return.
We theoretically show that CQL produces a lower bound on the value of the current policy and that it can be incorporated into a policy learning procedure with theoretical improvement guarantees.
arXiv Detail & Related papers (2020-06-08T17:53:42Z)
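The lower-bound property mentioned in the CQL entry above comes from its conservative regularizer. The following is a standard form of the CQL training objective as reported in the CQL literature (reproduced here for context, not taken from this listing): it penalizes Q-values of actions drawn from a sampling policy \(\mu\) while pushing up Q-values of actions observed in the dataset \(\mathcal{D}\), alongside the usual Bellman error.

```latex
\min_{Q}\;
\alpha \Big( \mathbb{E}_{s \sim \mathcal{D},\, a \sim \mu(\cdot \mid s)}\!\big[ Q(s,a) \big]
           - \mathbb{E}_{(s,a) \sim \mathcal{D}}\!\big[ Q(s,a) \big] \Big)
\;+\; \tfrac{1}{2}\, \mathbb{E}_{(s,a,s') \sim \mathcal{D}}\!\Big[ \big( Q(s,a) - \hat{\mathcal{B}}^{\pi}\hat{Q}(s,a) \big)^{2} \Big]
```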