Related papers: Deep RL Dual Sourcing Inventory Management with Supply and Capacity Risk Awareness

Deep RL Dual Sourcing Inventory Management with Supply and Capacity Risk Awareness

URL: http://arxiv.org/abs/2507.14446v3
Date: Sat, 26 Jul 2025 01:11:28 GMT
Title: Deep RL Dual Sourcing Inventory Management with Supply and Capacity Risk Awareness
Authors: Defeng Liu, Ying Liu, Carson Eisenach,
Abstract summary: We study how to efficiently apply reinforcement learning (RL) for solving large-scale optimization problems by leveraging intervention models.<n>We demonstrate our approach on a challenging real-world application, the multi-sourcing multi-period inventory management problem in supply chain optimization.
Score: 4.583289433858458
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In this work, we study how to efficiently apply reinforcement learning (RL) for solving large-scale stochastic optimization problems by leveraging intervention models. The key of the proposed methodology is to better explore the solution space by simulating and composing the stochastic processes using pre-trained deep learning (DL) models. We demonstrate our approach on a challenging real-world application, the multi-sourcing multi-period inventory management problem in supply chain optimization. In particular, we employ deep RL models for learning and forecasting the stochastic supply chain processes under a range of assumptions. Moreover, we also introduce a constraint coordination mechanism, designed to forecast dual costs given the cross-products constraints in the inventory network. We highlight that instead of directly modeling the complex physical constraints into the RL optimization problem and solving the stochastic problem as a whole, our approach breaks down those supply chain processes into scalable and composable DL modules, leading to improved performance on large real-world datasets. We also outline open problems for future research to further investigate the efficacy of such models.

Related papers

Preference Optimization for Combinatorial Optimization Problems [54.87466279363487]
Reinforcement Learning (RL) has emerged as a powerful tool for neural optimization, enabling models learns that solve complex problems without requiring expert knowledge.<n>Despite significant progress, existing RL approaches face challenges such as diminishing reward signals and inefficient exploration in vast action spaces.<n>We propose Preference Optimization, a novel method that transforms quantitative reward signals into qualitative preference signals via statistical comparison modeling.
arXiv Detail & Related papers (2025-05-13T16:47:00Z)
Thinking Longer, Not Larger: Enhancing Software Engineering Agents via Scaling Test-Time Compute [61.00662702026523]
We propose a unified Test-Time Compute scaling framework that leverages increased inference-time instead of larger models.<n>Our framework incorporates two complementary strategies: internal TTC and external TTC.<n>We demonstrate our textbf32B model achieves a 46% issue resolution rate, surpassing significantly larger models such as DeepSeek R1 671B and OpenAI o1.
arXiv Detail & Related papers (2025-03-31T07:31:32Z)
Iterative Multi-Agent Reinforcement Learning: A Novel Approach Toward Real-World Multi-Echelon Inventory Optimization [0.6990493129893112]
Multi-echelon inventory optimization (MEIO) is critical for effective supply chain management, but its inherent complexity can pose significant challenges.<n>Recent research has found deep reinforcement learning (DRL) to be a promising alternative to traditional reinforcement learning.<n>This thesis investigates DRL's applicability to MEIO problems of increasing complexity.
arXiv Detail & Related papers (2025-03-23T20:52:21Z)
Learning Reward and Policy Jointly from Demonstration and Preference Improves Alignment [58.049113055986375]
We develop a single stage approach named Alignment with Integrated Human Feedback (AIHF) to train reward models and the policy.<n>The proposed approach admits a suite of efficient algorithms, which can easily reduce to, and leverage, popular alignment algorithms.<n>We demonstrate the efficiency of the proposed solutions with extensive experiments involving alignment problems in LLMs and robotic control problems in MuJoCo.
arXiv Detail & Related papers (2024-06-11T01:20:53Z)
Distributionally Robust Model-based Reinforcement Learning with Large State Spaces [55.14361269378122]
Three major challenges in reinforcement learning are the complex dynamical systems with large state spaces, the costly data acquisition processes, and the deviation of real-world dynamics from the training environment deployment. We study distributionally robust Markov decision processes with continuous state spaces under the widely used Kullback-Leibler, chi-square, and total variation uncertainty sets. We propose a model-based approach that utilizes Gaussian Processes and the maximum variance reduction algorithm to efficiently learn multi-output nominal transition dynamics.
arXiv Detail & Related papers (2023-09-05T13:42:11Z)
POMDP inference and robust solution via deep reinforcement learning: An application to railway optimal maintenance [0.7046417074932257]
We propose a combined framework for inference and robust solution of POMDPs via deep RL. First, all transition and observation model parameters are jointly inferred via Markov Chain Monte Carlo sampling of a hidden Markov model. The POMDP with uncertain parameters is then solved via deep RL techniques with the parameter distributions incorporated into the solution via domain randomization.
arXiv Detail & Related papers (2023-07-16T15:44:58Z)
Learning a model is paramount for sample efficiency in reinforcement learning control of PDEs [5.488334211013093]
We show that learning an actuated model in parallel to training the RL agent significantly reduces the total amount of required data sampled from the real system. We also show that iteratively updating the model is of major importance to avoid biases in the RL training.
arXiv Detail & Related papers (2023-02-14T16:14:39Z)
Product Segmentation Newsvendor Problems: A Robust Learning Approach [6.346881818701668]
Product segmentation newsvendor problem is a new variant of the newsvendor problem. We propose a new paradigm termed robust learning to increase the attractiveness of robust policies.
arXiv Detail & Related papers (2022-07-08T10:13:10Z)
Towards Deployment-Efficient Reinforcement Learning: Lower Bound and Optimality [141.89413461337324]
Deployment efficiency is an important criterion for many real-world applications of reinforcement learning (RL) We propose a theoretical formulation for deployment-efficient RL (DE-RL) from an "optimization with constraints" perspective.
arXiv Detail & Related papers (2022-02-14T01:31:46Z)
Deep Policy Iteration with Integer Programming for Inventory Management [8.27175065641495]
We present a framework for optimizing long-term discounted reward problems with large accessible action space and state dependent constraints.<n>Our proposed Programmable Actor Reinforcement Learning (PARL) uses a deep-policy method that leverages neural networks (NNs) to approximate the value function.<n>We benchmark the proposed algorithm against state-of-the-art RL algorithms and commonly used replenishments and find it considerably outperforms existing methods by as much as 14.7% on average.
arXiv Detail & Related papers (2021-12-04T01:40:34Z)
Pessimistic Model Selection for Offline Deep Reinforcement Learning [56.282483586473816]
Deep Reinforcement Learning (DRL) has demonstrated great potentials in solving sequential decision making problems in many applications. One main barrier is the over-fitting issue that leads to poor generalizability of the policy learned by DRL. We propose a pessimistic model selection (PMS) approach for offline DRL with a theoretical guarantee.
arXiv Detail & Related papers (2021-11-29T06:29:49Z)
Combining Deep Learning and Optimization for Security-Constrained Optimal Power Flow [94.24763814458686]
Security-constrained optimal power flow (SCOPF) is fundamental in power systems. Modeling of APR within the SCOPF problem results in complex large-scale mixed-integer programs. This paper proposes a novel approach that combines deep learning and robust optimization techniques.
arXiv Detail & Related papers (2020-07-14T12:38:21Z)

This list is automatically generated from the titles and abstracts of the papers in this site.