Related papers: A Learning Based Framework for Handling Uncertain Lead Times in Multi-Product Inventory Management

A Learning Based Framework for Handling Uncertain Lead Times in Multi-Product Inventory Management

URL: http://arxiv.org/abs/2203.00885v1
Date: Wed, 2 Mar 2022 05:50:04 GMT
Title: A Learning Based Framework for Handling Uncertain Lead Times in Multi-Product Inventory Management
Authors: Hardik Meisheri, Somjit Nath, Mayank Baranwal, Harshad Khadilkar
Abstract summary: Most existing literature on supply chain and inventory management consider demand processes with zero or constant lead times. Motivated by the recently introduced delay-resolved deep Q-learning (DRDQN) algorithm, this paper develops a reinforcement learning based paradigm for handling uncertainty in lead times.
Score: 8.889304968879163
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Most existing literature on supply chain and inventory management consider stochastic demand processes with zero or constant lead times. While it is true that in certain niche scenarios, uncertainty in lead times can be ignored, most real-world scenarios exhibit stochasticity in lead times. These random fluctuations can be caused due to uncertainty in arrival of raw materials at the manufacturer's end, delay in transportation, an unforeseen surge in demands, and switching to a different vendor, to name a few. Stochasticity in lead times is known to severely degrade the performance in an inventory management system, and it is only fair to abridge this gap in supply chain system through a principled approach. Motivated by the recently introduced delay-resolved deep Q-learning (DRDQN) algorithm, this paper develops a reinforcement learning based paradigm for handling uncertainty in lead times (\emph{action delay}). Through empirical evaluations, it is further shown that the inventory management with uncertain lead times is not only equivalent to that of delay in information sharing across multiple echelons (\emph{observation delay}), a model trained to handle one kind of delay is capable to handle delays of another kind without requiring to be retrained. Finally, we apply the delay-resolved framework to scenarios comprising of multiple products subjected to stochasticity in lead times, and elucidate how the delay-resolved framework negates the effect of any delay to achieve near-optimal performance.

Related papers

Zero-shot Generalization in Inventory Management: Train, then Estimate and Decide [0.0]
Deploying deep reinforcement learning (DRL) in real-world inventory management presents challenges. These challenges highlight a research gap, suggesting a need for a unifying framework to model and solve sequential decision-making under parameter uncertainty. We address this by exploring an underexplored area of DRL for inventory management: training generally capable agents (GCAs) under zero-shot generalization (ZSG)
arXiv Detail & Related papers (2024-11-01T11:20:05Z)
Distributed Stochastic Gradient Descent with Staleness: A Stochastic Delay Differential Equation Based Framework [56.82432591933544]
Distributed gradient descent (SGD) has attracted considerable recent attention due to its potential for scaling computational resources, reducing training time, and helping protect user privacy in machine learning. This paper presents the run time and staleness of distributed SGD based on delay differential equations (SDDEs) and the approximation of gradient arrivals. It is interestingly shown that increasing the number of activated workers does not necessarily accelerate distributed SGD due to staleness.
arXiv Detail & Related papers (2024-06-17T02:56:55Z)
DEER: A Delay-Resilient Framework for Reinforcement Learning with Variable Delays [26.032139258562708]
We propose $textbfDEER (Delay-resilient-Enhanced RL)$, a framework designed to effectively enhance the interpretability and address the random delay issues. In a variety of delayed scenarios, the trained encoder can seamlessly integrate with standard RL algorithms without requiring additional modifications. The results confirm that DEER is superior to state-of-the-art RL algorithms in both constant and random delay settings.
arXiv Detail & Related papers (2024-06-05T09:45:26Z)
Tree Search-Based Policy Optimization under Stochastic Execution Delay [46.849634120584646]
Delayed execution MDPs are a new formalism addressing random delays without resorting to state augmentation. We show that given observed delay values, it is sufficient to perform a policy search in the class of Markov policies. We devise DEZ, a model-based algorithm that optimize over the class of Markov policies.
arXiv Detail & Related papers (2024-04-08T12:19:04Z)
Stochastic Approximation with Delayed Updates: Finite-Time Rates under Markovian Sampling [73.5602474095954]
We study the non-asymptotic performance of approximation schemes with delayed updates under Markovian sampling. Our theoretical findings shed light on the finite-time effects of delays for a broad class of algorithms.
arXiv Detail & Related papers (2024-02-19T03:08:02Z)
Guaranteed Dynamic Scheduling of Ultra-Reliable Low-Latency Traffic via Conformal Prediction [72.59079526765487]
The dynamic scheduling of ultra-reliable and low-latency traffic (URLLC) in the uplink can significantly enhance the efficiency of coexisting services. The main challenge is posed by the uncertainty in the process of URLLC packet generation. We introduce a novel scheduler for URLLC packets that provides formal guarantees on reliability and latency irrespective of the quality of the URLLC traffic predictor.
arXiv Detail & Related papers (2023-02-15T14:09:55Z)
Effective Multi-User Delay-Constrained Scheduling with Deep Recurrent Reinforcement Learning [28.35473469490186]
Multi-user delay constrained scheduling is important in many real-world applications including wireless communication, live streaming, and cloud computing. We propose a deep reinforcement learning (DRL) algorithm, named Recurrent Softmax Delayed Deep Double Deterministic Policy Gradient ($mathttRSD4$) $mathttRSD4$ guarantees resource and delay constraints by Lagrangian dual and delay-sensitive queues, respectively. It also efficiently tackles partial observability with a memory mechanism enabled by the recurrent neural network (RNN) and introduces user-level decomposition and node-level
arXiv Detail & Related papers (2022-08-30T08:44:15Z)
Revisiting State Augmentation methods for Reinforcement Learning with Stochastic Delays [10.484851004093919]
This paper formally describes the notion of Markov Decision Processes (MDPs) with delays. We show that delayed MDPs can be transformed into equivalent standard MDPs (without delays) with significantly simplified cost structure. We employ this equivalence to derive a model-free Delay-Resolved RL framework and show that even a simple RL algorithm built upon this framework achieves near-optimal rewards in environments with delays in actions and observations.
arXiv Detail & Related papers (2021-08-17T10:45:55Z)
Stochastic Multi-Armed Bandits with Unrestricted Delay Distributions [54.25616645675032]
We study the Multi-Armed Bandit (MAB) problem with random delays in the feedback received by the algorithm. We consider two settings: the reward-dependent delay setting, where realized delays may depend on the rewards, and the reward-independent delay setting. Our main contribution is algorithms that achieve near-optimal regret in each of the settings.
arXiv Detail & Related papers (2021-06-04T12:26:06Z)
Stochastic bandits with arm-dependent delays [102.63128271054741]
We propose a simple but efficient UCB-based algorithm called the PatientBandits. We provide both problems-dependent and problems-independent bounds on the regret as well as performance lower bounds.
arXiv Detail & Related papers (2020-06-18T12:13:58Z)
Non-Stationary Delayed Bandits with Intermediate Observations [10.538264213183076]
Online recommender systems often face long delays in receiving feedback, especially when optimizing for some long-term metrics. We introduce the problem of non-stationary, delayed bandits with intermediate observations. We develop an efficient algorithm based on UCRL, and prove sublinear regret guarantees for its performance.
arXiv Detail & Related papers (2020-06-03T09:27:03Z)

This list is automatically generated from the titles and abstracts of the papers in this site.