A Learning Based Framework for Handling Uncertain Lead Times in
Multi-Product Inventory Management
- URL: http://arxiv.org/abs/2203.00885v1
- Date: Wed, 2 Mar 2022 05:50:04 GMT
- Title: A Learning Based Framework for Handling Uncertain Lead Times in
Multi-Product Inventory Management
- Authors: Hardik Meisheri, Somjit Nath, Mayank Baranwal, Harshad Khadilkar
- Abstract summary: Most existing literature on supply chain and inventory management considers stochastic demand processes with zero or constant lead times.
Motivated by the recently introduced delay-resolved deep Q-learning (DRDQN) algorithm, this paper develops a reinforcement learning based paradigm for handling uncertainty in lead times.
- Score: 8.889304968879163
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Most existing literature on supply chain and inventory management considers
stochastic demand processes with zero or constant lead times. While it is true
that in certain niche scenarios, uncertainty in lead times can be ignored, most
real-world scenarios exhibit stochasticity in lead times. These random
fluctuations can be caused by uncertainty in the arrival of raw materials at
the manufacturer's end, delays in transportation, an unforeseen surge in
demand, and switching to a different vendor, to name a few. Stochasticity in
lead times is known to severely degrade the performance of an inventory
management system, and it is only fair to bridge this gap in supply chain
systems through a principled approach. Motivated by the recently introduced
delay-resolved deep Q-learning (DRDQN) algorithm, this paper develops a
reinforcement learning based paradigm for handling uncertainty in lead times
(action delay). Through empirical evaluations, it is further shown that
inventory management with uncertain lead times is not only equivalent to
inventory management with delays in information sharing across multiple echelons
(observation delay), but also that a model trained to handle one kind of delay
can handle delays of the other kind without being retrained.
Finally, we apply the delay-resolved framework to scenarios comprising
multiple products subject to stochasticity in lead times, and elucidate how
the delay-resolved framework negates the effect of any delay to achieve
near-optimal performance.
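The abstract treats an uncertain lead time as an action delay. A minimal sketch of that setting follows, assuming a single product, a toy cost structure, and uniformly sampled lead times. It is not the authors' DRDQN implementation; the class name `DelayedInventoryEnv`, its parameters, and the cost values are all hypothetical. The point it illustrates is that the agent's state can carry the pipeline of outstanding orders alongside on-hand stock, so a delayed order stays visible until it arrives.

```python
import random


class DelayedInventoryEnv:
    """Toy single-product inventory with random order lead times (hypothetical)."""

    def __init__(self, max_lead_time=3, capacity=20,
                 holding_cost=1.0, stockout_penalty=5.0, seed=0):
        self.max_lead_time = max_lead_time
        self.capacity = capacity
        self.holding_cost = holding_cost
        self.stockout_penalty = stockout_penalty
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        self.inventory = 10
        # pipeline[k] holds units that arrive at the end of step k + 1 from now.
        self.pipeline = [0] * self.max_lead_time
        return self.augmented_state()

    def augmented_state(self):
        # On-hand stock followed by every outstanding (delayed) order.
        return (self.inventory, *self.pipeline)

    def step(self, order_qty, lead_time=None, demand=None):
        if order_qty < 0:
            raise ValueError("order_qty must be non-negative")
        # Assumption: lead time is uniform on 1..max_lead_time unless given.
        if lead_time is None:
            lead_time = self.rng.randint(1, self.max_lead_time)
        if not 1 <= lead_time <= self.max_lead_time:
            raise ValueError("lead_time out of range")
        # Assumption: demand is uniform on 0..4 unless given.
        if demand is None:
            demand = self.rng.randint(0, 4)

        # 1. The order enters the pipeline at its sampled lead time.
        self.pipeline[lead_time - 1] += order_qty
        # 2. Demand is served from on-hand stock only.
        served = min(self.inventory, demand)
        unmet = demand - served
        self.inventory -= served
        # 3. The pipeline advances by one step and due orders arrive.
        arrived = self.pipeline.pop(0)
        self.pipeline.append(0)
        self.inventory = min(self.capacity, self.inventory + arrived)

        reward = -(self.holding_cost * self.inventory
                   + self.stockout_penalty * unmet)
        return self.augmented_state(), reward
```

A learning agent would receive the tuple returned by `augmented_state()` as its observation. Passing `lead_time` and `demand` explicitly makes a step deterministic, which is convenient for checking the bookkeeping by hand.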
Related papers
- Uncertainty-Aware Delivery Delay Duration Prediction via Multi-Task Deep Learning [11.2212153491325]
This paper introduces a multi-task deep learning model for delivery delay duration prediction in the presence of significant imbalanced data.
The proposed model is evaluated on a large-scale real-world dataset from an industrial partner.
Experimental results show that the proposed method achieves a mean absolute error of 0.67-0.91 days for delayed-shipment predictions.
arXiv Detail & Related papers (2026-02-23T19:01:03Z)
- Non-Stationary Inventory Control with Lead Times [0.4927882324444362]
We study non-stationary single-item, periodic-review inventory control problems.
We analyze how demand non-stationarity affects learning performance across inventory models.
arXiv Detail & Related papers (2026-02-05T15:53:37Z)
- Intra-request branch orchestration for efficient LLM reasoning [52.68946975865865]
Large Language Models (LLMs) increasingly rely on inference-time reasoning algorithms to improve accuracy on complex tasks.
Prior work has largely focused on reducing token usage, often at the expense of accuracy, while overlooking other latency factors.
We present DUCHESS, an LLM serving system that reduces cost and latency without sacrificing accuracy through intra-request branch orchestration guided by predictions.
arXiv Detail & Related papers (2025-09-29T15:52:08Z)
- Foundation Models for Logistics: Toward Certifiable, Conversational Planning Interfaces [59.80143393787701]
Large language models (LLMs) can handle uncertainty and promise to accelerate replanning while lowering the barrier to entry.
We introduce a neurosymbolic framework that pairs the accessibility of natural-language dialogue with verifiable guarantees on goal interpretation.
A lightweight model, fine-tuned on just 100 uncertainty-filtered examples, surpasses the zero-shot performance of GPT-4.1 while cutting inference latency by nearly 50%.
arXiv Detail & Related papers (2025-07-15T14:24:01Z)
- Adaptive Reinforcement Learning for Unobservable Random Delays [46.04329493317009]
We introduce a general framework that enables agents to adaptively handle unobservable and time-varying delays.
Specifically, the agent generates a matrix of possible future actions to handle both unpredictable delays and lost action packets sent over networks.
Our method significantly outperforms state-of-the-art approaches across a wide range of benchmark environments.
arXiv Detail & Related papers (2025-06-17T11:11:37Z)
- Auto-Prompt Generation is Not Robust: Prompt Optimization Driven by Pseudo Gradient [50.15090865963094]
We introduce PertBench, a comprehensive benchmark dataset that includes a wide range of input perturbations.
Our analysis reveals substantial vulnerabilities in existing prompt generation strategies.
We propose PGO, a gradient-free prompt generation framework that leverages perturbation types as pseudo-gradient signals.
arXiv Detail & Related papers (2024-12-24T06:05:08Z)
- Zero-shot Generalization in Inventory Management: Train, then Estimate and Decide [0.0]
Deploying deep reinforcement learning (DRL) in real-world inventory management presents challenges.
These challenges highlight a research gap, suggesting a need for a unifying framework to model and solve sequential decision-making under parameter uncertainty.
We address this by exploring an underexplored area of DRL for inventory management: training generally capable agents (GCAs) under zero-shot generalization (ZSG).
arXiv Detail & Related papers (2024-11-01T11:20:05Z)
- Distributed Stochastic Gradient Descent with Staleness: A Stochastic Delay Differential Equation Based Framework [56.82432591933544]
Distributed stochastic gradient descent (SGD) has attracted considerable recent attention due to its potential for scaling computational resources, reducing training time, and helping protect user privacy in machine learning.
This paper presents the run time and staleness of distributed SGD based on stochastic delay differential equations (SDDEs) and the approximation of gradient arrivals.
It is interestingly shown that increasing the number of activated workers does not necessarily accelerate distributed SGD due to staleness.
arXiv Detail & Related papers (2024-06-17T02:56:55Z)
- DEER: A Delay-Resilient Framework for Reinforcement Learning with Variable Delays [26.032139258562708]
We propose DEER (Delay-resilient-Enhanced RL), a framework designed to effectively enhance the interpretability and address the random delay issues.
In a variety of delayed scenarios, the trained encoder can seamlessly integrate with standard RL algorithms without requiring additional modifications.
The results confirm that DEER is superior to state-of-the-art RL algorithms in both constant and random delay settings.
arXiv Detail & Related papers (2024-06-05T09:45:26Z)
- Tree Search-Based Policy Optimization under Stochastic Execution Delay [46.849634120584646]
Delayed execution MDPs are a new formalism addressing random delays without resorting to state augmentation.
We show that given observed delay values, it is sufficient to perform a policy search in the class of Markov policies.
We devise DEZ, a model-based algorithm that optimizes over the class of Markov policies.
arXiv Detail & Related papers (2024-04-08T12:19:04Z)
- Stochastic Approximation with Delayed Updates: Finite-Time Rates under Markovian Sampling [73.5602474095954]
We study the non-asymptotic performance of stochastic approximation schemes with delayed updates under Markovian sampling.
Our theoretical findings shed light on the finite-time effects of delays for a broad class of algorithms.
arXiv Detail & Related papers (2024-02-19T03:08:02Z)
- Guaranteed Dynamic Scheduling of Ultra-Reliable Low-Latency Traffic via Conformal Prediction [72.59079526765487]
The dynamic scheduling of ultra-reliable and low-latency traffic (URLLC) in the uplink can significantly enhance the efficiency of coexisting services.
The main challenge is posed by the uncertainty in the process of URLLC packet generation.
We introduce a novel scheduler for URLLC packets that provides formal guarantees on reliability and latency irrespective of the quality of the URLLC traffic predictor.
arXiv Detail & Related papers (2023-02-15T14:09:55Z)
- Effective Multi-User Delay-Constrained Scheduling with Deep Recurrent Reinforcement Learning [28.35473469490186]
Multi-user delay constrained scheduling is important in many real-world applications including wireless communication, live streaming, and cloud computing.
We propose a deep reinforcement learning (DRL) algorithm, named Recurrent Softmax Delayed Deep Double Deterministic Policy Gradient (RSD4).
RSD4 guarantees resource and delay constraints by Lagrangian dual and delay-sensitive queues, respectively.
It also efficiently tackles partial observability with a memory mechanism enabled by the recurrent neural network (RNN) and introduces user-level decomposition and node-level
arXiv Detail & Related papers (2022-08-30T08:44:15Z)
- Revisiting State Augmentation methods for Reinforcement Learning with Stochastic Delays [10.484851004093919]
This paper formally describes the notion of Markov Decision Processes (MDPs) with delays.
We show that delayed MDPs can be transformed into equivalent standard MDPs (without delays) with significantly simplified cost structure.
We employ this equivalence to derive a model-free Delay-Resolved RL framework and show that even a simple RL algorithm built upon this framework achieves near-optimal rewards in environments with delays in actions and observations.
arXiv Detail & Related papers (2021-08-17T10:45:55Z)
- Stochastic Multi-Armed Bandits with Unrestricted Delay Distributions [54.25616645675032]
We study the Multi-Armed Bandit (MAB) problem with random delays in the feedback received by the algorithm.
We consider two settings: the reward-dependent delay setting, where realized delays may depend on the rewards, and the reward-independent delay setting.
Our main contribution is algorithms that achieve near-optimal regret in each of the settings.
arXiv Detail & Related papers (2021-06-04T12:26:06Z)
- Stochastic bandits with arm-dependent delays [102.63128271054741]
We propose a simple but efficient UCB-based algorithm called the PatientBandits.
We provide both problem-dependent and problem-independent bounds on the regret as well as performance lower bounds.
arXiv Detail & Related papers (2020-06-18T12:13:58Z)
- Non-Stationary Delayed Bandits with Intermediate Observations [10.538264213183076]
Online recommender systems often face long delays in receiving feedback, especially when optimizing for some long-term metrics.
We introduce the problem of non-stationary, delayed bandits with intermediate observations.
We develop an efficient algorithm based on UCRL, and prove sublinear regret guarantees for its performance.
arXiv Detail & Related papers (2020-06-03T09:27:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.