Related papers: Zero-shot Generalization in Inventory Management: Train, then Estimate and Decide

Zero-shot Generalization in Inventory Management: Train, then Estimate and Decide

URL: http://arxiv.org/abs/2411.00515v1
Date: Fri, 01 Nov 2024 11:20:05 GMT
Title: Zero-shot Generalization in Inventory Management: Train, then Estimate and Decide
Authors: Tarkan Temizöz, Christina Imdahl, Remco Dijkman, Douniel Lamghari-Idrissi, Willem van Jaarsveld,
Abstract summary: Deploying deep reinforcement learning (DRL) in real-world inventory management presents challenges. These challenges highlight a research gap, suggesting a need for a unifying framework to model and solve sequential decision-making under parameter uncertainty. We address this by exploring an underexplored area of DRL for inventory management: training generally capable agents (GCAs) under zero-shot generalization (ZSG)
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Deploying deep reinforcement learning (DRL) in real-world inventory management presents challenges, including dynamic environments and uncertain problem parameters, e.g. demand and lead time distributions. These challenges highlight a research gap, suggesting a need for a unifying framework to model and solve sequential decision-making under parameter uncertainty. We address this by exploring an underexplored area of DRL for inventory management: training generally capable agents (GCAs) under zero-shot generalization (ZSG). Here, GCAs are advanced DRL policies designed to handle a broad range of sampled problem instances with diverse inventory challenges. ZSG refers to the ability to successfully apply learned policies to unseen instances with unknown parameters without retraining. We propose a unifying Super-Markov Decision Process formulation and the Train, then Estimate and Decide (TED) framework to train and deploy a GCA tailored to inventory management applications. The TED framework consists of three phases: training a GCA on varied problem instances, continuously estimating problem parameters during deployment, and making decisions based on these estimates. Applied to periodic review inventory problems with lost sales, cyclic demand patterns, and stochastic lead times, our trained agent, the Generally Capable Lost Sales Network (GC-LSN) consistently outperforms well-known traditional policies when problem parameters are known. Moreover, under conditions where demand and/or lead time distributions are initially unknown and must be estimated, we benchmark against online learning methods that provide worst-case performance guarantees. Our GC-LSN policy, paired with the Kaplan-Meier estimator, is demonstrated to complement these methods by providing superior empirical performance.

Related papers

Sample-Efficient Neurosymbolic Deep Reinforcement Learning [49.60927398960061]
We propose a neuro-symbolic Deep RL approach that integrates background symbolic knowledge to improve sample efficiency.<n>Online reasoning is performed to guide the training process through two mechanisms.<n>We show improved performance over a state-of-the-art reward machine baseline.
arXiv Detail & Related papers (2026-01-06T09:28:53Z)
Structure-Informed Deep Reinforcement Learning for Inventory Management [8.697068617006964]
This paper investigates the application of Deep Reinforcement Learning to classical inventory management problems.<n>We apply a DRL algorithm based on DirectBackprop to several fundamental inventory management scenarios.<n>We show that our generic DRL implementation performs competitively against or outperforms established benchmarks and distributions.
arXiv Detail & Related papers (2025-07-29T17:41:45Z)
Zero-Shot Whole-Body Humanoid Control via Behavioral Foundation Models [71.34520793462069]
Unsupervised reinforcement learning (RL) aims at pre-training agents that can solve a wide range of downstream tasks in complex environments. We introduce a novel algorithm regularizing unsupervised RL towards imitating trajectories from unlabeled behavior datasets. We demonstrate the effectiveness of this new approach in a challenging humanoid control problem.
arXiv Detail & Related papers (2025-04-15T10:41:11Z)
Thinking Longer, Not Larger: Enhancing Software Engineering Agents via Scaling Test-Time Compute [61.00662702026523]
We propose a unified Test-Time Compute scaling framework that leverages increased inference-time instead of larger models.<n>Our framework incorporates two complementary strategies: internal TTC and external TTC.<n>We demonstrate our textbf32B model achieves a 46% issue resolution rate, surpassing significantly larger models such as DeepSeek R1 671B and OpenAI o1.
arXiv Detail & Related papers (2025-03-31T07:31:32Z)
Towards Sample-Efficiency and Generalization of Transfer and Inverse Reinforcement Learning: A Comprehensive Literature Review [50.67937325077047]
This paper is devoted to a comprehensive review of realizing the sample efficiency and generalization of RL algorithms through transfer and inverse reinforcement learning (T-IRL) Our findings denote that a majority of recent research works have dealt with the aforementioned challenges by utilizing human-in-the-loop and sim-to-real strategies. Under the IRL structure, training schemes that require a low number of experience transitions and extension of such frameworks to multi-agent and multi-intention problems have been the priority of researchers in recent years.
arXiv Detail & Related papers (2024-11-15T15:18:57Z)
Neural Coordination and Capacity Control for Inventory Management [4.533373101620897]
This paper is motivated by the questions of what does it mean to backtest a capacity control mechanism and can we devise and backtest a capacity control mechanism compatible with recent advances in deep reinforcement learning for inventory management? First, because we only have a single historic sample path of Amazon's capacity limits, we propose a method that samples from a distribution of possible constraint paths covering a space of real-world scenarios. Second, we extend the exo-IDP (Exogenous Decision Process) formulation of Madeka et al. 2022 to capacitated periodic review inventory control problems and show that certain capacit
arXiv Detail & Related papers (2024-09-24T16:23:10Z)
Last-Iterate Global Convergence of Policy Gradients for Constrained Reinforcement Learning [62.81324245896717]
We introduce an exploration-agnostic algorithm, called C-PG, which exhibits global last-ite convergence guarantees under (weak) gradient domination assumptions. We numerically validate our algorithms on constrained control problems, and compare them with state-of-the-art baselines.
arXiv Detail & Related papers (2024-07-15T14:54:57Z)
Combinatorial Optimization with Policy Adaptation using Latent Space Search [44.12073954093942]
We present a novel approach for designing performant algorithms to solve complex, typically NP-hard, problems. We show that our search strategy outperforms state-of-the-art approaches on 11 standard benchmarking tasks.
arXiv Detail & Related papers (2023-11-13T12:24:54Z)
Using General Value Functions to Learn Domain-Backed Inventory Management Policies [2.0257616108612373]
In existing literature, General Value Functions (GVFs) have primarily been used for auxiliary task learning. We use this capability to train GVFs on domain-critical characteristics such as prediction of stock-out probability and wastage quantity. We show that the GVF predictions can be used to provide additional domain-backed insights into the decisions proposed by the RL agent.
arXiv Detail & Related papers (2023-11-03T08:35:54Z)
Offline Reinforcement Learning with On-Policy Q-Function Regularization [57.09073809901382]
We deal with the (potentially catastrophic) extrapolation error induced by the distribution shift between the history dataset and the desired policy. We propose two algorithms taking advantage of the estimated Q-function through regularizations, and demonstrate they exhibit strong performance on the D4RL benchmarks.
arXiv Detail & Related papers (2023-07-25T21:38:08Z)
Semantically Aligned Task Decomposition in Multi-Agent Reinforcement Learning [56.26889258704261]
We propose a novel "disentangled" decision-making method, Semantically Aligned task decomposition in MARL (SAMA) SAMA prompts pretrained language models with chain-of-thought that can suggest potential goals, provide suitable goal decomposition and subgoal allocation as well as self-reflection-based replanning. SAMA demonstrates considerable advantages in sample efficiency compared to state-of-the-art ASG methods.
arXiv Detail & Related papers (2023-05-18T10:37:54Z)
Product Segmentation Newsvendor Problems: A Robust Learning Approach [6.346881818701668]
Product segmentation newsvendor problem is a new variant of the newsvendor problem. We propose a new paradigm termed robust learning to increase the attractiveness of robust policies.
arXiv Detail & Related papers (2022-07-08T10:13:10Z)
Math Programming based Reinforcement Learning for Multi-Echelon Inventory Management [1.9161790404101895]
Reinforcement learning has lead to considerable break-throughs in diverse areas such as robotics, games and many others. But the application to RL in complex real-world decision making problems remains limited. These characteristics make the problem considerably harder to solve for existing RL methods that rely on enumeration techniques to solve per step action problems. We show that a properly selected discretization of the underlying uncertain distribution can yield near optimal actor policy even with very few samples from the underlying uncertainty. We find that PARL outperforms commonly used base stock by 44.7% and the best performing RL method by up to 12.1% on average
arXiv Detail & Related papers (2021-12-04T01:40:34Z)
Deep Controlled Learning for Inventory Control [0.0]
The application of Deep Reinforcement Learning (DRL) to inventory management is an emerging field.<n>Traditional DRL algorithms, originally developed for diverse domains such as game-playing and robotics, may not be well-suited for the specific challenges posed by inventory management.<n>We propose Deep Learning (DCL), a new DRL algorithm designed for highly numerical problems.
arXiv Detail & Related papers (2020-11-30T18:53:08Z)
Dynamics Generalization via Information Bottleneck in Deep Reinforcement Learning [90.93035276307239]
We propose an information theoretic regularization objective and an annealing-based optimization method to achieve better generalization ability in RL agents. We demonstrate the extreme generalization benefits of our approach in different domains ranging from maze navigation to robotic tasks. This work provides a principled way to improve generalization in RL by gradually removing information that is redundant for task-solving.
arXiv Detail & Related papers (2020-08-03T02:24:20Z)
Conservative Q-Learning for Offline Reinforcement Learning [106.05582605650932]
We show that CQL substantially outperforms existing offline RL methods, often learning policies that attain 2-5 times higher final return. We theoretically show that CQL produces a lower bound on the value of the current policy and that it can be incorporated into a policy learning procedure with theoretical improvement guarantees.
arXiv Detail & Related papers (2020-06-08T17:53:42Z)

This list is automatically generated from the titles and abstracts of the papers in this site.