Using General Value Functions to Learn Domain-Backed Inventory
Management Policies
- URL: http://arxiv.org/abs/2311.02125v1
- Date: Fri, 3 Nov 2023 08:35:54 GMT
- Title: Using General Value Functions to Learn Domain-Backed Inventory
Management Policies
- Authors: Durgesh Kalwar, Omkar Shelke, Harshad Khadilkar
- Abstract summary: In existing literature, General Value Functions (GVFs) have primarily been used for auxiliary task learning.
We use this capability to train GVFs on domain-critical characteristics such as prediction of stock-out probability and wastage quantity.
We show that the GVF predictions can be used to provide additional domain-backed insights into the decisions proposed by the RL agent.
- Score: 2.0257616108612373
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We consider the inventory management problem, where the goal is to balance
conflicting objectives such as availability and wastage of a large range of
products in a store. We propose a reinforcement learning (RL) approach that
utilises General Value Functions (GVFs) to derive domain-backed inventory
replenishment policies. The inventory replenishment decisions are modelled as a
sequential decision making problem, which is challenging due to uncertain
demand and the existence of aggregate (cross-product) constraints. In existing
literature, GVFs have primarily been used for auxiliary task learning. We use
this capability to train GVFs on domain-critical characteristics such as
prediction of stock-out probability and wastage quantity. Using this domain
expertise for more effective exploration, we train an RL agent to compute the
inventory replenishment quantities for a large range of products (up to 6000 in
the reported experiments), which share aggregate constraints such as the total
weight/volume per delivery. Additionally, we show that the GVF predictions can
be used to provide additional domain-backed insights into the decisions
proposed by the RL agent. Finally, since the environment dynamics are fully
transferred, the trained GVFs can be used for faster adaptation to vastly
different business objectives (for example, due to the start of a promotional
period or due to deployment in a new customer environment).
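To make the GVF idea concrete, below is a minimal sketch of training a linear GVF with TD(0) on a stock-out cumulant; the feature dimension, discount, step size, and simulated signals are illustrative assumptions rather than details from the paper.

```python
# Hedged sketch: a linear GVF that predicts the discounted sum of future
# stock-out events, learned with TD(0). All numbers are assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_features, gamma, alpha = 8, 0.9, 0.05
w = np.zeros(n_features)  # linear GVF weights

def gvf_predict(phi):
    """Predicted discounted sum of future stock-out events from features phi."""
    return w @ phi

def td_update(phi, cumulant, phi_next):
    """One TD(0) step toward the GVF target: cumulant + gamma * v(phi_next)."""
    global w
    td_error = cumulant + gamma * gvf_predict(phi_next) - gvf_predict(phi)
    w = w + alpha * td_error * phi

# Simulated experience stream: random state features and stock-out signals.
phi = rng.random(n_features)
for _ in range(10_000):
    phi_next = rng.random(n_features)
    stock_out = float(rng.random() < 0.1)  # cumulant: 1.0 on a stock-out step
    td_update(phi, stock_out, phi_next)
    phi = phi_next

print("predicted discounted stock-out exposure:", gvf_predict(phi))
```

The same machinery applies to a wastage cumulant: only the per-step signal fed to `td_update` changes, which is what makes GVFs a convenient way to expose several domain-critical predictions from one experience stream.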
Related papers
- Zero-shot Generalization in Inventory Management: Train, then Estimate and Decide [0.0]
Deploying deep reinforcement learning (DRL) in real-world inventory management presents challenges.
These challenges highlight a research gap, suggesting a need for a unifying framework to model and solve sequential decision-making under parameter uncertainty.
We address this by exploring an underexplored area of DRL for inventory management: training generally capable agents (GCAs) under zero-shot generalization (ZSG).
arXiv Detail & Related papers (2024-11-01T11:20:05Z)
- Towards Cost Sensitive Decision Making [14.279123976398926]
In this work, we consider RL models that may actively acquire features from the environment to improve the decision quality and certainty.
We propose the Active-Acquisition POMDP and identify two types of acquisition processes for different application domains.
In order to assist the agent in the actively-acquired partially-observed environment and alleviate the exploration-exploitation dilemma, we develop a model-based approach.
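As a toy illustration of active feature acquisition, the sketch below lets an agent pay a fixed cost per revealed feature before committing to a final decision; the environment, costs, and heuristic acquisition policy are assumptions for illustration only.

```python
# Toy actively-acquired partially-observed setting: the agent may pay a
# cost to reveal hidden features before taking its task action.
import numpy as np

rng = np.random.default_rng(1)
n_features, acquire_cost = 4, 0.1

def episode(threshold):
    hidden = rng.random(n_features)          # true feature values
    observed = np.full(n_features, np.nan)   # start fully unobserved
    cost = 0.0
    for i in range(n_features):
        # Heuristic policy: keep acquiring while too much is still unknown.
        if np.isnan(observed).mean() > threshold:
            observed[i] = hidden[i]
            cost += acquire_cost
    # Final task action: predict whether the hidden mean exceeds 0.5.
    guess = np.nanmean(observed) > 0.5 if not np.all(np.isnan(observed)) else False
    correct = (hidden.mean() > 0.5) == guess
    return float(correct) - cost             # reward trades accuracy vs. cost

returns = [episode(threshold=0.5) for _ in range(1000)]
print("mean return:", np.mean(returns))
```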
arXiv Detail & Related papers (2024-10-04T19:48:23Z)
- Beyond Expected Return: Accounting for Policy Reproducibility when Evaluating Reinforcement Learning Algorithms [9.649114720478872]
Many applications in Reinforcement Learning (RL) have noise or stochasticity present in the environment.
These uncertainties lead the exact same policy to perform differently from one roll-out to another.
Common evaluation procedures in RL summarise the consequent return distributions using solely the expected return, which does not account for the spread of the distribution.
Our work defines this spread as policy reproducibility: the ability of a policy to obtain similar performance when rolled out many times, a crucial property in some real-world applications.
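A minimal sketch of this evaluation idea, assuming a toy noisy environment: report the spread of the return distribution alongside its mean rather than the expected return alone.

```python
# Evaluate a fixed policy by repeated roll-outs and summarise both the
# expected return and its spread. The environment is a toy stand-in.
import numpy as np

rng = np.random.default_rng(2)

def rollout():
    """One noisy episode return for a fixed policy (toy model)."""
    return 10.0 + rng.normal(scale=3.0)

returns = np.array([rollout() for _ in range(500)])
print("expected return:", returns.mean())
print("spread (std):   ", returns.std())
print("5% quantile:    ", np.quantile(returns, 0.05))  # pessimistic view
```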
arXiv Detail & Related papers (2023-12-12T11:22:31Z)
- Off-Policy Evaluation for Large Action Spaces via Policy Convolution [60.6953713877886]
The Policy Convolution (PC) family of estimators uses latent structure within actions to strategically convolve the logging and target policies.
Experiments on synthetic and benchmark datasets demonstrate remarkable mean squared error (MSE) improvements when using PC.
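The sketch below illustrates the general idea of convolving policies over a latent action space before forming importance weights; the Gaussian kernel, bandwidth, and random policies are assumptions for illustration, not the paper's exact estimator.

```python
# Kernel-smoothed importance weighting over latent action embeddings:
# both policies are convolved with a Gaussian kernel in embedding space
# before the per-action ratios are computed.
import numpy as np

rng = np.random.default_rng(3)
n_actions = 50
embed = rng.normal(size=(n_actions, 4))       # latent action embeddings

def convolve(policy, bandwidth):
    """Smooth a policy over actions that are close in the latent space."""
    d2 = ((embed[:, None, :] - embed[None, :, :]) ** 2).sum(-1)
    k = np.exp(-d2 / (2 * bandwidth ** 2))
    smoothed = k @ policy
    return smoothed / smoothed.sum()

logging = rng.dirichlet(np.ones(n_actions))
target = rng.dirichlet(np.ones(n_actions))
log_s, tgt_s = convolve(logging, 0.5), convolve(target, 0.5)

# Smoothed importance weights applied to logged (action, reward) pairs.
actions = rng.choice(n_actions, size=2000, p=logging)
rewards = rng.random(2000)
weights = tgt_s[actions] / log_s[actions]
print("smoothed IPS estimate:", np.mean(weights * rewards))
```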
arXiv Detail & Related papers (2023-10-24T01:00:01Z)
- Cooperative Multi-Agent Reinforcement Learning for Inventory Management [0.5276232626689566]
Reinforcement Learning (RL) for inventory management is a nascent field of research.
We present a system with a custom GPU-parallelized environment that consists of one warehouse and multiple stores.
We achieve a system that outperforms standard inventory control policies.
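A hedged sketch of the batched-environment idea, with NumPy vectorization standing in for the paper's custom GPU parallelism; the store/SKU counts and Poisson demand model are assumptions.

```python
# Step every store's inventory dynamics in one batched array operation.
import numpy as np

rng = np.random.default_rng(4)
n_stores, n_skus = 128, 64
inventory = rng.integers(0, 20, size=(n_stores, n_skus)).astype(float)

def step(inventory, orders):
    """One synchronous transition for all stores and SKUs at once."""
    demand = rng.poisson(3.0, size=inventory.shape)
    sales = np.minimum(inventory + orders, demand)
    stock_outs = (demand > inventory + orders).sum()
    return inventory + orders - sales, sales, stock_outs

orders = np.full((n_stores, n_skus), 3.0)
inventory, sales, stock_outs = step(inventory, orders)
print("total sales:", sales.sum(), "stock-outs:", stock_outs)
```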
arXiv Detail & Related papers (2023-04-18T06:55:59Z)
- Multi-Agent Reinforcement Learning with Shared Resources for Inventory Management [62.23979094308932]
In our setting, the constraint on the shared resources (such as the inventory capacity) couples the otherwise independent control for each SKU.
We formulate the problem with this structure as a Shared-Resource Stochastic Game (SRSG) and propose an efficient algorithm called Context-aware Decentralized PPO (CD-PPO).
Through extensive experiments, we demonstrate that CD-PPO can accelerate the learning procedure compared with standard MARL algorithms.
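To see how a shared resource couples otherwise independent per-SKU decisions, here is a toy feasibility rule that jointly rescales proposed order quantities to fit a shared delivery volume; all quantities are illustrative assumptions.

```python
# Per-SKU agents propose replenishment quantities; a shared capacity
# constraint is then enforced jointly across all of them.
import numpy as np

rng = np.random.default_rng(5)
proposed = rng.uniform(0, 10, size=100)      # per-SKU proposed quantities
unit_volume = rng.uniform(0.5, 2.0, size=100)
capacity = 300.0                             # shared delivery volume

total = proposed @ unit_volume
if total > capacity:
    proposed *= capacity / total             # simplest joint feasibility rule
print("used volume:", proposed @ unit_volume, "of", capacity)
```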
arXiv Detail & Related papers (2022-12-15T09:35:54Z)
- Explaining Cross-Domain Recognition with Interpretable Deep Classifier [100.63114424262234]
The Interpretable Deep Classifier (IDC) learns the nearest source samples of a target sample as evidence upon which the classifier makes the decision.
Our IDC leads to a more explainable model with almost no accuracy degradation and effectively calibrates classification for optimum reject options.
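A minimal sketch of the evidence-retrieval idea, assuming random features in place of learned embeddings: accompany a target prediction with its nearest source samples in feature space.

```python
# Retrieve the nearest source samples to a target sample as "evidence"
# supporting the classifier's decision. Features and labels are random
# stand-ins for real embeddings.
import numpy as np

rng = np.random.default_rng(6)
source_feats = rng.normal(size=(500, 16))
source_labels = rng.integers(0, 5, size=500)
target = rng.normal(size=16)

dists = np.linalg.norm(source_feats - target, axis=1)
evidence = np.argsort(dists)[:5]             # nearest source samples
print("evidence indices:", evidence)
print("their labels:", source_labels[evidence])  # supports/explains the call
```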
arXiv Detail & Related papers (2022-11-15T15:58:56Z)
- Intelligent Warehouse Allocator for Optimal Regional Utilization [0.0]
We use machine learning and optimization methods to build an efficient solution to this warehouse allocation problem.
We conduct back-testing with this solution and validate the model's efficiency by demonstrating a significant uptick in two key metrics: Regional Utilization (RU) and Percentage Two-Day-Delivery (2DD).
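As a hedged illustration of pairing prediction with optimization for warehouse allocation, the sketch below greedily assigns SKUs to the warehouse with the highest regional demand share under a capacity limit; the demand shares, capacities, and greedy rule are assumptions, not the paper's method.

```python
# Greedy capacitated allocation: each SKU goes to the warehouse whose
# region demands it most, while per-warehouse capacity remains.
import numpy as np

rng = np.random.default_rng(7)
n_skus, n_wh = 200, 4
demand_share = rng.dirichlet(np.ones(n_wh), size=n_skus)  # per-SKU regional demand
capacity = np.full(n_wh, 60)

assignment = np.full(n_skus, -1)
# Assign the most regionally concentrated SKUs first, so capacity goes
# to the strongest matches.
for sku in np.argsort(-demand_share.max(axis=1)):
    for wh in np.argsort(-demand_share[sku]):
        if capacity[wh] > 0:
            assignment[sku], capacity[wh] = wh, capacity[wh] - 1
            break
print("regional utilization proxy:",
      demand_share[np.arange(n_skus), assignment].mean())
```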
arXiv Detail & Related papers (2020-07-09T21:46:15Z)
- Feature Alignment and Restoration for Domain Generalization and Adaptation [93.39253443415392]
Cross domain feature alignment has been widely explored to pull the feature distributions of different domains in order to learn domain-invariant representations.
We propose a unified framework termed Feature Alignment and Restoration (FAR) to simultaneously ensure high generalization and discrimination power of the networks.
Experiments on multiple classification benchmarks demonstrate the high performance and strong generalization of our FAR framework for both domain generalization and unsupervised domain adaptation.
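As a toy illustration of pulling feature distributions together, the following computes a simple moment-matching penalty between source and target feature means; real alignment losses (adversarial, MMD-based, etc.) are more elaborate, and everything here is an assumption for illustration.

```python
# A crude alignment penalty: squared distance between the mean feature
# statistics of the source and target domains.
import numpy as np

rng = np.random.default_rng(8)
source = rng.normal(loc=0.0, size=(256, 32))
target = rng.normal(loc=0.5, size=(256, 32))

alignment_loss = np.sum((source.mean(axis=0) - target.mean(axis=0)) ** 2)
print("moment-matching alignment loss:", alignment_loss)
```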
arXiv Detail & Related papers (2020-06-22T05:08:13Z)
- Universal Source-Free Domain Adaptation [57.37520645827318]
We propose a novel two-stage learning process for domain adaptation.
In the Procurement stage, we aim to equip the model for future source-free deployment, assuming no prior knowledge of the upcoming category-gap and domain-shift.
In the Deployment stage, the goal is to design a unified adaptation algorithm capable of operating across a wide range of category-gaps.
arXiv Detail & Related papers (2020-04-09T07:26:20Z)
- Towards Inheritable Models for Open-Set Domain Adaptation [56.930641754944915]
We introduce a practical Domain Adaptation paradigm where a source-trained model is used to facilitate adaptation in the absence of the source dataset in future.
We present an objective way to quantify inheritability to enable the selection of the most suitable source model for a given target domain, even in the absence of the source data.
arXiv Detail & Related papers (2020-04-09T07:16:30Z)